ECON 2300 Introductory Econometrics
Lecture 2: Linear Regression with One Regressor
Overview of the topic:
▶ In this lecture we will learn how to investigate the relationship between
two variables (X and Y )
▶ A simple relationship is a straight line
▶ A line has a slope and an intercept
▶ We will learn how to estimate the slope and intercept and conduct
inference on these quantities
Outline:
▶ The population linear regression model (LRM) for i = 1, . . . , n
Yi = β0 + β1Xi + ui
▶ The ordinary least squares (OLS) estimator and the sample regression
line for i = 1, . . . , n
Yi = β̂0 + β̂1Xi + ûi
Ŷi = β̂0 + β̂1Xi
ûi = Yi − Ŷi
▶ Measures of fit of the sample regression
▶ The least squares assumptions
▶ The sampling distribution of the OLS estimator
Linear Regression
▶ Linear regression lets us estimate the slope of the population regression
line.
▶ The slope of the population regression line is the expected effect on Y of
a unit change in X.
▶ Ultimately our aim is to estimate the causal effect on Y of a unit change
in X – but for now, just think of the problem of fitting a straight line to data
on two variables, Y and X .
Linear Regression
▶ The problem of statistical inference for linear regression is, at a general
level, the same as for estimation of the mean or of the differences
between two means.
▶ Statistical, or econometric, inference about the slope entails:
▶ Estimation:
How should we draw a line through the data to estimate the population
slope? Answer: ordinary least squares (OLS).
What are advantages and disadvantages of OLS?
▶ Hypothesis testing:
How to test if the slope is zero?
▶ Confidence intervals:
How to construct a confidence interval for the slope?
The Linear Regression Model (SW Section 4.1)
Does the number of students in a class affect how well the students learn?
▶ The population regression line:
E[TestScorei | STRi] = β0 + β1 STRi
▶ β1 = slope of population regression line
= change in test score for a unit change in student-teacher ratio (STR)
▶ Why are β0 and β1 “population” parameters?
▶ We would like to know the population value of β1.
▶ We don’t know β1, so must estimate it using data.
The Population Linear Regression Model
Consider
Yi = β0 + β1Xi + ui
for i = 1, . . . , n
▶ We have n observations, (Xi, Yi), i = 1, . . . , n.
▶ X is the independent variable or regressor or right-hand-side variable
▶ Y is the dependent variable or left-hand-side variable
▶ β0 = intercept
▶ β1 = slope
▶ ui = the regression error
▶ The regression error consists of omitted factors. In general, these
omitted factors are other factors that influence Y , other than the variable
X . The regression error also includes error in the measurement of Y .
The population regression model in a picture
▶ Observations on Y and X (n = 7); the population regression line; and
the regression error (the “error term”):
The Ordinary Least Squares Estimator (SW Section 4.2)
▶ How can we estimate β0 and β1 from data?
▶ Recall that the least squares estimator of µY is the sample mean Ȳ, which solves
min_m ∑_{i=1}^n (Yi − m)²
▶ By analogy, we will focus on the least squares (“ordinary least squares”
or “OLS”) estimator of the unknown parameters β0 and β1. The OLS
estimator solves
min_{b0, b1} ∑_{i=1}^n [Yi − (b0 + b1Xi)]²
▶ In fact, we estimate the conditional expectation function E[Y | X] under
the assumption that E[Y | X] = β0 + β1X.
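The OLS minimization problem can be illustrated numerically. The sketch below (a Python illustration; the data are made up purely for this example and are not from the course) evaluates the sum of squared residuals over a grid of candidate intercepts and slopes and picks the pair with the smallest value:

```python
# Numerical illustration of the OLS minimization problem.
# The data below are made up purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(X)

def ssr(b0, b1):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return sum((Y[i] - (b0 + b1 * X[i])) ** 2 for i in range(n))

# Crude grid search over (b0, b1) in steps of 0.01, with bounds chosen to
# bracket the minimizer; OLS picks the pair with the smallest SSR.
best = min(
    ((b0 / 100, b1 / 100) for b0 in range(-100, 101) for b1 in range(100, 301)),
    key=lambda b: ssr(*b),
)
print(best)  # prints (0.14, 1.96)
```

A grid search is only for intuition: because the objective is a smooth quadratic bowl in (b0, b1), calculus delivers the exact minimizers in closed form, which is what the OLS formulas on the next slide do.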
Mechanics of OLS
▶ The population regression line:
E [TestScore|STR] = β0 + β1 STR
Mechanics of OLS
▶ The OLS estimator minimizes the average squared difference between
the actual values of Yi and the prediction (“predicted value”, Ŷi ) based
on the estimated line.
▶ This minimization problem can be solved using calculus (Appendix 4.2).
▶ The result is the OLS estimators of β0 and β1.
OLS estimator, predicted values, and residuals
▶ The OLS estimators are
β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²
β̂0 = Ȳ − β̂1X̄
▶ The OLS predicted (fitted) values Ŷi and residuals ûi are
Ŷi = β̂0 + β̂1Xi
ûi = Yi − Ŷi
▶ The estimated intercept β̂0, slope β̂1, and residuals ûi are
computed from a sample of n observations (Xi, Yi), i = 1, . . . , n.
▶ These are estimates of the unknown population parameters β0 and β1.
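The closed-form formulas above translate directly into code. This Python sketch (with made-up data, purely for illustration) computes β̂1 as the ratio of the sample covariance term to the sample variance term, then β̂0, the fitted values Ŷi, and the residuals ûi:

```python
# Closed-form OLS estimates, fitted values, and residuals.
# The data below are made up purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Slope: sum of cross-deviations over sum of squared deviations of X.
beta1_hat = sum((X[i] - x_bar) * (Y[i] - y_bar) for i in range(n)) / \
            sum((X[i] - x_bar) ** 2 for i in range(n))
# Intercept: forces the fitted line through the point of means (X̄, Ȳ).
beta0_hat = y_bar - beta1_hat * x_bar

Y_hat = [beta0_hat + beta1_hat * x for x in X]   # fitted values Ŷi
u_hat = [Y[i] - Y_hat[i] for i in range(n)]      # residuals ûi

print(beta0_hat, beta1_hat)
# With an intercept included, the OLS residuals sum to (numerically) zero.
print(sum(u_hat))
```

The zero-sum property of the residuals is a mechanical consequence of the first-order condition for b0, and is a useful sanity check on any hand-rolled implementation.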
OLS regression: R output
T̂estScore = 698.93 − 2.28 × STR
We will discuss the rest of this output later.
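Using the estimated line reported above, predictions are just plug-in arithmetic. A small Python sketch (the slide shows R output; the arithmetic is identical in any language):

```python
# Estimated coefficients from the regression output above.
beta0_hat, beta1_hat = 698.93, -2.28

def predicted_test_score(str_ratio):
    """Predicted district test score for a given student-teacher ratio (STR)."""
    return beta0_hat + beta1_hat * str_ratio

print(predicted_test_score(20))   # ≈ 653.33 for a district with STR = 20
# Reducing STR by one student per teacher raises the predicted score by 2.28 points:
print(predicted_test_score(19) - predicted_test_score(20))   # ≈ 2.28
```

Note that β̂1 = −2.28 is the slope of the sample regression line; whether it can be read as the causal effect of class size depends on the least squares assumptions discussed later.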