class: center, middle, inverse, title-slide

# STA 506 2.0 Linear Regression Analysis
## Lecture 2-3: Simple Linear Regression
### Dr Thiyanga S. Talagala
### 2020-09-05

---

## Recap: correlation

![](cor.png)

---

## Recap: correlation (cont.)

.pull-left[
![](cor.png)
]

.pull-right[

|value         |interpretation        |
|:-------------|:---------------------|
|-1            |Perfect negative      |
|(-1, -0.75)   |Strong negative       |
|(-0.75, -0.5) |Moderate negative     |
|(-0.5, -0.25) |Weak negative         |
|(-0.25, 0.25) |No linear association |
|(0.25, 0.5)   |Weak positive         |
|(0.5, 0.75)   |Moderate positive     |
|(0.75, 1)     |Strong positive       |
|1             |Perfect positive      |

]

---

## Recap: Terminology

- Response variable: dependent variable

- Explanatory variables: independent variables, predictors, regressor variables, features (in Machine Learning)

> Response variable = Model function + Random Error

- Parameter

- Statistic

- Estimator

- Estimate

[Read my blogpost](https://thiyanga.netlify.app/post/statterms1/)

---

# In-class

---

# In-class

---

# Simple Linear Regression

**Simple** - single regressor

**Linear** - has a dual role here. ~~It may be taken to describe the fact that the relationship between `\(Y\)` and `\(X\)` is linear.~~ The word linear refers to the fact that the regression parameters enter in a linear fashion.

---

# Meaning of Linear Model

What about this?

`$$Y = \beta_0 + \beta_1x_1 + \beta_{2}x_2 + \epsilon$$`

--

Linear or nonlinear?

`$$Y = \beta_0 + \beta_1x + \beta_{2}x^2 + \epsilon$$`

<!--While the independent variable is squared, the model is still linear in the parameters. Linear models can also contain log terms and inverse terms to follow different kinds of curves and yet continue to be linear in the parameters.-->

--

Linear or nonlinear?

`$$Y = \beta_0e^{\beta_1x} + \epsilon$$`

--

What about this?

`$$Y = \alpha X_1^\beta X_2^\gamma X_3^\delta + \epsilon$$`

---

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

**If `\(f\)` is approximated by a linear function**

`$$Y = \beta_0 + \beta_1X + \epsilon$$`

The error terms are normally distributed with mean `\(0\)` and variance `\(\sigma^2\)`. Then the mean response at any value of `\(X\)` is

`$$E(Y|X=x_i) = E(\beta_0 + \beta_1x_i + \epsilon)=\beta_0+\beta_1x_i$$`

For a single unit `\((y_i, x_i)\)`,

`$$y_i = \beta_0 + \beta_1x_i+\epsilon_i \text{ where } \epsilon_i \sim N(0, \sigma^2)$$`

We use sample values `\((y_i, x_i)\)`, where `\(i=1, 2, \ldots, n\)`, to estimate `\(\beta_0\)` and `\(\beta_1\)`. The fitted regression model is

`$$\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1x_i$$`

---

## Normal distribution

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

```
[1] 5.008403
```
]

.pull-right[
![](regression2_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

```
[1] 5.082935
```
]

---

## Normal distribution

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-5-1.png)<!-- -->
]

.pull-right[
![](regression2_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]

---

### Normal distribution

![](normal1.png)

From: https://towardsdatascience.com/do-my-data-follow-a-normal-distribution-fb411ae7d832

---

### Normal distribution

![](normal2.png)

From: https://towardsdatascience.com/do-my-data-follow-a-normal-distribution-fb411ae7d832

---

### Normal distribution

![](normal3.png)

From: https://towardsdatascience.com/do-my-data-follow-a-normal-distribution-fb411ae7d832

---

class: duke-orange, center, middle

## Buckle up!

### Let's walk through the steps.
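---

## Simulating the model

To make the population model concrete, here is a minimal R sketch (not part of the original lecture code) that simulates data from `\(Y = \beta_0 + \beta_1X + \epsilon\)` with normal errors. The values of `beta0`, `beta1`, and `sigma` are arbitrary choices for illustration, not estimates from any data set.

```r
set.seed(2020)                                # for reproducibility
beta0 <- 30; beta1 <- 0.5; sigma <- 2         # illustrative population parameters
x <- runif(100, 55, 70)                       # regressor values
epsilon <- rnorm(100, mean = 0, sd = sigma)   # random error, N(0, sigma^2)
y <- beta0 + beta1 * x + epsilon              # response generated by the linear model

plot(x, y)                                    # points scatter around the true line
abline(beta0, beta1, col = "blue")            # the population regression line
```

Fitting `lm(y ~ x)` to the simulated data should recover estimates close to `beta0` and `beta1`.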
---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

Example: Suppose you want to model daughters' height as a function of mothers' height. Do you think an exact (deterministic) relationship exists between these two variables?

---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

Example: Suppose you want to model daughters' height as a function of mothers' height. Do you think an exact (deterministic) relationship exists between these two variables?

> Why?

---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

Example: Suppose you want to model daughters' height as a function of mothers' height. Do you think an exact (deterministic) relationship exists between these two variables?

1. Daughters' height may depend on many variables other than mothers' height.

---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

Example: Suppose you want to model daughters' height as a function of mothers' height. Do you think an exact (deterministic) relationship exists between these two variables?

1. Daughters' height may depend on many variables other than mothers' height.

2. Even if many variables are included in the model, it is unlikely that we can predict a daughter's height exactly. Why?

---

## In-class

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

Example: Suppose you want to model daughters' height as a function of mothers' height. Do you think an exact (deterministic) relationship exists between these two variables?

1. Daughters' height may depend on many variables other than mothers' height.

2. Even if many variables are included in the model, it is unlikely that we can predict a daughter's height exactly. Why?

There will almost certainly be some variation in the response that cannot be modelled or explained. This unexplained variation is assumed to be caused by unexplainable random phenomena, so it is referred to as random error.

---

## In-class

---

## In-class: Population Regression Line

**True relationship between X and Y in the population**

`$$Y = f(X) + \epsilon$$`

**If `\(f\)` is approximated by a linear function**

`$$Y = \beta_0 + \beta_1X + \epsilon$$`

The error terms are normally distributed with mean `\(0\)` and variance `\(\sigma^2\)`. Then the mean response at any value of `\(X\)` is

`$$E(Y|X=x_i) = E(\beta_0 + \beta_1x_i + \epsilon)=\beta_0+\beta_1x_i$$`
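---

## In-class: Checking the conditional mean

The claim that `\(E(Y|X=x)\)` sits exactly on the line `\(\beta_0 + \beta_1x\)` can be checked by simulation. This is a small R sketch, not from the original slides; the parameter values are arbitrary choices for illustration:

```r
set.seed(1)
beta0 <- 30; beta1 <- 0.5; sigma <- 2   # illustrative population parameters

# At each fixed x, generate many Y values and average them
x_values <- c(58, 62, 66)
cond_means <- sapply(x_values, function(x) {
  y <- beta0 + beta1 * x + rnorm(10000, 0, sigma)  # draws of Y given X = x
  mean(y)
})

rbind(simulated = cond_means, true = beta0 + beta1 * x_values)
```

The simulated conditional means should be very close to `\(\beta_0 + \beta_1x\)`, matching the pictures that follow of a normal distribution of `\(Y\)` centred on the regression line at each `\(x\)`.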
---

![](recap.png)

---

![](regdis.jpg)

source: http://digfir-published.macmillanusa.com/psbe4e/psbe4e_ch10_2.html

---

![](S3pn3.jpg)

source: https://tex.stackexchange.com/questions/347744/assumptions-for-simple-linear-regression

---

## In-class: Population Regression Line

`$$E(Y|X=x_i) = E(\beta_0 + \beta_1x_i + \epsilon)=\beta_0+\beta_1x_i$$`

For a single unit `\((y_i, x_i)\)`,

`$$y_i = \beta_0 + \beta_1x_i+\epsilon_i \text{ where } \epsilon_i \sim N(0, \sigma^2)$$`

---

## Take a sample

The fitted regression line is

`$$\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1x_i$$`

---

## Our example

Dashboard: https://statisticsmart.shinyapps.io/SimpleLinearRegression/

![](regression2_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

## Our example (0.52, 30.7)

Dashboard: https://statisticsmart.shinyapps.io/SimpleLinearRegression/

![](regression2_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

## Our example (0.582, 28.5)

Dashboard: https://statisticsmart.shinyapps.io/SimpleLinearRegression/

![](regression2_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## Our example (0.5, 32.5)

Dashboard: https://statisticsmart.shinyapps.io/SimpleLinearRegression/

![](regression2_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

## Which is the best?

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-11-1.png)<!-- -->
]

.pull-right[
Sum of squares of **Residuals**

`$$SSR=e_1^2+e_2^2+...+e_n^2$$`
]

---

## Evaluating your answers: Fitted values

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-12-1.png)<!-- -->
]

.pull-right[

Dheight = 30.7 + 0.52 Mheight

```r
df <- alr3::heights                     # mother-daughter height data
df$fitted <- 30.7 + 0.52 * df$Mheight   # fitted values from the guessed line
head(df, 10)
```

```
   Mheight Dheight fitted
1     59.7    55.1 61.744
2     58.2    56.5 60.964
3     60.6    56.0 62.212
4     60.7    56.8 62.264
5     61.8    56.0 62.836
6     55.5    57.9 59.560
7     55.4    57.1 59.508
8     56.8    57.6 60.236
9     57.5    57.2 60.600
10    57.3    57.1 60.496
```

First fitted value: 30.7 + (0.52 * 59.7) = 61.744
]

---

## Evaluating your answers

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

Sum of squares of **Residuals**

`$$SSR=e_1^2+e_2^2+...+e_n^2$$`
]

.pull-right[

Dheight = 30.7 + 0.52 Mheight

```
   Mheight Dheight fitted resid_squared
1     59.7    55.1 61.744     44.142736
2     58.2    56.5 60.964     19.927296
3     60.6    56.0 62.212     38.588944
4     60.7    56.8 62.264     29.855296
5     61.8    56.0 62.836     46.730896
6     55.5    57.9 59.560      2.755600
7     55.4    57.1 59.508      5.798464
8     56.8    57.6 60.236      6.948496
9     57.5    57.2 60.600     11.560000
10    57.3    57.1 60.496     11.532816
```

```
[1] 7511.118
```

SSR: 7511.118
]

---

## Evaluating your answers

Dashboard: https://statisticsmart.shinyapps.io/SimpleLinearRegression/

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-16-1.png)<!-- -->
]

.pull-right[
SSR for each guessed line, shown as SSR (slope, intercept):

- Green: 7511.118 (0.52, 30.7)

- Orange: 8717.41 (0.582, 28.5)

- Purple: 7066.075 (0.5, 32.5)
]
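---

## Computing SSR for any guess

As a quick check of the numbers above, here is a small R sketch (not from the original slides) with a helper function `ssr()` that computes the sum of squared residuals for any candidate intercept and slope on the heights data:

```r
library(alr3)  # for the heights data

# Sum of squared residuals for a candidate line Dheight = b0 + b1 * Mheight
ssr <- function(b0, b1, data = heights) {
  fitted <- b0 + b1 * data$Mheight
  sum((data$Dheight - fitted)^2)
}

ssr(30.7, 0.52)   # green line
ssr(28.5, 0.582)  # orange line
ssr(32.5, 0.5)    # purple line
```

These calls should reproduce the three SSR values listed above; trying other guesses shows how hard it is to beat the best line by hand, which motivates the least-squares approach on the next slides.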
---

### How to estimate `\(\beta_0\)` and `\(\beta_1\)`?

Sum of squares of Residuals

`$$SSR=e_1^2+e_2^2+...+e_n^2$$`

**Observed value:** `\(y_i\)`

**Fitted value:** `\(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1x_i\)`

**Residual:** `\(e_i = y_i - \hat{Y_i}\)`

The least-squares regression approach chooses the coefficients `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` to minimize `\(SSR\)`.

---

### Least-squares Estimation of the Parameters

`$$y_i = \beta_0 + \beta_1x_i + \epsilon_i, \quad i = 1, 2, \ldots, n.$$`

The least-squares criterion is

`$$S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1x_i)^2.$$`

---

### Least-squares Estimation of the Parameters (cont.)

The least-squares criterion is

`$$S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1x_i)^2.$$`

The least-squares estimators of `\(\beta_0\)` and `\(\beta_1\)`, say `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)`, must satisfy

`$$\frac{\partial S}{\partial \beta_0}\Big|_{\hat{\beta_0}, \hat{\beta_1}} = -2\sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i) = 0$$`

and

`$$\frac{\partial S}{\partial \beta_1}\Big|_{\hat{\beta_0}, \hat{\beta_1}} = -2\sum_{i=1}^{n}(y_i - \hat{\beta_0} - \hat{\beta_1}x_i)x_i = 0.$$`

---

### Least-squares Estimation of the Parameters (cont.)

Simplifying the two equations yields

`$$n\hat{\beta_0}+\hat{\beta_1}\sum_{i=1}^nx_i=\sum_{i=1}^ny_i,$$`

`$$\hat{\beta_0}\sum_{i=1}^nx_i+\hat{\beta_1}\sum_{i=1}^nx^2_i=\sum_{i=1}^ny_ix_i.$$`

These are called the **least-squares normal equations**.

---

### Least-squares Estimation of the Parameters (cont.)

`$$n\hat{\beta_0}+\hat{\beta_1}\sum_{i=1}^nx_i=\sum_{i=1}^ny_i,$$`

`$$\hat{\beta_0}\sum_{i=1}^nx_i+\hat{\beta_1}\sum_{i=1}^nx^2_i=\sum_{i=1}^ny_ix_i.$$`

The solution to the normal equations is

`$$\hat{\beta_0} = \bar{y} - \hat{\beta_1}\bar{x},$$`

and

`$$\hat{\beta_1} = \frac{\sum_{i=1}^ny_ix_i - \frac{\sum_{i=1}^ny_i\sum_{i=1}^nx_i}{n}}{\sum_{i=1}^nx_i^2 - \frac{(\sum_{i=1}^nx_i)^2}{n}}.$$`

The fitted simple linear regression model is then

`$$\hat{Y} = \hat{\beta_0} + \hat{\beta}_1x$$`
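---

## Checking the formulas with R

Before relying on `lm()`, we can evaluate the closed-form solution directly. This is a small sketch (not from the original slides) that computes `\(\hat{\beta}_1\)` and `\(\hat{\beta}_0\)` from the normal-equation solution above:

```r
library(alr3)  # for the heights data
x <- heights$Mheight
y <- heights$Dheight
n <- length(y)

# Slope from the closed-form solution of the normal equations
b1 <- (sum(x * y) - sum(y) * sum(x) / n) / (sum(x^2) - sum(x)^2 / n)

# Intercept: the fitted line passes through (x-bar, y-bar)
b0 <- mean(y) - b1 * mean(x)

c(b0 = b0, b1 = b1)  # compare with coef(lm(Dheight ~ Mheight, data = heights))
```

The result should agree (up to rounding) with the `lm()` output on the next slide: an intercept of about 29.9174 and a slope of about 0.5417.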
---

## Least-squares fit

> Try this with R

```r
library(alr3) # to load the dataset
model1 <- lm(Dheight ~ Mheight, data=heights)
model1
```

```
Call:
lm(formula = Dheight ~ Mheight, data = heights)

Coefficients:
(Intercept)      Mheight  
    29.9174       0.5417  
```

---

## Least-squares fit and your guesses

![](regression2_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

```r
fit <- 0.5417 * df$Mheight + 29.9174  # least-squares fitted values
sum((df$Dheight - fit)^2)             # SSR of the least-squares line
```

```
[1] 7051.97
```

---

## Least-squares fit and your guesses

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-21-1.png)<!-- -->
]

.pull-right[
SSR for each line, shown as SSR (slope, intercept):

- Green: 7511.118 (0.52, 30.7)

- Orange: 8717.41 (0.582, 28.5)

- Purple: 7066.075 (0.5, 32.5)

- Blue: **7051.97** (0.5417, 29.9174)
]

---

## Try this with R

```r
library(alr3) # to load the dataset
model1 <- lm(Dheight ~ Mheight, data=heights)
model1
```

```r
summary(model1)
```

```
Call:
lm(formula = Dheight ~ Mheight, data = heights)

Residuals:
   Min     1Q Median     3Q    Max 
-7.397 -1.529  0.036  1.492  9.053 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.91744    1.62247   18.44   <2e-16 ***
Mheight      0.54175    0.02596   20.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.266 on 1373 degrees of freedom
Multiple R-squared:  0.2408, Adjusted R-squared:  0.2402 
F-statistic: 435.5 on 1 and 1373 DF,  p-value: < 2.2e-16
```

---

## Visualise the model: Try with R

```r
ggplot(data=heights, aes(x=Mheight, y=Dheight)) +
  geom_point(alpha=0.5) +
  geom_smooth(method="lm", se=FALSE, col="blue", lwd=2) +
  theme(aspect.ratio = 1)
```

![](regression2_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

---

## Least squares regression line

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-26-1.png)<!-- -->
]

.pull-right[

```r
summary(alr3::heights)
```

```
    Mheight         Dheight     
 Min.   :55.40   Min.   :55.10  
 1st Qu.:60.80   1st Qu.:62.00  
 Median :62.40   Median :63.60  
 Mean   :62.45   Mean   :63.75  
 3rd Qu.:63.90   3rd Qu.:65.60  
 Max.   :70.80   Max.   :73.10  
```
]

The LSRL passes through the point `\((\bar{x}, \bar{y})\)`, that is, (sample mean of `\(x\)`, sample mean of `\(y\)`).

---

## Least squares regression line

.pull-left[
![](regression2_files/figure-html/unnamed-chunk-28-1.png)<!-- -->
]

.pull-right[
The least squares regression line doesn't match the population regression line perfectly, but it is a pretty good estimate. And, of course, we'd get a different least squares regression line if we took another sample.
]

---

background-image: url('reg7.PNG')
background-position: center
background-size: contain

---

## Extrapolation: beyond the scope of the model

![](regression2_files/figure-html/unnamed-chunk-29-1.png)<!-- -->

---

## Next Lecture

> More work - Simple Linear Regression, Residual Analysis, Predictions

---

class: center, middle

All rights reserved by [Dr. Thiyanga S. Talagala](https://thiyanga.netlify.app/)