STA 506 2.0 Linear Regression Analysis

Lecture 10: Confidence Intervals in Multiple Regression

Dr Thiyanga S. Talagala

2020-10-31

1 / 23

Recap

  1. Exploratory data analysis (scatterplot, correlation)

  2. Fit a regression model

  3. Check the validity of the assumptions/ Residual analysis

  4. Check $R^2_{adj}$

  5. Hypothesis testing: ANOVA

    • Test the significance of regression
  6. Hypothesis testing: t-test

    • Tests on individual regression coefficients
  7. Interpret point estimates of coefficients

  8. Compute confidence intervals on the regression coefficients and mean response and interpret the results

  9. Prediction of new observations

2 / 23

Recap

  1. Exploratory data analysis

  2. Fit a regression model

  3. Check the validity of the assumptions/ Residual analysis

  4. Check $R^2_{adj}$

  5. Hypothesis testing: ANOVA

    • Test the significance of regression
  6. Hypothesis testing: t-test

    • Tests on individual regression coefficients
  7. Interpret point estimates of coefficients

  8. Compute confidence intervals on the regression coefficients and mean response and interpret the results

  9. Prediction of new observations

3 / 23

Confidence Intervals in Multiple Regression

  1. Confidence intervals on the regression coefficients

  2. Confidence interval estimation of the mean response at a given point

4 / 23

Confidence Intervals on the Regression Coefficients

To construct confidence intervals for the regression coefficients βj, j = 0, 1, ..., p, we continue to assume that

  • the errors ϵi are normally and independently distributed with mean zero and variance σ².

Hence, before constructing the confidence intervals you need to check the validity of the assumptions.
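
A minimal sketch of how these assumptions might be checked in R before computing any interval. The data are simulated stand-ins (the `sim` data frame, its sample size, and its coefficients are assumptions made for illustration, not the lecture's heart.data.csv), so only the workflow carries over.

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)

fit <- lm(heart.disease ~ biking + smoking, data = sim)
res <- residuals(fit)

# Residuals vs fitted values: look for no pattern and roughly constant spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points should lie close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a formal normality check (sensitive to sample size)
sw <- shapiro.test(res)
sw$p.value
```

The plots are usually more informative than the test alone: with large samples the test can flag departures from normality that are too small to matter for the intervals.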

5 / 23

Confidence Intervals on the Regression Coefficients

Data set

library(tidyverse)
heart.data <- read_csv("heart.data.csv")
heart.data
# A tibble: 498 x 4
X1 biking smoking heart.disease
<dbl> <dbl> <dbl> <dbl>
1 1 30.8 10.9 11.8
2 2 65.1 2.22 2.85
3 3 1.96 17.6 17.2
4 4 44.8 2.80 6.82
5 5 69.4 16.0 4.06
6 6 54.4 29.3 9.55
7 7 49.1 9.06 7.62
8 8 4.78 12.8 15.9
9 9 65.7 12.0 3.07
10 10 35.3 23.3 12.1
# … with 488 more rows
6 / 23

Confidence Intervals on the Regression Coefficients (cont.)

regHeart <- lm(heart.disease ~ biking+ smoking, data=heart.data)
regHeart
Call:
lm(formula = heart.disease ~ biking + smoking, data = heart.data)
Coefficients:
(Intercept) biking smoking
14.9847 -0.2001 0.1783

Validity of the assumptions: all are satisfied, as discussed in Week 8.

Compute 95% confidence intervals for regression coefficients

confint(regHeart, level=0.95)
2.5 % 97.5 %
(Intercept) 14.8272075 15.1421084
biking -0.2028166 -0.1974495
smoking 0.1713800 0.1852878
7 / 23

Compute 90% confidence intervals for regression coefficients

confint(regHeart, level=0.90)
5 % 95 %
(Intercept) 14.8525973 15.1167186
biking -0.2023839 -0.1978822
smoking 0.1725014 0.1841665
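
The 90% intervals are narrower than the 95% intervals because only the t quantile changes between them. The sketch below illustrates this on simulated stand-in data (the `sim` data frame and its coefficients are assumptions for illustration, not the lecture's data).

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking, data = sim)

ci95 <- confint(fit, level = 0.95)
ci90 <- confint(fit, level = 0.90)

# Interval widths for each coefficient at the two confidence levels
width95 <- ci95[, 2] - ci95[, 1]
width90 <- ci90[, 2] - ci90[, 1]
cbind(width90, width95)

# The width ratio is the same for every coefficient: it is the ratio
# of the two t quantiles on the residual degrees of freedom
df_res <- df.residual(fit)
width95 / width90
qt(0.975, df_res) / qt(0.95, df_res)
```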
8 / 23

Interpretation of confidence intervals for regression coefficients

2.5 % 97.5 %
(Intercept) 14.8272075 15.1421084
biking -0.2028166 -0.1974495
smoking 0.1713800 0.1852878

Intercept: 95% Confidence Interval [14.83, 15.14]

This means that when X1 (biking) and X2 (smoking) are both zero, we are 95% confident that the mean percentage of people with heart disease is between 14.83% and 15.14%.

β1: 95% Confidence Interval [-0.203, -0.197]

This means that, with X2 (smoking) held fixed, we are 95% confident that a one percentage point increase in biking is associated with a decrease in the mean percentage of people with heart disease of at least 0.197 and at most 0.203 percentage points.

β2: 95% Confidence Interval [0.171, 0.185]

This means that, with X1 (biking) held fixed, we are 95% confident that a one percentage point increase in smoking is associated with an increase in the mean percentage of people with heart disease of at least 0.171 and at most 0.185 percentage points.

10 / 23

Confidence Interval Estimation of the Mean Response at a Given Point

Y=β0+β1X1+β2X2+ϵ

where,

Y - percentage of people with heart disease,

X1 - percentage of people in each town who bike to work,

X2 - percentage of people in each town who smoke

Fitted regression model

regHeart
Call:
lm(formula = heart.disease ~ biking + smoking, data = heart.data)
Coefficients:
(Intercept) biking smoking
14.9847 -0.2001 0.1783

$\hat{Y} = 14.9847 - 0.2001X_1 + 0.1783X_2$, where $\hat{Y}$ denotes the fitted values.

11 / 23

Confidence Interval Estimation of the Mean Response at a Given Point

Suppose we have an observation with X1=30.8 and X2=10.9, and we would like to find a 95% confidence interval on the mean percentage of people with heart disease.

The fitted value at this point is:

$\hat{Y} = 14.9847 - 0.2001X_1 + 0.1783X_2$

$\hat{Y} = 14.9847 - 0.2001(30.8) + 0.1783(10.9) \approx 10.76$
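
The arithmetic for the fitted value can be reproduced directly in R. This sketch uses the rounded coefficients printed on the slide, so the result differs in the third decimal place from the value returned by predict(), which works with the unrounded estimates. The variable names are chosen here for illustration.

```r
# Rounded coefficient estimates as printed in the fitted model output
b0 <- 14.9847
b1 <- -0.2001
b2 <- 0.1783

# Point of interest: biking = 30.8, smoking = 10.9
x1 <- 30.8
x2 <- 10.9

# Fitted value: intercept plus each slope times its regressor value
y_hat <- b0 + b1 * x1 + b2 * x2
y_hat
```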

A 95% confidence interval on the mean percentage of people with heart disease at this point is:

predict(regHeart, list(biking = 30.8, smoking = 10.9),
interval='confidence', level=0.95)
fit lwr upr
1 10.7644 10.69625 10.83255

Interpretation:

We can be 95% confident that the mean percentage of people with heart disease across all towns with X1 (biking) = 30.8 and X2 (smoking) = 10.9 is between 10.70 and 10.83 percent.

12 / 23

Prediction of New Observation

Suppose we want to construct a 95% prediction interval for the percentage of people with heart disease at X1=60 and X2=20.

predict(regHeart, list(biking = 60, smoking = 20),
interval='predict', level=0.95)
fit lwr upr
1 6.543353 5.255293 7.831413

We can be 95% confident that the percentage of people with heart disease in a new town with X1=60 and X2=20 will be between 5.26 and 7.83 percent.

13 / 23

Confidence Interval Estimation of the Mean Response vs. Prediction Interval

Purpose: Illustrate the difference between confidence intervals for mean and prediction intervals.

14 / 23

Prediction of the mean response

A 95% confidence interval on the mean percentage of people with heart disease at the point X1=30.8 and X2=10.9.

predict(regHeart, list(biking = 30.8, smoking = 10.9),
interval='confidence', level=0.95)
fit lwr upr
1 10.7644 10.69625 10.83255

Prediction of a future value

Suppose we want to construct a 95% prediction interval for the percentage of people with heart disease at X1=60 and X2=20.

predict(regHeart, list(biking = 60, smoking = 20),
interval='predict', level=0.95)
fit lwr upr
1 6.543353 5.255293 7.831413
15 / 23

Prediction of the mean response

What would be the average (mean) response for all cases with characteristics X1=30.8 and X2=10.9?

predict(regHeart, list(biking = 30.8, smoking = 10.9),
interval='confidence', level=0.95)
fit lwr upr
1 10.7644 10.69625 10.83255

We predict the mean value of Y with characteristics X1=30.8 and X2=10.9.

Prediction of a future value

What is the predicted value of Y for a new case with characteristics X1=60 and X2=20?

predict(regHeart, list(biking = 60, smoking = 20),
interval='predict', level=0.95)
fit lwr upr
1 6.543353 5.255293 7.831413

We predict Y for a specific new case that comes from the population with characteristics X1=60 and X2=20.

Prediction interval for a new response.

16 / 23

Interpretations

Prediction of the mean response

fit lwr upr
1 10.7644 10.69625 10.83255

We can be 95% confident that the mean percentage of people with heart disease across all towns with X1 (biking) = 30.8 and X2 (smoking) = 10.9 is between 10.70 and 10.83 percent.

Prediction of a future value

fit lwr upr
1 6.543353 5.255293 7.831413

We can be 95% confident that the percentage of people with heart disease in a new town with X1=60 and X2=20 will be between 5.26 and 7.83 percent.
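
To compare the two kinds of interval directly, both can be computed at the same point. The sketch below does this on simulated stand-in data (the `sim` data frame, its coefficients, and the `new_town` point are assumptions for illustration, not the lecture's data). The two intervals share the same centre, and the prediction interval is wider because it also accounts for the error of a single new observation.

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking, data = sim)

new_town <- data.frame(biking = 60, smoking = 20)

# Interval for the mean response at this point
mean_ci <- predict(fit, newdata = new_town,
                   interval = "confidence", level = 0.95)

# Interval for a single new observation at the same point
pred_int <- predict(fit, newdata = new_town,
                    interval = "prediction", level = 0.95)

rbind(confidence = mean_ci[1, ], prediction = pred_int[1, ])

# Widths: the prediction interval is the wider of the two
ci_width <- mean_ci[1, "upr"] - mean_ci[1, "lwr"]
pi_width <- pred_int[1, "upr"] - pred_int[1, "lwr"]
c(ci_width = unname(ci_width), pi_width = unname(pi_width))
```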

17 / 23

Prediction of a Set of New Observations

newheartdata <- data.frame(biking = c(30, 40, 40, 60),
smoking = c(20, 30, 12, 10))
newheartdata
biking smoking
1 30 20
2 40 30
3 40 12
4 60 10
predict(regHeart, newdata=newheartdata , interval="predict")
fit lwr upr
1 12.547345 11.260464 13.834225
2 12.329353 11.039054 13.619652
3 9.119343 7.832795 10.405891
4 4.760014 3.471742 6.048286
18 / 23

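Mathematical Formula: Least-squares estimator

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon$

$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$

$\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix}$

$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$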
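
As a check on the matrix formula, the estimator can be computed by hand and compared with lm(). This is a sketch on simulated stand-in data (the `sim` data frame and its coefficients are assumptions for illustration, not the lecture's data).

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)

# Design matrix: a column of ones for the intercept, then the regressors
X <- cbind(1, sim$biking, sim$smoking)
y <- sim$heart.disease

# Least-squares estimator: (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
drop(beta_hat)

# The same estimates from lm()
fit <- lm(heart.disease ~ biking + smoking, data = sim)
coef(fit)
```

In practice lm() does not invert X'X explicitly; it uses a QR decomposition, which is numerically more stable, but the estimates agree.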

19 / 23

Mathematical Formula

Confidence intervals on the regression coefficients

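$\left[\, \hat{\beta}_j - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2 C_{jj}},\ \ \hat{\beta}_j + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2 C_{jj}} \,\right]$

$C_{jj}$ is the $j$th diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$, for $j = 0, 1, \dots, p$. With $p$ regressors and an intercept, the residual degrees of freedom are $n - p - 1$.

An unbiased estimator of $\sigma^2$ is given by

$\hat{\sigma}^2 = MSE$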
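
The interval formula can be checked against confint() by computing each piece by hand. This is a sketch on simulated stand-in data (the `sim` data frame and its coefficients are assumptions for illustration, not the lecture's data). Here `q` counts all estimated coefficients including the intercept, so the residual degrees of freedom are `n - q`.

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking, data = sim)

X <- model.matrix(fit)          # design matrix with intercept column
y <- sim$heart.disease
q <- ncol(X)                    # number of estimated coefficients

XtX_inv <- solve(crossprod(X))                 # (X'X)^{-1}
beta_hat <- drop(XtX_inv %*% crossprod(X, y))  # least-squares estimates

# MSE: residual sum of squares over residual degrees of freedom
sigma2_hat <- sum((y - X %*% beta_hat)^2) / (n - q)

# Standard errors use the diagonal elements C_jj of (X'X)^{-1}
se <- sqrt(sigma2_hat * diag(XtX_inv))
t_crit <- qt(1 - 0.05 / 2, df = n - q)

manual_ci <- cbind(lower = beta_hat - t_crit * se,
                   upper = beta_hat + t_crit * se)
manual_ci
confint(fit, level = 0.95)
```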

20 / 23

Mathematical Formula

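Prediction of the mean response

Confidence interval for the mean response at the point $x_{01}, x_{02}, \dots, x_{0p}$:

$E[Y \mid X_1 = x_{01}, X_2 = x_{02}, \dots, X_p = x_{0p}] = \mu_{Y \mid X_1 = x_{01}, X_2 = x_{02}, \dots, X_p = x_{0p}}$

Fitted value at $x_{01}, x_{02}, \dots, x_{0p}$:

$\mathbf{x}_0' = [1, x_{01}, x_{02}, \dots, x_{0p}]$

$\hat{y}_0 = \mathbf{x}_0'\hat{\boldsymbol{\beta}}$

$\left[\, \hat{y}_0 - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\, \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0},\ \ \hat{y}_0 + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\, \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0} \,\right]$

Here $n - p - 1$ is the residual degrees of freedom for a model with $p$ regressors and an intercept.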

21 / 23

Mathematical Formula (cont.)

Prediction of a future value

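$\left[\, \hat{y}_0 - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\left(1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right)},\ \ \hat{y}_0 + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\left(1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right)} \,\right]$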
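
Both interval formulas can be verified against predict(). The sketch below uses simulated stand-in data (the `sim` data frame, its coefficients, and the point `x0` are assumptions for illustration, not the lecture's data). The only difference between the two intervals is the extra 1 inside the square root for the prediction interval.

```r
# Simulated stand-in data (hypothetical; not the lecture's data set)
set.seed(506)
n <- 200
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)
fit <- lm(heart.disease ~ biking + smoking, data = sim)

# Point of interest, with a leading 1 for the intercept
x0 <- c(1, 60, 20)

XtX_inv <- solve(crossprod(model.matrix(fit)))  # (X'X)^{-1}
sigma2_hat <- summary(fit)$sigma^2              # MSE
t_crit <- qt(0.975, df = df.residual(fit))

y0_hat <- sum(x0 * coef(fit))                   # fitted value x0' beta_hat
h0 <- drop(t(x0) %*% XtX_inv %*% x0)            # x0' (X'X)^{-1} x0

# Confidence interval for the mean response
mean_ci <- y0_hat + c(-1, 1) * t_crit * sqrt(sigma2_hat * h0)

# Prediction interval for a new observation
pred_int <- y0_hat + c(-1, 1) * t_crit * sqrt(sigma2_hat * (1 + h0))

new_town <- data.frame(biking = 60, smoking = 20)
rbind(mean_ci, pred_int)
predict(fit, new_town, interval = "confidence", level = 0.95)
predict(fit, new_town, interval = "prediction", level = 0.95)
```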

22 / 23

Acknowledgement

Introduction to Linear Regression Analysis, Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining

All rights reserved by

Dr. Thiyanga S. Talagala

23 / 23
