1. Exploratory data analysis (scatterplot, correlation)
2. Fit a regression model
3. Check the validity of the assumptions (residual analysis)
4. Check adjusted R²
5. Hypothesis testing: ANOVA
6. Hypothesis testing: t-test
7. Interpret point estimates of coefficients
8. Compute confidence intervals on the regression coefficients and mean response and interpret the results
9. Prediction of new observations
Confidence intervals on the regression coefficients
Confidence interval estimation of the mean response at a given point
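To construct confidence intervals for the regression coefficients $\beta_j$, $j = 0, 1, \dots, p$, we continue to assume that the errors $\epsilon_i$ are normally and independently distributed with mean zero and constant variance $\sigma^2$.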
Hence, before constructing the confidence intervals you need to check the validity of the assumptions.
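A quick numerical check of these assumptions might look like the sketch below. It runs on simulated data, not the heart disease data, and the names `x1`, `x2`, `fit`, `sw` and `spread_cor` are placeholders introduced here. The Shapiro-Wilk test is only one of several possible normality checks, and residual plots remain the usual first step.

```r
# Simulated data (an assumption for illustration; not the heart data).
set.seed(1)
n  <- 200
x1 <- runif(n, 0, 75)
x2 <- runif(n, 0, 30)
y  <- 15 - 0.2 * x1 + 0.18 * x2 + rnorm(n, sd = 0.65)

fit <- lm(y ~ x1 + x2)
res <- resid(fit)

# Normality of errors: Shapiro-Wilk test on the residuals.
sw <- shapiro.test(res)
sw$p.value

# Constant variance: |residuals| should show no trend against fitted values.
spread_cor <- cor(abs(res), fitted(fit))
spread_cor

# With an intercept in the model, least-squares residuals average to zero.
mean(res)
```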
Data set
library(tidyverse)
heart.data <- read_csv("heart.data.csv")
heart.data

# A tibble: 498 x 4
      X1 biking smoking heart.disease
   <dbl>  <dbl>   <dbl>         <dbl>
 1     1  30.8    10.9          11.8 
 2     2  65.1     2.22          2.85
 3     3   1.96   17.6          17.2 
 4     4  44.8     2.80          6.82
 5     5  69.4    16.0           4.06
 6     6  54.4    29.3           9.55
 7     7  49.1     9.06          7.62
 8     8   4.78   12.8          15.9 
 9     9  65.7    12.0           3.07
10    10  35.3    23.3          12.1 
# … with 488 more rows
regHeart <- lm(heart.disease ~ biking + smoking, data = heart.data)
regHeart

Call:
lm(formula = heart.disease ~ biking + smoking, data = heart.data)

Coefficients:
(Intercept)       biking      smoking
    14.9847      -0.2001       0.1783
Validity of the assumptions: all satisfied, as discussed in Week 8.
confint(regHeart, level=0.95)
                 2.5 %     97.5 %
(Intercept) 14.8272075 15.1421084
biking      -0.2028166 -0.1974495
smoking      0.1713800  0.1852878
confint(regHeart, level=0.90)
                   5 %       95 %
(Intercept) 14.8525973 15.1167186
biking      -0.2023839 -0.1978822
smoking      0.1725014  0.1841665
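The 90% interval is narrower than the 95% interval only because a smaller t quantile is used. The standard error and the point estimate are unchanged. The sketch below checks this using the biking endpoints printed above. The residual degrees of freedom (498 rows minus 3 coefficients) are an assumption that no rows were dropped during fitting, and the names `ci95`, `ci90`, `width_ratio` and `t_ratio` are introduced here for illustration.

```r
# Interval endpoints for the biking coefficient, copied from the output above.
ci95 <- c(-0.2028166, -0.1974495)
ci90 <- c(-0.2023839, -0.1978822)

# Assumed residual degrees of freedom: 498 rows minus 3 estimated coefficients.
df <- 498 - 3

# Both intervals are centred on the same point estimate.
mean(ci95)
mean(ci90)

# The widths differ only through the t quantile, so their ratio should
# match the ratio of the 90% and 95% critical values.
width_ratio <- diff(ci90) / diff(ci95)
t_ratio     <- qt(0.95, df) / qt(0.975, df)
c(width_ratio = width_ratio, t_ratio = t_ratio)
```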
confint(regHeart, level=0.95)
                 2.5 %     97.5 %
(Intercept) 14.8272075 15.1421084
biking      -0.2028166 -0.1974495
smoking      0.1713800  0.1852878
Intercept ($\beta_0$): 95% confidence interval [14.83, 15.14]
This means that when $X_1$ (biking) and $X_2$ (smoking) are both zero, we are 95% confident that the mean percentage of people with heart disease is between 14.83% and 15.14%.
$\beta_1$: 95% confidence interval [-0.203, -0.197]
This means that if $X_2$ (smoking) remains fixed, we are 95% confident that a one percentage point increase in biking is associated with a decrease in the mean percentage of people with heart disease of at least 0.197 and at most 0.203 percentage points.
$\beta_2$: 95% confidence interval [0.171, 0.185]
This means that if $X_1$ (biking) remains fixed, we are 95% confident that a one percentage point increase in smoking is associated with an increase in the mean percentage of people with heart disease of at least 0.171 and at most 0.185 percentage points.
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$
where
$Y$ is the percentage of people with heart disease,
$X_1$ is the percentage of people in each town who bike to work,
$X_2$ is the percentage of people in each town who smoke.
regHeart
Call:
lm(formula = heart.disease ~ biking + smoking, data = heart.data)

Coefficients:
(Intercept)       biking      smoking
    14.9847      -0.2001       0.1783

$$\hat{Y} = 14.9847 - 0.2001 X_1 + 0.1783 X_2,$$
where $\hat{Y}$ denotes the fitted value.
Suppose we have an observation with $X_1 = 30.8$ and $X_2 = 10.9$, and we would like a 95% confidence interval on the mean percentage of people with heart disease at this point.
The fitted value at this point is:
$$\hat{Y} = 14.9847 - 0.2001(30.8) + 0.1783(10.9) \approx 10.76$$
R reports 10.7644 because it uses the unrounded coefficients.
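The same arithmetic can be reproduced in R without the data file. This is a minimal sketch that takes the rounded coefficients printed by `regHeart` as given. The names `b`, `x0` and `y_hat` are introduced here for illustration.

```r
# Rounded coefficients as printed by regHeart: intercept, biking, smoking.
b  <- c(14.9847, -0.2001, 0.1783)
# Point of interest, with a leading 1 for the intercept.
x0 <- c(1, 30.8, 10.9)

# Fitted value as the inner product of coefficients and regressors.
y_hat <- sum(b * x0)
y_hat
```

Because the coefficients are rounded to four decimals, the result differs from the `predict()` value in the fourth significant digit.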
A 95% confidence interval on the mean percentage of people with heart disease at this point is:
predict(regHeart, list(biking = 30.8, smoking = 10.9), interval = 'confidence', level = 0.95)

      fit      lwr      upr
1 10.7644 10.69625 10.83255
Interpretation:
We can be 95% confident that the mean percentage of people with heart disease across all towns with $X_1$ (biking) = 30.8 and $X_2$ (smoking) = 10.9 is between 10.70 and 10.83 percent.
Suppose we want to construct a 95% prediction interval on the percentage of people with heart disease at $X_1 = 60$ and $X_2 = 20$.
predict(regHeart, list(biking = 60, smoking = 20), interval = 'prediction', level = 0.95)

       fit      lwr      upr
1 6.543353 5.255293 7.831413

We can be 95% confident that the percentage of people with heart disease in a town with $X_1 = 60$ and $X_2 = 20$ will be between 5.26 and 7.83 percent.
Purpose: illustrate the difference between confidence intervals for the mean response and prediction intervals for a new observation.
Prediction of the mean response
What is the mean response for towns with $X_1 = 30.8$ and $X_2 = 10.9$?
predict(regHeart, list(biking = 30.8, smoking = 10.9), interval = 'confidence', level = 0.95)

      fit      lwr      upr
1 10.7644 10.69625 10.83255

Here we estimate the mean value of $Y$ over all towns with $X_1 = 30.8$ and $X_2 = 10.9$.
Prediction of a future value
What is the predicted value of $Y$ for a single town with $X_1 = 60$ and $X_2 = 20$?
predict(regHeart, list(biking = 60, smoking = 20), interval = 'prediction', level = 0.95)

       fit      lwr      upr
1 6.543353 5.255293 7.831413

Here we predict $Y$ for one specific new case drawn from the population with $X_1 = 60$ and $X_2 = 20$.
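The two examples above use different points, which hides how much wider a prediction interval is. The sketch below computes both intervals at the same point so the widths can be compared directly. It uses simulated data with made-up coefficients, not the heart disease data, and `sim`, `fit_sim`, `new_pt`, `conf_int` and `pred_int` are names introduced here.

```r
# Simulated stand-in for the heart data (an assumption for illustration).
set.seed(2)
n   <- 300
sim <- data.frame(biking = runif(n, 1, 75), smoking = runif(n, 0.5, 30))
sim$heart.disease <- 15 - 0.2 * sim$biking + 0.18 * sim$smoking +
  rnorm(n, sd = 0.65)

fit_sim <- lm(heart.disease ~ biking + smoking, data = sim)
new_pt  <- data.frame(biking = 60, smoking = 20)

# Same point, same fitted value, two different intervals.
conf_int <- predict(fit_sim, newdata = new_pt, interval = "confidence", level = 0.95)
pred_int <- predict(fit_sim, newdata = new_pt, interval = "prediction", level = 0.95)

rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

Both rows share the same `fit`. The prediction row is wider because it also accounts for the error of a single new observation.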
Prediction interval for a new response.
newheartdata <- data.frame(biking = c(30, 40, 40, 60), smoking = c(20, 30, 12, 10))
newheartdata

  biking smoking
1     30      20
2     40      30
3     40      12
4     60      10

predict(regHeart, newdata = newheartdata, interval = "prediction")

        fit       lwr       upr
1 12.547345 11.260464 13.834225
2 12.329353 11.039054 13.619652
3  9.119343  7.832795 10.405891
4  4.760014  3.471742  6.048286
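All four prediction intervals have nearly the same width, which suggests that the error variance dominates the uncertainty about the fitted surface. The sketch below works only from the limits printed above. The degrees of freedom (498 rows minus 3 coefficients) are an assumption that no rows were dropped, and `half_width`, `t_crit` and `sigma_bound` are names introduced here.

```r
# Prediction limits copied from the output above.
lwr <- c(11.260464, 11.039054, 7.832795, 3.471742)
upr <- c(13.834225, 13.619652, 10.405891, 6.048286)

half_width <- (upr - lwr) / 2
half_width

# Assumed residual degrees of freedom: 498 rows minus 3 estimated coefficients.
t_crit <- qt(0.975, 498 - 3)

# half_width = t_crit * sigma_hat * sqrt(1 + h), with h >= 0, so this ratio
# is an upper bound on sigma_hat that is tight when h is small.
sigma_bound <- half_width / t_crit
sigma_bound
```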
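$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon$$
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}$$
$$\hat{\beta} = (X'X)^{-1} X'Y$$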
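As a rough check of the matrix formula, the sketch below solves the normal equations directly and compares the result with `lm()`. The data are simulated and the coefficient values are arbitrary. `X`, `beta_hat` and `fit` are names introduced here.

```r
# Simulated data with arbitrary true coefficients (illustration only).
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with a leading column of ones for the intercept.
X <- cbind(1, x1, x2)

# Solve (X'X) b = X'y rather than forming the inverse explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(y ~ x1 + x2)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```

`lm()` itself uses a QR decomposition rather than the normal equations, so the two columns agree only up to floating-point error.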
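Confidence intervals on the regression coefficients
$$\left[\, \hat{\beta}_j - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2 C_{jj}},\ \ \hat{\beta}_j + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2 C_{jj}} \,\right]$$
$C_{jj}$ is the diagonal element of $(X'X)^{-1}$ corresponding to $\hat{\beta}_j$. With $p$ regressors and an intercept the model has $p+1$ coefficients, so the residual degrees of freedom are $n-p-1$.
An unbiased estimator of $\sigma^2$ is given by
$$\hat{\sigma}^2 = MS_E = \frac{SS_E}{n-p-1}$$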
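The interval formula can be evaluated by hand and compared with `confint()`. This is a sketch on simulated data, not the heart disease data. `C`, `mse`, `se`, `t_crit` and `manual` are names introduced here.

```r
# Simulated data with arbitrary true coefficients (illustration only).
set.seed(4)
n  <- 120
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 1 + 0.8 * x1 - 1.2 * x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2)

X      <- model.matrix(fit)
p      <- ncol(X) - 1                       # number of regressors
C      <- solve(crossprod(X))               # (X'X)^{-1}
mse    <- sum(resid(fit)^2) / (n - p - 1)   # unbiased estimate of sigma^2
se     <- sqrt(mse * diag(C))               # standard error of each coefficient
t_crit <- qt(0.975, df = n - p - 1)

manual <- cbind(lower = coef(fit) - t_crit * se,
                upper = coef(fit) + t_crit * se)
manual
confint(fit, level = 0.95)
```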
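Prediction of the mean response
Confidence interval for the mean response at $x_{01}, x_{02}, \dots, x_{0p}$:
$$E[Y \mid X_1 = x_{01}, X_2 = x_{02}, \dots, X_p = x_{0p}] = \mu_{Y \mid X_1 = x_{01}, X_2 = x_{02}, \dots, X_p = x_{0p}}$$
Fitted value at $x_{01}, x_{02}, \dots, x_{0p}$:
$$x_0' = [1, x_{01}, x_{02}, \dots, x_{0p}]$$
$$\hat{y}_0 = x_0'\hat{\beta}$$
$$\left[\, \hat{y}_0 - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0},\ \ \hat{y}_0 + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0} \,\right]$$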
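The mean-response interval can also be computed directly from the formula and checked against `predict(..., interval = "confidence")`. This is a sketch on simulated data with made-up coefficients. `x0`, `y0`, `se_mean`, `manual_ci` and `from_predict` are names introduced here.

```r
# Simulated data (illustration only; not the heart data).
set.seed(5)
n  <- 150
x1 <- runif(n, 0, 75)
x2 <- runif(n, 0, 30)
y  <- 15 - 0.2 * x1 + 0.18 * x2 + rnorm(n, sd = 0.65)
fit <- lm(y ~ x1 + x2)

X   <- model.matrix(fit)
df  <- n - ncol(X)                          # n - p - 1
mse <- sum(resid(fit)^2) / df
x0  <- c(1, 30.8, 10.9)                     # leading 1 for the intercept

y0      <- sum(x0 * coef(fit))              # fitted value x0' beta_hat
se_mean <- sqrt(mse * drop(t(x0) %*% solve(crossprod(X)) %*% x0))
manual_ci <- y0 + c(-1, 1) * qt(0.975, df) * se_mean

from_predict <- predict(fit, data.frame(x1 = 30.8, x2 = 10.9),
                        interval = "confidence")
manual_ci
from_predict
```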
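Prediction of a future value
$$\left[\, \hat{y}_0 - t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\left(1 + x_0'(X'X)^{-1}x_0\right)},\ \ \hat{y}_0 + t_{\alpha/2,\,n-p-1}\sqrt{\hat{\sigma}^2\left(1 + x_0'(X'X)^{-1}x_0\right)} \,\right]$$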
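The extra 1 inside the square root is the variance contribution of the new observation's own error term. The sketch below evaluates the prediction interval by hand on simulated data and compares it with `predict(..., interval = "prediction")`. `h`, `se_pred`, `manual_pi` and `from_predict` are names introduced here.

```r
# Simulated data (illustration only; not the heart data).
set.seed(6)
n  <- 150
x1 <- runif(n, 0, 75)
x2 <- runif(n, 0, 30)
y  <- 15 - 0.2 * x1 + 0.18 * x2 + rnorm(n, sd = 0.65)
fit <- lm(y ~ x1 + x2)

X   <- model.matrix(fit)
df  <- n - ncol(X)                          # n - p - 1
mse <- sum(resid(fit)^2) / df
x0  <- c(1, 60, 20)                         # leading 1 for the intercept

h       <- drop(t(x0) %*% solve(crossprod(X)) %*% x0)   # x0'(X'X)^{-1}x0
y0      <- sum(x0 * coef(fit))
se_pred <- sqrt(mse * (1 + h))              # includes the new error term
manual_pi <- y0 + c(-1, 1) * qt(0.975, df) * se_pred

from_predict <- predict(fit, data.frame(x1 = 60, x2 = 20),
                        interval = "prediction")
manual_pi
from_predict
```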
Acknowledgement
Introduction to Linear Regression Analysis, Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining