In this problem, you will practice using R's lm() function to fit a simple linear model and use it for prediction. The data come from the Stat 100 survey.
Download the Stat 100 Survey 1 data from Spring 2015 to your R working directory, then load it into R using the command
survey <- read.csv("stat100_2015spring_survey01.csv")
The column descriptions for the data can be found here. Take some time to browse the data in the data frame survey, using commands such as sum(is.na(survey)) to check for missing values, names(survey) to list the column names, View(survey) to look at the data, and so on. You can also make some plots.
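The browsing commands mentioned above can be tried on a small stand-in data frame first. This is only a sketch: `survey_toy` and its values are made up for illustration and are not taken from the real survey file.

```r
# Toy stand-in for the survey data (hypothetical values, not the real file)
survey_toy <- data.frame(
  height   = c(64, 70, NA, 68, 72),
  shoeSize = c(7.5, 10, 8, NA, 11)
)

sum(is.na(survey_toy))       # total number of missing values
colSums(is.na(survey_toy))   # missing values per column
names(survey_toy)            # column names
str(survey_toy)              # column types and a preview of the values
summary(survey_toy)          # per-column summaries, including NA counts

# A quick scatter plot of the two columns used in the parts below
plot(shoeSize ~ height, data = survey_toy)
```

The same calls should work unchanged on the real `survey` data frame once it is loaded.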
a. Fit a linear model predicting the student's shoe size (the 'shoeSize' variable in the data frame) from height in inches (the 'height' variable in the data frame). What are the regression coefficients, the residual standard error and R²? (Give your answers to at least 3 significant figures.)
Intercept =
Slope for 'height' =
Residual standard error (RSE) =
R² =
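As a rough sketch of how these four quantities can be pulled out of a fitted model, the example below uses simulated data in place of the real survey. The data frame `toy` and the numbers used to generate it are assumptions made only so the code runs on its own.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; use the real `survey` data frame instead)
toy <- data.frame(height = rnorm(50, mean = 67, sd = 4))
toy$shoeSize <- -20 + 0.43 * toy$height + rnorm(50, sd = 1)

# Simple linear regression of shoe size on height
fit <- lm(shoeSize ~ height, data = toy)
s <- summary(fit)

coef(fit)["(Intercept)"]   # intercept
coef(fit)["height"]        # slope for 'height'
s$sigma                    # residual standard error
s$r.squared                # R-squared

signif(coef(fit), 3)       # coefficients rounded to 3 significant figures
```

Note that lm() drops rows with missing values by default, so the fit on real data may use fewer rows than the data frame contains.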
b. Use the confint() function to construct the 95% confidence intervals for the intercept and slope. Enter answers to 3 significant figures.
95% CI for intercept: from ____ to ____
95% CI for slope: from ____ to ____
c. Use the confint() function to construct the 99% confidence interval for the slope. Enter your answers to 3 significant figures.
99% CI for slope: from ____ to ____
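A minimal sketch of confint() at both confidence levels follows. It is fitted on simulated data (the `toy` data frame is hypothetical), so the intervals it prints are not the answers for the survey data.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; use the real `survey` data frame instead)
toy <- data.frame(height = rnorm(50, mean = 67, sd = 4))
toy$shoeSize <- -20 + 0.43 * toy$height + rnorm(50, sd = 1)
fit <- lm(shoeSize ~ height, data = toy)

# 95% intervals for all coefficients (level = 0.95 is the default)
ci95 <- confint(fit)
ci95

# 99% interval for the slope only
ci99 <- confint(fit, parm = "height", level = 0.99)
ci99

signif(ci95, 3)   # rounded to 3 significant figures
```

The result is a matrix with one row per coefficient and one column for each end of the interval.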
d. Calculate SST, SSM and SSE (to 4 significant figures).
SST =
SSM =
SSE =
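One way to compute the three sums of squares directly is sketched below. It assumes the usual definitions: SST is the total squared deviation of the response from its mean, SSM is the squared deviation of the fitted values from that mean, and SSE is the sum of squared residuals. The data are simulated and hypothetical.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; use the real `survey` data frame instead)
toy <- data.frame(height = rnorm(50, mean = 67, sd = 4))
toy$shoeSize <- -20 + 0.43 * toy$height + rnorm(50, sd = 1)
fit <- lm(shoeSize ~ height, data = toy)

# Take the response from the model frame so that any rows lm() dropped
# because of missing values are excluded here as well
y    <- model.response(model.frame(fit))
yhat <- fitted(fit)

SST <- sum((y - mean(y))^2)      # total sum of squares
SSM <- sum((yhat - mean(y))^2)   # model (regression) sum of squares
SSE <- sum((y - yhat)^2)         # error (residual) sum of squares

signif(c(SST = SST, SSM = SSM, SSE = SSE), 4)
```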
e. Which of the following are true? (Select all that apply)
SST < SSM + SSE
SST = SSM + SSE
The correlation coefficient between the residuals and 'shoeSize' is exactly 0
SST > SSM + SSE
The correlation coefficient between the residuals and 'height' is exactly 0
The mean of the residuals is exactly 0
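Statements like these can be checked numerically. The sketch below uses simulated, hypothetical data and a helper `near_zero()` that is not part of R. Because floating-point results are rarely exactly 0, the helper compares against a small tolerance.

```r
set.seed(3)
# Simulated stand-in data (hypothetical; use the real `survey` data frame instead)
toy <- data.frame(height = rnorm(50, mean = 67, sd = 4))
toy$shoeSize <- -20 + 0.43 * toy$height + rnorm(50, sd = 1)
fit <- lm(shoeSize ~ height, data = toy)
res <- resid(fit)

# Hypothetical helper: treat values below a small tolerance as zero
near_zero <- function(x, tol = 1e-8) abs(x) < tol

mean(res)                 # mean of the residuals
cor(res, toy$height)      # correlation of residuals with the predictor
cor(res, toy$shoeSize)    # correlation of residuals with the response

near_zero(mean(res))
near_zero(cor(res, toy$height))
near_zero(cor(res, toy$shoeSize))
```

If the real data contain missing values, the residuals will be shorter than the data frame columns, so the columns would need to be taken from model.frame(fit) instead.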
Now you are going to use the linear model to make predictions on new data. Download the Survey 1 Spring 2016 data here, then load it into R using the command
spring16 <- read.csv("stat100_2016spring_survey01b.csv")
The description of the column variables is on this webpage. Take some time to browse the data in the data frame spring16. In particular, note that the 'height' and 'shoeSize' columns are present and that 'height' is spelled the same way as in the survey data frame. This matters because the predict() function you will use below looks for a 'height' column when making predictions with the 'lm' object you created by fitting the data in the survey data frame.
f. Use the linear model in part (a) to predict the shoe sizes of students in the spring16 data set. Use the predicted values to calculate SST, SSM and SSE for this new data set. Enter your answers to 4 significant figures.
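Note: You must use the definitions of SST, SSM and SSE below to do the calculations. There are no shortcut formulas in this case, since the linear model in part (a) doesn't "know" the new data.
$$\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad \mathrm{SSM} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
Here $y_i$ are the observed shoe sizes in spring16, $\bar{y}$ is their mean, and $\hat{y}_i$ are the shoe sizes predicted by the model from part (a).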
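The workflow for part (f) might look roughly like the sketch below. The data frames `old` and `new` are simulated stand-ins for `survey` and `spring16`, and the code assumes the mean in SST and SSM is the mean of the observed shoe sizes in the new data set.

```r
set.seed(2)
# Simulated stand-ins (hypothetical) for the `survey` and `spring16` data frames
old <- data.frame(height = rnorm(60, mean = 67, sd = 4))
old$shoeSize <- -20 + 0.43 * old$height + rnorm(60, sd = 1)
new <- data.frame(height = rnorm(40, mean = 68, sd = 4))
new$shoeSize <- -19 + 0.42 * new$height + rnorm(40, sd = 1)

# Fit on the old data, then predict for the new data;
# predict() looks up the 'height' column in `newdata`
fit  <- lm(shoeSize ~ height, data = old)
pred <- predict(fit, newdata = new)

# Sums of squares from their definitions, using the new data's observed values
y    <- new$shoeSize
ybar <- mean(y)                 # assumption: mean of the new observations
SST  <- sum((y - ybar)^2)
SSM  <- sum((pred - ybar)^2)
SSE  <- sum((y - pred)^2)

signif(c(SST = SST, SSM = SSM, SSE = SSE), 4)
```

With real data, rows where 'height' or 'shoeSize' is missing produce NA terms in these sums, so they may need to be filtered out first, for example with complete.cases().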
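g. The prediction error can be characterized by the root mean square error (RMSE), defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
Here N is the number of observations in the new data set spring16, $y_i$ are the observed shoe sizes and $\hat{y}_i$ are the predicted shoe sizes. Calculate RMSE and enter the answer to 4 significant figures.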
RMSE =
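A short sketch of the RMSE calculation, assuming RMSE is the square root of the mean squared prediction error over the N new observations. The data frames `old` and `new` are simulated and hypothetical.

```r
set.seed(2)
# Simulated stand-ins (hypothetical) for the `survey` and `spring16` data frames
old <- data.frame(height = rnorm(60, mean = 67, sd = 4))
old$shoeSize <- -20 + 0.43 * old$height + rnorm(60, sd = 1)
new <- data.frame(height = rnorm(40, mean = 68, sd = 4))
new$shoeSize <- -19 + 0.42 * new$height + rnorm(40, sd = 1)

fit  <- lm(shoeSize ~ height, data = old)
pred <- predict(fit, newdata = new)

# Root mean square prediction error on the new data
N    <- nrow(new)
SSE  <- sum((new$shoeSize - pred)^2)
rmse <- sqrt(SSE / N)
signif(rmse, 4)

# Residual standard error of the original fit, for comparison
summary(fit)$sigma
```

The two quantities use different divisors: RMSE divides by N, while the residual standard error of a simple linear regression divides by the residual degrees of freedom.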
h. Which of the following are true for this data set? (Select all that apply)
The mean of the residuals is 0
SST = SSM + SSE
The correlation coefficient between the residuals and 'height' is exactly 0
SST < SSM + SSE
The correlation coefficient between the residuals and 'shoeSize' is exactly 0
SST > SSM + SSE
RMSE < residual standard error calculated in part (a)