Anscombe's Quartet

Take a look at the 'anscombe' dataset that comes with R. The data can be loaded with the command data(anscombe). The data frame anscombe has 11 rows and 8 columns. The first 4 columns are labeled x1, x2, x3 and x4. The last 4 columns are labeled y1, y2, y3 and y4.

a. Use the lm() function to fit a linear model predicting y1 from x1. Fill in the following table.

Coefficient     Estimate     Std. Error     t value     Pr(>|t|)
Intercept
Slope for x1

Residual standard error =
R² =
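As a starting point for part (a), here is a minimal sketch of how the quantities in the table can be read off a fitted model. The object names fit1 and s1 are illustrative choices, not part of the exercise.

```r
# Load the built-in data set (ships with base R's 'datasets' package)
data(anscombe)

# Fit y1 as a linear function of x1
fit1 <- lm(y1 ~ x1, data = anscombe)
s1 <- summary(fit1)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|);
# the rows are the intercept and the slope for x1
print(s1$coefficients)

# Residual standard error and R-squared
print(s1$sigma)
print(s1$r.squared)
```

Calling summary(fit1) directly prints the same information in one formatted report.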

b. Fit 3 more linear models: (1) predicting y2 from x2, (2) predicting y3 from x3, (3) predicting y4 from x4. For each model, compare the coefficients and other quantities returned by lm() to those in the table in part (a). What do you see?





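One way to make the comparison in part (b) easier is to fit all four models in a loop and collect the key numbers in a single matrix. This is only a sketch: the names fits and comparison are hypothetical, and fitting each model by hand with lm(y2 ~ x2, data = anscombe) and so on works just as well.

```r
data(anscombe)

# Fit y_k ~ x_k for k = 1..4, building each formula from the column names
fits <- lapply(1:4, function(k) {
  lm(reformulate(paste0("x", k), response = paste0("y", k)), data = anscombe)
})

# Collect the headline quantities side by side, one column per model
comparison <- sapply(fits, function(f) {
  s <- summary(f)
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    sigma     = s$sigma,
    r.squared = s$r.squared)
})
colnames(comparison) <- paste0("model", 1:4)

print(round(comparison, 3))
```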

c. Now, do what you should have done in the first place: make plots. Plot y1 versus x1 and then add the regression line on the plot. Do the same for the other 3 data sets. What do you see?
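The plots can be produced with base R graphics, for example as in the sketch below. The 2 x 2 layout, the panel titles and the line colour are arbitrary choices made for illustration.

```r
data(anscombe)

# 2 x 2 grid of panels, one per data set; keep the old settings to restore later
old_par <- par(mfrow = c(2, 2))

for (k in 1:4) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  plot(x, y,
       xlab = paste0("x", k), ylab = paste0("y", k),
       main = paste("Data set", k))
  abline(lm(y ~ x), col = "red")  # overlay the least-squares line
}

par(old_par)
```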

As you have learned in Stat 200, the 3 main assumptions of a linear model are linearity, independence and homoscedasticity. That is to say, the data points (xᵢ, yᵢ) can be described by the relationship yᵢ = β₀ + β₁xᵢ + εᵢ, where the errors εᵢ scatter randomly with zero mean and constant variance, independent of x. Which data set(s) best satisfy these assumptions?





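A common way to examine these assumptions, beyond what the exercise strictly asks for, is to plot the residuals of each fit against the fitted values. Under the model above, the residuals should scatter around zero with no visible pattern and a roughly constant spread. The sketch below assumes the same four fits as in the earlier parts.

```r
data(anscombe)

old_par <- par(mfrow = c(2, 2))

for (k in 1:4) {
  # Fit y_k ~ x_k, building the formula from the column names
  fit <- lm(reformulate(paste0("x", k), response = paste0("y", k)),
            data = anscombe)
  plot(fitted(fit), resid(fit),
       xlab = "Fitted values", ylab = "Residuals",
       main = paste("Data set", k))
  abline(h = 0, lty = 2)  # reference line at zero residual
}

par(old_par)
```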