Airquality Data

When you open R, the "datasets" package is loaded automatically. It contains a number of data sets; see this page for a list. In this problem, you will examine the airquality data, which contains daily air quality measurements in New York from May to September 1973. Type data(airquality) to load the data into your R workspace. You can type View(airquality) to take a look at the data. Note that R is case sensitive, so the V in View must be capitalized. The meaning of the columns is explained in this documentation. You can also get this information by typing ?airquality in R.
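The loading steps described above can be sketched as follows. This is only a minimal sketch: head() and str() are not mentioned in the problem and are assumed here as console-friendly alternatives to View(), which needs an interactive session.

```r
# Load the data set from the built-in 'datasets' package into the workspace
data(airquality)

# Console-friendly peeks at the data (assumed alternatives to View())
print(head(airquality))  # first six rows
str(airquality)          # compact summary of every column

# In an interactive session, the following also work:
# View(airquality)   # spreadsheet-style viewer; note the capital V
# ?airquality        # help page explaining the columns
```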

a. Type class(airquality) to find out the class of the object airquality.

airquality is
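As a hedged illustration of part a, the sketch below stores the result of class() and, for comparison, also checks the class of a single column. The variable names obj_class and col_class are hypothetical and chosen only for this example.

```r
data(airquality)

# class() returns a character vector naming the object's class
obj_class <- class(airquality)
print(obj_class)

# A single column is a different kind of object than the whole data set
col_class <- class(airquality$Ozone)
print(col_class)
```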





b. What are the column names of the dataset?
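One possible way to answer part b is sketched below. The problem does not say which function to use; names() and colnames() are assumed here as two standard options, and col_names is a hypothetical variable name.

```r
data(airquality)

# names() returns the column names of a data set as a character vector
col_names <- names(airquality)
print(col_names)

# colnames() gives the same result for this kind of object
print(colnames(airquality))
```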





c. How many observations (i.e., rows) and how many columns does this data set have? [Hint: use the dim() function, or the nrow() and ncol() functions.]

Number of observations =


Number of columns =
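The hint in part c can be sketched as follows. The variable names dims, n_obs, and n_cols are hypothetical; the functions are the ones named in the hint.

```r
data(airquality)

# dim() returns both sizes at once: rows first, then columns
dims <- dim(airquality)
print(dims)

# nrow() and ncol() return each size separately
n_obs  <- nrow(airquality)
n_cols <- ncol(airquality)
print(n_obs)
print(n_cols)
```

Both approaches report the same two numbers, so either can be used to check the other.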


d. What do you get when you type the command sum(is.na(airquality))?


This means that there are ____ missing values in the data.
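To see why sum(is.na(airquality)) counts missing values, the sketch below breaks the command into steps. The per-column breakdown with colSums() is not part of the problem and is added here only as an illustration; na_mask, total_na, and na_per_column are hypothetical names.

```r
data(airquality)

# is.na() returns a logical matrix: TRUE wherever a value is missing
na_mask <- is.na(airquality)

# sum() treats TRUE as 1 and FALSE as 0, so this counts the missing values
total_na <- sum(na_mask)
print(total_na)

# colSums() shows which columns the missing values come from
na_per_column <- colSums(na_mask)
print(na_per_column)
```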


e. You can use the cor() function to compute the correlation matrix of the data. However, cor(airquality) returns a number of NAs because the data contain missing values. The cor() function accepts the argument use="complete.obs", which drops every observation (row) containing an NA before computing the correlation matrix. Type cor(airquality,use="complete.obs") to get the correlation matrix. You can use View(cor(airquality,use="complete.obs")) to get a better view of the matrix. From this matrix, what is the correlation coefficient between the variables 'Ozone' and 'Temp'? Enter your answer to at least 4 decimal places.

Correlation coefficient between 'Ozone' and 'Temp' =
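Rather than reading the value off the full matrix, a single entry can be extracted by row and column name. The sketch below assumes this indexing approach, which the problem does not mention; cor_matrix and ozone_temp are hypothetical names.

```r
data(airquality)

# Correlation matrix computed only from rows with no missing values
cor_matrix <- cor(airquality, use = "complete.obs")
print(round(cor_matrix, 4))

# Pick out one entry by row and column name
ozone_temp <- cor_matrix["Ozone", "Temp"]
print(round(ozone_temp, 4))
```

Because a correlation matrix is symmetric, cor_matrix["Temp", "Ozone"] returns the same value.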
