
Short Tutorial on Subsetting a Data Frame

Previously, you examined the airquality data in R's "datasets" package. The data contain daily air quality measurements in New York from May to September 1973. The meaning of each column is explained in the data set's documentation (type ?airquality in R to view it). In this question you will analyze the data further, using the subsetting techniques covered in the textbook and in lesson 6 of the R Programming course in swirl.

Recall that R's "datasets" package is loaded automatically when you open R. Type data(airquality) to load the data into your R workspace. You can type View(airquality) to take a quick look at the data. Using the names() function, you previously found that the column names of the data frame airquality are Ozone, Solar.R, Wind, Temp, Month and Day. There are three ways to see the content of, say, the Wind column using R's subsetting operators (try them out):
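As a sketch, assuming the three ways meant here are the usual R idioms (the $ operator, indexing by column name, and indexing by column position), they look like this. The variable names are chosen only for illustration.

```r
data(airquality)  # load the built-in data set

# Assumed forms: three common ways to extract the Wind column as a vector
wind_dollar   <- airquality$Wind        # by name with the $ operator
wind_by_name  <- airquality[, "Wind"]   # by column name inside [ , ]
wind_by_index <- airquality[, 3]        # by position (Wind is the 3rd column)

# All three return the same numeric vector
identical(wind_dollar, wind_by_name)
identical(wind_dollar, wind_by_index)
head(wind_dollar)
```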

To look at the data in rows 10 through 20, use the command

airquality[10:20,]

The command prints all columns in rows 10-20. To see only columns 3 and 5 in those rows, type

airquality[10:20,c(3,5)]

To see all columns except columns 3 and 5 in those rows, type

airquality[10:20,c(-3,-5)]
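The three commands above can be checked side by side. This is a minimal sketch; the names rows_all, rows_keep and rows_drop are hypothetical and used only for illustration.

```r
data(airquality)

rows_all  <- airquality[10:20, ]            # rows 10-20, every column
rows_keep <- airquality[10:20, c(3, 5)]     # rows 10-20, columns 3 and 5 only
rows_drop <- airquality[10:20, c(-3, -5)]   # rows 10-20, all but columns 3 and 5

dim(rows_all)      # 11 rows and 6 columns
names(rows_keep)   # "Wind" "Month"
names(rows_drop)   # "Ozone" "Solar.R" "Temp" "Day"
```

Note that 10:20 includes both endpoints, so the result has 11 rows, not 10.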

To see all the observations with temperature higher than 90°F, type

airquality[airquality$Temp>90,]

To count the number of observations with Temp > 90, type

sum(airquality$Temp>90)

To see all observations with temperature higher than 90°F and wind speed higher than 10 mph, type

airquality[airquality$Temp>90 & airquality$Wind>10,]

Finally, to subset the data for rows where the "Month" is 9 (September), type

airquality[airquality$Month==9,]
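It can help to build the logical vector first and then use it as the row index. The following is a sketch of that pattern; the names hot, hot_windy, hot_rows and september are hypothetical.

```r
data(airquality)

hot       <- airquality$Temp > 90          # logical vector, one entry per row
hot_windy <- hot & airquality$Wind > 10    # element-wise AND of two conditions

hot_rows <- airquality[hot, ]              # keep rows where the condition is TRUE
sum(hot)                                   # TRUE counts as 1, so this counts rows
nrow(hot_rows) == sum(hot)                 # TRUE, since Temp has no NAs here

september <- airquality[airquality$Month == 9, ]
unique(september$Month)                    # 9
```

The single & compares the two vectors element by element, which is what row subsetting needs; && is meant for single TRUE/FALSE values.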

You now know enough to answer the following questions. Give your answers to at least 4 significant figures.


a. What is the value of Temp (in °F) in row 148?

b. How many missing values (NAs) are in the Ozone column of this data frame?


c. What is the mean of the Ozone column (in ppb) in this dataset? Exclude missing values (coded as NA) from this calculation. (Hint: type ?mean to find out how to set a parameter to strip NAs before computing the mean.)

d. Extract the subset of rows of the data frame where the Wind values are above 9 mph and Temp values are above 82°F. What is the mean of Solar.R (in Langleys) in this subset, with NAs removed? (Hint: You need to construct a logical vector in R to match the question's requirements. Then use that logical vector to subset the data frame.)

e. What is the mean of "Temp" (in °F) when "Month" is equal to 6 (i.e. the average temperature in June)?

f. What is the maximum Ozone value (in ppb) in June (i.e. Month = 6)?
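
The hints in the questions above depend on how R treats NA. As a small sketch on a made-up vector (not the airquality data, so it gives away none of the answers), summary functions return NA unless told to drop missing values:

```r
# Made-up vector for illustration only; not taken from airquality
x <- c(4, NA, 10, 7, NA)

sum(is.na(x))           # number of missing values: 2
mean(x)                 # NA, because missing values propagate
mean(x, na.rm = TRUE)   # 7, the mean of 4, 10 and 7
max(x, na.rm = TRUE)    # 10

# A condition on a vector with NAs yields NA entries, and indexing
# with NA returns NA; which() keeps only the TRUE positions
x[which(x > 5)]         # 10 7
```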