

Tibbles

Before attempting the questions below, you should have read this brief introduction to tibbles and this tibble vignette. Also make sure you have installed the tidyverse packages.

The following questions require the 'tibble' and 'readr' packages. Load both packages into your R session with the commands

library(tibble)
library(readr)

The command library(tidyverse) also works, but it loads several other packages you don't need as well.

You are going to use the grade data of three sections of Stat 100 (L1, L2 and ONL) from a past semester to study the difference between a data frame and a tibble. The data have 62 columns and 1510 observations. Students' identifying information has been removed to protect privacy. The netids and names are replaced by fake data. Students' names were randomly chosen from this baby-names.csv file. Netids are assigned according to this rule: first two characters of the name + random integer between 1 and 1510.
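The netid rule above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the script actually used to anonymize the data: the names in fake_names are made up, the lower-casing is inferred from the netid "ja50" that appears later in this exercise, and sampling with replacement is a guess.

```r
# Hypothetical illustration of the netid rule; not the course's actual script.
set.seed(1)  # only so the sketch is reproducible

# Made-up names (assumption); the real names came from baby-names.csv
fake_names <- c("Jacky", "Maria", "Oliver")

# First two characters of each name, lower-cased (assumption based on "ja50")
prefix <- tolower(substr(fake_names, 1, 2))

# One random integer between 1 and 1510 per student
# (sampling with replacement is an assumption)
suffix <- sample(1:1510, length(fake_names), replace = TRUE)

netid <- paste0(prefix, suffix)
netid
```

Because the integer is random, two students whose names share their first two letters can still end up with different netids.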

Some of you may know that the three sections of Stat 100 (L1, L2 and ONL) were synchronized. The L1 and L2 sections were in-person classes taught by the same instructor, and students had to attend the lectures at the Lincoln Hall Theater. The ONL section was an online section; students in that section did not have to go to class, but they watched videos recorded from the L1 lectures. Each lecture video was usually available to the online students within two hours after the L1 class ended. Students in all three sections were given exactly the same lectures, homework assignments, bonus assignments, and exams. They also had the same TAs and office hours. That is why it makes sense to combine the grade data of all three sections in a single data file.

Download the Stat 100 grade data here and save it in your R working directory. Copy and paste the following commands to load the data into a data frame and a tibble.

df <- read.csv("stat100.csv")
tib <- read_csv("stat100.csv")

Type class(df) to confirm that df is a data frame; type class(tib) to confirm that tib has the classes tbl_df and tbl in addition to being a data frame. You are going to explore the differences between a data frame and a tibble in the following questions. First try to figure out the answers based on what you've read. Then (the most important step) check your answers by trying out all the commands before submitting anything. At the end of the exercise, make sure you understand why some commands work and others don't.
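As a warm-up, the class check can be tried on a small toy example before running it on the real data. This is a minimal sketch: toy_df and toy_tib are made-up objects, not the Stat 100 file, and the tibble is built with as_tibble() rather than read_csv().

```r
library(tibble)

# Toy data, made up for illustration; not the Stat 100 file
toy_df <- data.frame(netid = c("aa1", "bb2"), Final = c(90, 85))
toy_tib <- as_tibble(toy_df)

class(toy_df)   # "data.frame"
class(toy_tib)  # "tbl_df" "tbl" "data.frame"

# inherits() tests for one class at a time
inherits(toy_tib, "data.frame")  # TRUE
inherits(toy_df, "tbl_df")       # FALSE

# dim() reports rows, then columns
dim(toy_tib)    # 2 2
```

On the real data, dim(df) and dim(tib) should both report the 1510 observations and 62 columns described above.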

  1. Suppose there is a mistake in the name of the student in the 410th observation (netid = "ja50"). The name should be "Jack" instead of "Jacky". Which of the following commands can be used to correct the name in the data frame df? (Select all that apply)
    df$Name[df$netid=="ja50"] <- "Jack"
    df$Name[410] <- "Jack"
    df[df$netid=="ja50",2] <- "Jack"
    df[410,2] <- "Jack"
    None of the above

  2. What are the names of column 11 in df and tib? Enter the names exactly as shown (case sensitive) but without quotes (e.g. Exam.3 NOT "Exam.3").
    Name of column 11 in df:

    Name of column 11 in tib:

  3. Suppose I want to calculate the average final exam score (in column 31, named 'Final'). Which of the following commands can be used on the data frame df? (Select all that apply)
    mean( df[,31] )
    mean( df[,"Final"] )
    mean( df[[Final]] )
    mean( df[["Final"]] )
    mean( df$Fin )
    mean( df$Final )
    mean( df[[31]] )
    mean( df$`Final` )
    mean( df[["Fin"]] )

  4. Which of the following commands can be used to calculate the mean of column 31 of the tibble tib? (Select all that apply)
    mean( tib[["Final"]] )
    mean( tib[["Fin"]] )
    mean( tib[,31] )
    mean( tib$`Final` )
    mean( tib$Final )
    mean( tib[[Final]] )
    mean( tib[[31]] )
    mean( tib[,"Final"] )
    mean( tib$Fin )