Browsing resource, all submissions are temporary.

`dplyr` Package

The dplyr package is very useful in managing data frames. In this problem, you will explore some useful dplyr functions.

Before attempting the following questions, you should have read Chapter 13 of Peng's textbook to learn the dplyr functions. You probably won't be able to absorb all of the techniques by just reading the chapter. After reading the chapter, study the following examples using dplyr to do the 4 Lon Capa problems in Weeks 2, 3, 5 and 8:

Compare the codes using dplyr and the ones using base R.

The following questions require you to use the tibble, readr and the dplyr packages. The easiest way to load these packages is to use the command

library(tidyverse)

You will use the same stat 100 grade data as in the previous exercise. Load the data using the following command.

tib <- read_csv("stat100.csv")

As in the previous exercise, try to figure out the answers based on what you've read first. You may not have seen some of the commands below, so you have to try out all commands to check your answers.

The three exam scores are in columns "Exam 1", "Exam 2" and "Exam 3". I want to rename them to "Exam1", "Exam2" and "Exam3". Which of the following commands can be used? (Select all that apply)
stat100 <- rename(tib, Exam1=`Exam 1`, Exam2=`Exam 2`, Exam3=`Exam 3`)
stat100 <- rename(tib, `Exam 1`=Exam1, `Exam 2`=Exam2, `Exam 3`=Exam3)
stat100 <- rename(tib, Exam1=Exam 1, Exam2=Exam 2, Exam3=Exam 3)
stat100 <- rename(tib, Exam 1=Exam1, Exam 2=Exam2, Exam 3=Exam3)
stat100 <- tib %>% rename(Exam1=`Exam 1`, Exam2=`Exam 2`, Exam3=`Exam 3`)
stat100 <- tib %>% rename(`Exam 1`=Exam1, `Exam 2`=Exam2, `Exam 3`=Exam3)
stat100 <- tib %>% rename(Exam1=Exam 1, Exam2=Exam 2, Exam3=Exam 3)
stat100 <- tib %>% rename(Exam 1=Exam1, Exam 2=Exam2, Exam 3=Exam3)
None of the above

Tries 0/3

From now on, we will work on the tibble stat100 with the exam columns renamed.

Suppose I want to put all the HW columns to a new tibble hw. That is, hw should contain the 22 columns HW Assignment 1 [Total Pts: 21], HW Assignmnet 2 [Total Pts: 22], ..., HW Assignment 22 [Total Pts: 18], but NOT the HW_avg column. Which of the following commands can be used? (Select all that apply)
hw <- select(stat100, starts_with("HW A"))
hw <- select(stat100, contains("Assignment"))
hw <- select(stat100, `HW Assignment 1 [Total Pts: 21]`:`HW Assignment 22 [Total Pts: 18]`)
hw <- select(stat100, starts_with("HW Assignment"))
hw <- select(stat100, contains("HW"))
hw <- select(stat100, starts_with("HW"))

Tries 0/3

Suppose I want to save the four columns "Exam1", "Exam2", "Exam3" and "Final" to a tibble exams. Which of the following commands can be used? (Select all that apply)
exams <- select(stat100, Exam1:Final)
exams <- select(stat100, starts_with("Exam"),Final)
exams <- select(stat100, Exam1,Exam2,Exam3,Final)
exams <- select(stat100, paste0("Exam",1:3),Final)
exams <- select(stat100, num_range("Exam",1:3),Final)

Tries 0/3

Suppose I want to select the first three Bonus Survey columns. Which of the following commands can be used? (Select all that apply)
select(stat100, `Bonus Survey 1 [Total Pts: 4]`:`Bonus Survey 3 [Total Pts: 4]`)
select(stat100, contains("Bonus Survey"))
select(stat100, num_range("Bonus Survey ", 1:3))
select(stat100, ends_with("[Total Pts: 4]"))
select(stat100, starts_with("Bonus Survey 1"):starts_with("Bonus Survey 3"))

Tries 0/3

Suppose I want to gather all the online student data (Section=="ONL") and put them in a tibble online. I don't want the Section column (since all of them will be "ONL") nor the iClicker column (since online students don't have iClicker points) in the new tibble. Which of the following commands can be used? (Select all that apply)
online <- select(stat100, Section=="ONL") %>% filter(-Section, -contains("iClicker"))
online <- filter(stat100, Section=="ONL") %>% select(-Section, -contains("iClicker"))
online <- select(stat100, Section=="ONL", -Section, -contains("iClicker"))
online <- filter(stat100, Section=="ONL", -Section, -contains("iClicker"))
online <- filter(stat100, Section=="ONL") %>% filter(-Section, -contains("iClicker"))
online <- select(stat100, Section=="ONL") %>% select(-Section, -contains("iClicker"))

Tries 0/3

Create two new columns in the stat100 tibble corresponding to the weighted total of the required work and the course total. The weighted total of the required work is computed by: 20% for each of the three exams (columns "Exam1", "Exam2" and "Exam3"), 15% HW (use column "HW_avg") and 25% Final exam (in column "Final"). That is,
weighted total of the required work = 0.2*(Exam1 + Exam2 + Exam3) + 0.15*HW_avg + 0.25*Final
The course total is computed by including the total bonus points (in column "bonus_tot") using the formula
course total = 100*(weighted total of the required work + bonus_tot*0.25) / (100 + bonus_tot*0.25)

Note: Try adding the two columns using a dplyr command and then submit "show answer" in the following box to check your answer.

Enter "show answer" to see the answer.

Tries 0/5

(5 points) Use arrange() together with desc() to look at the top 10 students with the highest scores of the course total. Fill in the information in the following table for the student with the 6th highest score of the course total. For the first three columns, enter the exact characters as stored in the tibble (case sensitive), but don't include quotes (e.g. ONL not "ONL"). For the last two columns, you can round numbers to two decimal places.

Student with the 6th highest score of the course total:

netid	Name	Section	Year	weighted tot. (required work)	course total

Tries 0/5

Suppose the Stat 100 instructors are interested to study how students in the three sections performed in the exams. Calculate the weighted exam average according to the formula
exam_avg = ( 20*(Exam1+Exam2+Exam3)+ 25*Final )/85
and then compute the mean, population standard deviation, and median of exam_avg for the three sections L1, L2 and ONL. Note that we have the data for the whole population, so we can calculate the population standard deviation. If you use the sd() function, you'll have to include the factor √(n-1)/n to convert the sample sd to population sd.

(10 points) Fill in the following table for the three sections. Round the mean, population sd and median to two decimal places.

Section	Number of students	mean	pop. sd	median
L1
L2
ONL

Tries 0/5

Stat 100 instructors want to compare student's performance on exams between the online section (Section = "ONL") and the in-person sections (Section = "L1" or "L2"). They also wants to split the two groups further into Freshmen (Year = "Fr"), Sophomores (Year = "So"), Juniors (Year = "Jr") and Seniors (Year = "Sr"). That is, there are 8 groups total: online and freshman, online and sophomore, online and junior, online and senior, in-person and freshman, in-person and sophomore, in-person and junior, in-person and senior.

Create a new column named Class in the stat100 tibble. Set Class to "online" if Section is "ONL" and set Class to "in-person" if Section is "L1" or "L2". [Hint: Use the ifelse() command. Type ?ifelse or use google to find out how to use it.]

(10 points) Calculate the number of students in each of the 8 groups. Also calculate the mean, population sd, and median of exam_avg for each of the 8 groups. Fill in the following table with information for the two groups with the highest and lowest median in exam_avg. Enter either "in-person" or "online" (without quotes) in the Class column, "Fr", "So", "Jr", or "Sr" (without quotes) in the Year column. Round numbers to two decimal places in the last three columns.

Group	Class	Year	Num. of students	mean	pop. sd	median
Lowest `median(exam_avg)`
Highest `median(exam_avg)`

Tries 0/5