dplyr
The dplyr package is very useful for managing data frames. In this problem, you will explore some useful dplyr functions.
Before attempting the following questions, you should read Chapter 13 of Peng's textbook to learn the dplyr functions. You probably won't absorb all of the techniques just by reading the chapter, so afterwards study the following examples, which use dplyr to solve four LON-CAPA problems from Weeks 2, 3, 5 and 8:
Week 2's Optimization problem
Week 3's Maximum Speed problem
Week 5's Stock Market Price problem
Week 8's Ecological Correlation problem
Compare the dplyr code with the corresponding base R code.
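As a small illustration of the kind of difference to look for, here is a minimal sketch that performs the same filter, select and sort task in base R and in dplyr. The data frame `df` and its columns `name` and `score` are hypothetical toy data, not taken from the course examples, and only dplyr (one of the tidyverse packages) is loaded.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only.
df <- data.frame(
  name  = c("A", "B", "C", "D"),
  score = c(88, 72, 95, 60),
  stringsAsFactors = FALSE
)

# Base R: keep rows with score >= 70 and the two columns, then sort descending.
base_result <- df[df$score >= 70, c("name", "score")]
base_result <- base_result[order(-base_result$score), ]

# dplyr: the same steps written as a pipeline of verbs.
dplyr_result <- df %>%
  filter(score >= 70) %>%
  select(name, score) %>%
  arrange(desc(score))

print(base_result)
print(dplyr_result)
```

Both versions keep the same rows in the same order; the dplyr version reads top to bottom in the order the operations happen, while the base R version nests indexing expressions.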
The following questions require the tibble, readr and dplyr packages. The easiest way to load all of them is the command
library(tidyverse)
You will use the same Stat 100 grade data as in the previous exercise. Load the data with the following command.
tib <- read_csv("stat100.csv")
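To see what read_csv() gives back without needing the course file, the sketch below writes a tiny made-up CSV to a temporary file and reads it back. The file contents and column names are hypothetical; with the course data you would call read_csv("stat100.csv") directly.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Section,Exam1", "Ann,L1,88", "Ben,ONL,74"), path)

# read_csv() returns a tibble (which is also a data.frame) and guesses
# column types: Name and Section become character, Exam1 numeric.
tib <- read_csv(path)

print(class(tib))
print(tib)
```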
As in the previous exercise, first try to work out the answers from what you have read. Some of the commands below may be new to you, so run all of them to check your answers.
From now on, we will work on the tibble stat100 with the exam columns renamed.
Note: Try adding the two columns with a dplyr command, then enter "show answer" in the box below to check your answer.
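The dplyr verb for adding columns is mutate(). Since the two columns asked for are not spelled out here, this minimal sketch uses a hypothetical toy tibble with placeholder columns `hw1` and `hw2`; it is not the stat100 data and does not give the answer.

```r
library(dplyr)

# Hypothetical toy tibble: hw1 and hw2 are placeholder names,
# not the actual columns of stat100.
toy <- tibble(
  student = c("s1", "s2", "s3"),
  hw1     = c(10, 8, 9),
  hw2     = c(7, 10, 6)
)

# mutate() appends new columns computed from existing ones. A column created
# earlier in the same call (hw_total) can be used by a later one (hw_mean).
# The tibble only changes because the result is assigned back to toy.
toy <- toy %>%
  mutate(hw_total = hw1 + hw2,
         hw_mean  = hw_total / 2)

print(toy)
```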
Student with the 6th highest course total (arrange() and desc() are useful here):
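One way to find the row with the k-th highest value is to sort in descending order and then pick row k. This sketch uses a hypothetical toy tibble whose columns `Name` and `Total` are placeholders for the student identifier and course-total columns in stat100.

```r
library(dplyr)

# Hypothetical toy data; Name and Total are placeholder column names.
scores <- tibble(
  Name  = c("Ann", "Ben", "Cal", "Dee", "Eve", "Fay", "Gus", "Hal"),
  Total = c(91.2, 78.5, 88.0, 95.4, 82.3, 69.9, 90.1, 85.6)
)

# Sort from highest to lowest Total, then keep only the 6th row.
sixth <- scores %>%
  arrange(desc(Total)) %>%
  slice(6)

print(sixth)
```

If several students share the same total, it is worth printing a few neighbouring rows (for example slice(4:8)) to check how ties affect the position.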
Suppose the Stat 100 instructors are interested in how students in the three sections performed on the exams. Calculate the weighted exam average according to the formula exam_avg = (20*(Exam1+Exam2+Exam3) + 25*Final)/85, and then compute the mean, population standard deviation, and median of exam_avg for each of the three sections L1, L2 and ONL. Since we have data for the whole population, we can calculate the population standard deviation. The sd() function computes the sample standard deviation, so if you use it, multiply its result by the factor √((n-1)/n) to convert it to the population standard deviation.
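The calculation follows the usual dplyr pattern of mutate(), then group_by(), then summarise(). The conversion factor comes from σ = sqrt(Σ(x − x̄)²/n) and s = sqrt(Σ(x − x̄)²/(n − 1)), so σ = s·sqrt((n − 1)/n). The sketch below runs on a hypothetical six-row toy tibble that reuses the column names from the formula; the numbers are made up and are not the stat100 results.

```r
library(dplyr)

# Hypothetical toy data with the column names used in the formula;
# the real stat100 tibble has many more rows and columns.
toy <- tibble(
  Section = c("L1", "L1", "L2", "L2", "ONL", "ONL"),
  Exam1   = c(80, 90, 70, 60, 85, 75),
  Exam2   = c(85, 95, 75, 65, 80, 70),
  Exam3   = c(90, 85, 80, 70, 90, 80),
  Final   = c(88, 92, 72, 68, 84, 76)
)

summary_tbl <- toy %>%
  # Weighted exam average: weights 20 + 20 + 20 + 25 = 85.
  mutate(exam_avg = (20 * (Exam1 + Exam2 + Exam3) + 25 * Final) / 85) %>%
  group_by(Section) %>%
  summarise(
    exam_mean   = mean(exam_avg),
    # sd() divides by n - 1; rescale so the divisor is n (population sd).
    # n() is the number of rows in the current group.
    exam_pop_sd = sd(exam_avg) * sqrt((n() - 1) / n()),
    exam_median = median(exam_avg)
  )

print(summary_tbl)
```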
The Stat 100 instructors want to compare students' performance on exams between the online section (Section = "ONL") and the in-person sections (Section = "L1" or "L2"). They also want to split each of these two groups further into Freshmen (Year = "Fr"), Sophomores (Year = "So"), Juniors (Year = "Jr") and Seniors (Year = "Sr"). That is, there are 8 groups in total: online and freshman, online and sophomore, online and junior, online and senior, in-person and freshman, in-person and sophomore, in-person and junior, in-person and senior.
Create a new column named Class in the stat100 tibble. Set Class to "online" if Section is "ONL" and to "in-person" if Section is "L1" or "L2". [Hint: Use the ifelse() function. Type ?ifelse or search online to find out how to use it.]
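ifelse(test, yes, no) returns `yes` where `test` is TRUE and `no` elsewhere, so it fits naturally inside mutate(). This sketch assumes Section only ever takes the values "L1", "L2" and "ONL" (any other value would be labelled "in-person"), and it uses a hypothetical toy tibble rather than stat100. It then groups by both Class and Year to show how the 8 groups arise.

```r
library(dplyr)

# Hypothetical toy data; Section and Year use the codes from the text.
toy <- tibble(
  Section = c("L1", "L2", "ONL", "ONL", "L1", "L2", "ONL", "ONL"),
  Year    = c("Fr", "So", "Jr", "Sr", "Jr", "Sr", "Fr", "So"),
  Exam1   = c(80, 72, 91, 66, 85, 78, 70, 88)
)

toy <- toy %>%
  # "online" where Section is "ONL", otherwise "in-person".
  # Assumes every Section value is one of "L1", "L2", "ONL".
  mutate(Class = ifelse(Section == "ONL", "online", "in-person"))

# Grouping by Class and Year gives up to 2 x 4 = 8 groups.
group_counts <- toy %>%
  group_by(Class, Year) %>%
  summarise(n = n(), mean_exam1 = mean(Exam1), .groups = "drop")

print(group_counts)
```

dplyr also provides if_else(), a stricter variant that requires `yes` and `no` to have the same type; either works here.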