1. LON-CAPA Logo
  2. Help
  3. Log In
 

Browsing resource, all submissions are temporary.


Simulating the Birthday Problem

The birthday problem is a famous probability problem. It concerns the probability of getting at least one birthday match among n randomly chosen people. For n=100, the probability is almost 1, as you have been told in Stat 100 and seen the calculation in Week 2's notes. Do you still not believe it? Do you dare to take a bet?

Back in the days when Professor Ellen Fireman taught the in-person Stat 100 classes, she would bet the class that if she went around the class asking the birthdays of students she would get a birthday match within 100 students. She told the class that she would give everyone A+ if a match did not occur within 100 students. Then she would go around the class asking students' birthdays and indeed a match occurred within 100 students. She betted the class every time she taught the birthday problem and she won every time. The chance of her losing the bet is only 3×10-7, as you have seen the calculation in Week 2's notes.

In this exercise, you will simulate Professor Fireman going around the class and find how many students, on average, she has to go through in order to find the first birthday match. For simplicity, ignore the leap date and assume there are 365 possible birthdays. Use 1 to represent "Jan 1st", 2 to represent "Jan 2nd", ..., and finally 365 to represent "Dec 31st". There must be a match in 366 students. So for a typical experiment, generate 366 birthdays using the command sample.int(365,366,replace=TRUE) and then search for the first birthday match.

Suppose x is an integer vector of length 366, and each element of x is an integer between 1 and 365. The numbers in x represent the birthdays of 366 students. Write a function, named firstMatch(x), that finds the index corresponding to the first occurance of a repeated number in x. This index represents the number of students Professor Fireman has to go through to find the first birthday match.

Hint: two versions of the code that finds the index has already been written in one of Week 4's Lon Capa problems. You just need to turn the code to a function.

Code test: It is very easy to construct an x to test your function. For example, the following code should always return the number 11.

x <- c(sample.int(10),sample.int(10),sample.int(365,346, replace=TRUE))
firstMatch(x)

[1] 11

sample.int(10) is the same as sample.int(10,10), which is just a random permutation of 1, 2, 3, ..., 10. So there is no repeated number in sample.int(10), but there is another sample.int(10) following it. As a result, the first match must occur at the 11th number. So firstMatch(x) should always return 11. You don't even need to use the set.seed() function here. Other tests can be generated in a similar way.

After you finish writing the function, do the birthday match experiment 10,000 times to find the distribution of m, where m is the number of students required to find the first birthday match. The following code can be used to do the 10,000 experiments.

RNGversion("3.5.0") 
set.seed(9228598)
m <- MISSING CODE

a. What could be the missing code that replaces MISSING CODE above?




 Tries 0/1

Copy and paste the code (with the correct missing code filled in) and run it. Be sure to use the seed number 9228598; otherwise, your answers to the following questions will not match the computer's answers.

b. How many times in these 10,000 experiments does m exceeds 100? That is, how many times would Professor Fireman lose the bet if she repeated the experiment 10,000 times?

 Tries 0/3

c. What is the minimum and maximum values of m in these 10,000 experiments?

min(m) = ,     max(m) = .

 Tries 0/5

d. Esimate the expected value of m by taking the average value of m in the 10,000 experiments. This is the number of students Professor Fireman needs to go through on average to find a birthday match.

Sanity check: your answer should be a postive number with no more than 4 decimal places.

Expected value ≈

 Tries 0/3

e. Calculate the sample standard deviation of m. Round your answer to 3 decimal places.

 Tries 0/3

f. In one class, Professor Fireman didn't get a match for more than 50 students and she began to panic. Estimate the probability that m > 50 in one experiment by counting the proportion of m > 50 occurring in the 10,000 experiments.

Sanity check: your answer should be between 0 and 1 with no more than 4 decimal places.

P(m > 50) ≈

 Tries 0/3