Simulating Coin Flips

In the podcast Stochasticity (around 9:00 into the clip), statistics professor Deborah Nolan of the University of California, Berkeley describes an experiment she ran in her class. She divided the class into two groups. She asked one group of students to flip a coin 100 times and record the results, and asked the other group to pretend to flip a coin 100 times and write down what they thought the outcomes would be. She left the room and let the students complete the activity. When she came back, she examined the results written on the blackboard and immediately identified which set was the real one. The students were amazed.

How did Professor Nolan figure out which one was real? She explains that as soon as she saw there were 7 consecutive tails in one of the lists but no more than 4 consecutive heads or tails in the other, she knew the one with 7 consecutive tails was the real one. Strange things do happen by chance! Humans have poor intuition about probability and statistics.

What is the probability that a run of at least 7 consecutive heads or tails occurs in 100 coin flips? The analytic calculation is carried out via a recurrence formula. In this problem, you are going to do a computer simulation to find an approximate answer.
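
For comparison with the simulation, the exact probability can be computed with a short recurrence. The sketch below uses one standard formulation, not necessarily the recurrence the problem has in mind: a sequence of n flips with no run of length k or more is a chain of runs of lengths 1 to k-1, where the first run may be heads or tails and every later run is forced to alternate. The function name prob_streak_at_least and its arguments are hypothetical.

```r
# q[i + 1] holds q(i) = (number of ways to split i flips into runs of
# length 1..(k-1)) / 2^i, which satisfies q(i) = sum over j of q(i - j) / 2^j.
prob_streak_at_least <- function(n, k) {
  stopifnot(n >= 1, k >= 2)
  q <- numeric(n + 1)
  q[1] <- 1                      # q(0) = 1: the empty sequence
  for (i in 1:n) {
    j <- 1:min(k - 1, i)         # allowed lengths for the final run
    q[i + 1] <- sum(q[i - j + 1] / 2^j)
  }
  # Factor 2: the first run may be heads or tails.
  1 - 2 * q[n + 1]
}

prob_streak_at_least(3, 3)    # 0.25: only HHH and TTT out of 8 outcomes
prob_streak_at_least(100, 7)  # exact value to compare with the simulation
```

Working with probabilities rather than raw counts keeps every intermediate value between 0 and 1, so nothing overflows for n = 100.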

In lesson 13 of the R Programming course in swirl, you learned that coin flips can be simulated using the sample() function. For example, the command
sample(c(0,1), 100, replace=TRUE)
simulates flipping a fair coin 100 times, with '0' representing tails and '1' representing heads. Here we do not need to specify the prob parameter, since sample() draws each value with equal probability by default and we want a fair coin.
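
A quick way to see what sample() returns is to tabulate one simulated sequence. The seed below is arbitrary and only makes this sketch reproducible; it is not one of the seeds used in this problem.

```r
set.seed(1)  # arbitrary seed, chosen only so this sketch is reproducible
flips <- sample(c(0, 1), 100, replace = TRUE)

length(flips)  # 100 flips
table(flips)   # counts of tails (0) and heads (1)
mean(flips)    # fraction of heads, close to 0.5 for a fair coin

# A biased coin would need the prob argument, e.g. 70% heads:
biased <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
```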

To solve the probability problem, you first need a function that counts the maximum number of consecutive heads or tails for any given outcome of coin flips. The function rle() can be used to do this. As described on its help page (?rle), for a given vector x, rle(x) returns a list with two components:

  1. lengths: an integer vector containing the length of each run.
  2. values: a vector of the same length as lengths with the corresponding values.

For example, suppose I set

x <- c(0,0,1,1,1,0,0,0,0)
y <- rle(x)

y$lengths returns 2, 3 and 4, which are the lengths of the runs of consecutive 0's, 1's and 0's in x. y$values returns 0, 1 and 0.

Suppose I set

x <- c(1,0,1,0,0,0,0,0,0,1)
y <- rle(x)

y$lengths returns 1, 1, 1, 6 and 1. y$values returns 1, 0, 1, 0 and 1.

Another way to think about it is that rle() is the inverse of the rep() function. For example, if we set x <- rep(c("A","B","C","A"), c(3,4,7,5)) and y <- rle(x), y$values returns "A", "B", "C", "A" and y$lengths returns 3, 4, 7, 5. RLE stands for "run-length encoding". It can be used for lossless data compression, as explained in Wikipedia.
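
The round trip between rep() and rle() can be checked directly. The sketch below also uses inverse.rle(), a base R function that rebuilds the original vector from the result of rle().

```r
x <- rep(c("A", "B", "C", "A"), c(3, 4, 7, 5))
y <- rle(x)

y$values    # "A" "B" "C" "A"
y$lengths   # 3 4 7 5

# rep() applied to the two components reproduces x ...
identical(rep(y$values, y$lengths), x)   # TRUE
# ... and so does inverse.rle()
identical(inverse.rle(y), x)             # TRUE
```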

Now write a function named 'max_streak'. This function takes one argument flips, which is an integer vector containing 0's and 1's corresponding to the outcome of coin flips. The function max_streak(flips) should return the maximum number of consecutive 0's or 1's in flips.
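
One possible implementation, sketched here under the assumption that flips is a non-empty vector with no missing values, takes the largest run length reported by rle(). Other approaches (for example, an explicit loop over the flips) would work equally well.

```r
# Sketch: length of the longest run of identical values in flips.
max_streak <- function(flips) {
  max(rle(flips)$lengths)
}

max_streak(c(0, 0, 1, 1, 1, 0, 0, 0, 0))  # 4
```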

After writing the function, you need to test it. The simplest tests apply it to sequences of flips short enough that you can find the maximum by hand. Here are some examples:

flips24 <- c(0,1,0,1,1,0,1,1,1,0,1,1,1,1,0,0,0,1,0,1,1,1,0,0)
max_streak(flips24)

[1] 4

flips9 <- c(1,0,1,0,1,0,1,0,1)
max_streak(flips9)

[1] 1

flips16 <- c(1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0)
max_streak(flips16)

[1] 10

If you want to test it on a longer string of numbers, you can do it in the following way.

x <- c(sample(c(0,1),19,replace=TRUE),1,0,rep(1,20),0,sample(c(0,1),19,replace=TRUE),1,rep(0,20),1,sample(c(0,1),19,replace=TRUE))
max_streak(x)

[1] 20

Note that we don't use set.seed() in the above example. Every time you run the commands you get a different x, but the answer is always 20. This follows from how x is constructed. The first 21 numbers contain no run longer than 20, because any streak is broken by the 1,0 at the end. The next 20 numbers are all 1's, and that streak is broken by a 0. The combination 0,sample(c(0,1),19,replace=TRUE),1 guarantees that these 21 numbers contain no run longer than 20. The next 20 numbers are all 0's, and that streak is broken by a 1. The last 20 numbers, 1,sample(c(0,1),19,replace=TRUE), likewise contain no run longer than 20. So the maximum number of consecutive 0's or 1's in x is always 20.
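
The claim that the answer is always 20 can itself be tested by rebuilding x many times. The sketch below assumes an rle()-based max_streak (defined inline so the block runs on its own); the helper make_x and the choice of 1,000 repetitions are illustrative, not part of the problem.

```r
# Assumed implementation of max_streak, based on rle().
max_streak <- function(flips) max(rle(flips)$lengths)

# Hypothetical helper that rebuilds the test vector described in the text.
make_x <- function() {
  r19 <- function() sample(c(0, 1), 19, replace = TRUE)
  c(r19(), 1, 0, rep(1, 20), 0, r19(), 1, rep(0, 20), 1, r19())
}

results <- replicate(1000, max_streak(make_x()))
all(results == 20)   # TRUE: no run can exceed the planted runs of 20
```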


a. Now you have a function that calculates the maximum number of consecutive heads or tails. Let's try one experiment of 100 coin flips. Run the following code:

RNGversion("3.5.0") 
set.seed(223388046)
flips <- sample(c(0,1),100,replace=TRUE)
maxStreak <- max_streak(flips)

What is the value of maxStreak?

To estimate the probability of getting a streak of at least 7 consecutive 0's or 1's in 100 coin flips, we need to repeat the experiment many times. Let's repeat the 100 coin flips 10,000 times. Consider the following R code:

RNGversion("3.5.0") 
set.seed(223388046)
maxStreaks <- MISSING CODE

The variable maxStreaks is a vector of length 10,000, storing the values of the maximum number of consecutive heads or tails in 10,000 experiments of 100 coin flips.

b. What could be the missing code that replaces MISSING CODE above to complete this program?






Copy and paste the code and run it. Make sure the first element (maxStreaks[1]) is the same number you got in part (a). Take a look at the distribution of the values in maxStreaks, using commands such as summary(maxStreaks) and table(maxStreaks). You can also plot a histogram of maxStreaks. When you are done exploring, answer the following questions.
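
One way to carry out the repetition, shown as a sketch rather than as the intended answer, is replicate(), which evaluates an expression a given number of times and collects the results. The block assumes an rle()-based max_streak and uses an arbitrary seed, so its numbers will not match the graded answers.

```r
max_streak <- function(flips) max(rle(flips)$lengths)  # assumed implementation

set.seed(1)  # arbitrary seed, not the one used in the problem
maxStreaks <- replicate(10000,
                        max_streak(sample(c(0, 1), 100, replace = TRUE)))

length(maxStreaks)    # 10000
summary(maxStreaks)   # minimum, quartiles, mean, maximum
table(maxStreaks)     # how often each streak length occurred
hist(maxStreaks,
     breaks = seq(min(maxStreaks) - 0.5, max(maxStreaks) + 0.5, by = 1),
     main = "Longest streak in 100 flips", xlab = "Longest streak")
```

Centering the histogram breaks on the integers gives one bar per streak length, which matches the counts from table().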

c. The expected value of an outcome is the average of the outcomes of the same experiment repeated an infinite number of times. You have repeated the coin-flip experiment 10,000 times, so the expected value is approximately the mean of the values in maxStreaks. Calculate the mean value.

Sanity check: Your answer must be positive and have no more than 4 decimal places.

mean(maxStreaks) =

d. What is the maximum value in maxStreaks?

max(maxStreaks) =

e. Count the number of values in maxStreaks that are greater than or equal to 7. Divide this number by 10,000. This is the estimated probability that at least 7 consecutive heads or tails occur in 100 coin flips.

Sanity check: Your answer should be a number between 0 and 1 with no more than 4 decimal places.

P(≥7) ≈

f. Count the number of values in maxStreaks that are greater than or equal to 6. Divide this number by 10,000. This is the estimated probability that at least 6 consecutive heads or tails occur in 100 coin flips.

P(≥6) ≈

g. Count the number of values in maxStreaks that are smaller than or equal to 4. Divide this number by 10,000. This is the estimated probability that no more than 4 consecutive heads or tails occur in 100 coin flips.

P(≤4) ≈

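
The counting in parts (e) to (g) relies on the fact that the mean of a logical vector is the fraction of its entries that are TRUE. The sketch below illustrates this on a small made-up vector (toy is hypothetical data, not simulation output); the same expressions apply to maxStreaks.

```r
toy <- c(5, 7, 6, 4, 8, 6, 5, 9, 4, 6)  # made-up streak lengths

sum(toy >= 7) / length(toy)   # 0.3: three of the ten values are >= 7
mean(toy >= 7)                # 0.3: the same calculation in one step
mean(toy >= 6)                # 0.6
mean(toy <= 4)                # 0.2
```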