Airplane Seat Puzzle

This is a classic math puzzle: 100 people line up to board a plane with 100 seats. The first person in line has lost his boarding pass and randomly chooses a seat. After that, each person entering the plane sits in their assigned seat if it is unoccupied, or if it is occupied, chooses an unoccupied seat randomly. What is the probability that the last person sits in his assigned seat?

You can find the solution to the puzzle by searching the internet. In this problem, we consider a generalized problem: what is the probability that the i-th person ends up in a wrong seat (i.e. a seat not assigned to the person), where i is an integer between 2 and 100. We are going to calculate the approximate values of the probabilities by performing a simulation. Then we will compare the result with the analytic solution.

As you have seen many times before, the most important part of setting up a simulation is to write a function that performs a single experiment. Consider the following function.

# Airplane seat problem with n seats
seat <- function(n) {
  # Use 'occupied' to keep track of occupied seats 
  # (occupied[i] is TRUE if the seat assigned to the i-th person is occupied, FALSE if empty)
  occupied <- rep(FALSE,n)
  # Use 'assign' to keep track of people occupying their assigned seats
  # (assign[i] is TRUE if the i-th person sits in the assigned seat, FALSE otherwise)
  assign <- rep(TRUE,n)
  # randomly put the first person in a seat
  occupied[sample.int(n,1)] <- TRUE
  assign[1] <- occupied[1]
  # now determine the seats for the rest of people
  for (i in 2:n) {
    if(occupied[i]) {
      # the assigned seat to the i-th person is occupied, the i-th person randomly chooses an empty seat.
      assign[i] <- MISSING CODE 1
      empty_seats <- MISSING CODE 2
      random_seat <- sample(empty_seats,1)
      occupied[random_seat] <- MISSING CODE 3
    } else {
      occupied[i] <- TRUE
    }
  }
  # return the logical vector 'assign'
  assign
}

What should MISSING CODE 1, MISSING CODE 2 and MISSING CODE 3 be?

MISSING CODE 1 should be

MISSING CODE 2 should be

MISSING CODE 3 should be

The function seat() performs one experiment and returns the logical vector assign. To perform a simulation with 10⁵ experiments, we can use the following command.

N <- 100 # 100 seats total
RNGversion("3.5.0") 
set.seed(41467541)
result <- replicate(1e5, seat(N))

Run the code above. You should see that result is a 100×100000 matrix consisting of logical (TRUE/FALSE) values.
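The shape of the matrix follows from how replicate() combines its results. The sketch below illustrates this with a hypothetical toy function (toy_experiment is not part of the assignment) that returns a logical vector of length 3; each call becomes one column of the output.

```r
# Hypothetical toy experiment: always returns a logical vector of length 3
toy_experiment <- function() c(TRUE, FALSE, TRUE)

# Repeat the toy experiment 5 times; replicate() binds the results column-wise
toy_result <- replicate(5, toy_experiment())

# Number of rows = length of the returned vector,
# number of columns = number of replications
dim(toy_result)
```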

Suppose I want to know if the 94th person occupies his/her assigned seat in the 7865th experiment. What command should I type?
result[7865,94]
result[94,7865]

Use the matrix result to calculate the approximate probability that the 94th person occupies a wrong seat. (Sanity check: the number should have no more than 5 decimal places.)

Probability ≈

Now do the same calculation for the other people. That is, calculate the approximate probability that the i-th person occupies a wrong seat, where i = 2, 3, ..., 100. Store the result in a numeric vector psim of length 99.
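One possible way to get a proportion for every row of a logical matrix at once is rowMeans(), since TRUE counts as 1 and FALSE as 0. The sketch below uses a small made-up matrix (toy is purely illustrative, not the simulation output) and computes the fraction of FALSE entries in each row.

```r
# Made-up 2 x 4 logical matrix, for illustration only
toy <- matrix(c(TRUE,  TRUE,  FALSE, TRUE,
                FALSE, FALSE, TRUE,  FALSE),
              nrow = 2, byrow = TRUE)

# !toy flips TRUE/FALSE, so rowMeans() gives the fraction of FALSE per row
prop_false <- rowMeans(!toy)
prop_false
```

Selecting a subset of rows before calling rowMeans() restricts the calculation to those rows.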

Make a scatter plot of 1/psim versus i for i from 2 to 100. What do you see?
The points approximately follow a straight line with a positive slope.
The points approximately follow a straight line with a negative slope.
None of the above.

Fit a linear model predicting 1/psim from i for i from 2 to 100. Fill in the following table. Give your answers to 4 significant figures.

Coefficients	Estimate	Standard Error
Intercept
Slope
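The estimates and standard errors in such a table can be read from the coefficient matrix of summary(). The sketch below shows this on made-up data (x and z are hypothetical and unrelated to psim).

```r
# Made-up data that roughly follows a straight line (illustration only)
set.seed(1)
x <- 1:20
z <- 3 - 0.5 * x + rnorm(20, sd = 0.1)

# Fit the linear model and pull out the coefficient table
fit <- lm(z ~ x)
coef_table <- summary(fit)$coefficients

# Keep the two columns of interest, rounded to 4 significant figures
signif(coef_table[, c("Estimate", "Std. Error")], 4)
```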

The analytic solution is p(i) = 1/(102-i) (the calculation is explained in this pdf file). Hence 1/p(i) = 102-i. Therefore, the intercept is 102 and the slope is -1. Is the simulation result consistent with the analytic solution? To test that, we consider the following null and alternative hypotheses.

H₀: The simulation result is consistent with the analytic solution. The observed difference is caused by chance variation.

H_A: The simulation result is inconsistent with the analytic solution. The observed difference is too large to be explained by chance variation.

You may be tempted to test the hypothesis using the standard method: first compute the t statistic for the intercept and slope using the formula t = (observed value - expected value)/(standard error), then calculate the p-value and see if it is smaller than a prescribed significance level. However, as you've learned in Stat 200, the method applies only when certain assumptions are satisfied. One of the assumptions is homoscedasticity.
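For reference, the standard method described above can be sketched as follows on made-up data where the assumptions do hold by construction. The variables x and z and the hypothesized values in expected are assumptions chosen for the illustration, not values from this problem.

```r
# Made-up data with constant noise variance (illustration only)
set.seed(2)
x <- 1:30
z <- 10 - 2 * x + rnorm(30)
fit <- lm(z ~ x)

# Observed estimates and their standard errors
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Hypothesized intercept and slope for the toy data
expected <- c(10, -2)

# t = (observed value - expected value) / (standard error)
t_stat <- (est - expected) / se

# Two-sided p-values from the t distribution with the residual degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))
p_val
```

Note that the p-values printed by summary() test against an expected value of 0, which is why the t statistics are recomputed by hand here.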

Make a scatter plot of the residuals vs i of the linear model above. What do you see?
The residual variance increases with i.
The residual variance decreases with i.
The residual variance is approximately independent of i.
As i increases, the residual variance first decreases, reaches a minimum around i = 50, and then increases again.
As i increases, the residual variance first increases, reaches a maximum around i = 50, and then decreases again.

Transformation of Variable

It can be shown that under H₀, the expected value of psim[i] is E(psim[i]) = p(i) and its standard deviation is SD(psim[i]) = σ(i) = √( p(i) [1 - p(i)] / N_sim ), where N_sim = 10⁵ is the number of repeated experiments in the simulation and p(i) = 1/(102-i). Introduce a new variable y:

y(i) = [psim[i] - p(i)] / σ(i)

Calculate y(i) for i from 2 to 100 and store the result in a numeric vector of length 99. By construction, under H₀ the expected value of y(i) is 0 and its SD is 1, independent of i. Under H₀, y(i) should therefore be just random noise.
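The same standardization can be sketched on a toy example. Here p_true holds three hypothetical probabilities (not the p(i) of this problem), p_hat is a simulated proportion for each of them, and y_toy is the standardized difference, using the binomial standard deviation of a proportion.

```r
# Toy setup: three hypothetical true probabilities (illustration only)
set.seed(3)
n_sim  <- 1e5
p_true <- c(0.01, 0.1, 0.5)

# Simulated proportions: number of successes in n_sim trials, divided by n_sim
p_hat <- rbinom(length(p_true), n_sim, p_true) / n_sim

# Standard deviation of a proportion estimated from n_sim independent trials
sigma <- sqrt(p_true * (1 - p_true) / n_sim)

# Standardized difference: mean 0 and SD 1 when p_true is the correct probability
y_toy <- (p_hat - p_true) / sigma
y_toy
```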

Make a scatter plot of y(i) versus i. What do you see?
y(i) increases with i.
y(i) decreases with i.
y(i) appears to scatter randomly.
As i increases, y(i) first decreases, reaches a minimum around i = 50, and then increases again.
As i increases, y(i) first increases, reaches a maximum around i = 50, and then decreases again.

Fit a linear model predicting y from i. Then make a residual plot. What do you see?
The residual variance increases with i.
The residual variance decreases with i.
The residual variance appears to be independent of i.
As i increases, the residual variance first decreases, reaches a minimum around i = 50, and then increases again.
As i increases, the residual variance first increases, reaches a maximum around i = 50, and then decreases again.

What are the p-values of the intercept and slope (rounded to 2 decimal places)? Use them to determine if the result is consistent with H₀. Recall that the expected values of the intercept and slope are 0 under H₀.

P-value of the intercept =

P-value of the slope =

Conclusion: reject H₀ at 5% significance level?
Yes
No
