Airplane Seat Puzzle

This is a classic math puzzle: 100 people line up to board a plane with 100 seats. The first person in line has lost his boarding pass and randomly chooses a seat. After that, each person entering the plane sits in their assigned seat if it is unoccupied, or if it is occupied, chooses an unoccupied seat randomly. What is the probability that the last person sits in his assigned seat?

You can find the solution to the puzzle by searching the internet. In this problem, we consider a generalized problem: what is the probability that the i-th person ends up in a wrong seat (i.e. a seat not assigned to the person), where i is an integer between 2 and 100. We are going to calculate the approximate values of the probabilities by performing a simulation. Then we will compare the result with the analytic solution.

As you have seen many times before, the most important part of setting up a simulation is to write a function that performs a single experiment. Consider the following function.

# Airplane seat problem with n seats
seat <- function(n) {
  # Use 'occupied' to keep track of occupied seats
  # (occupied[i] is TRUE if the seat assigned to the i-th person is occupied, FALSE if empty)
  occupied <- rep(FALSE,n)
  # Use 'assign' to keep track of people occupying their assigned seats
  # (assign[i] is TRUE if the i-th person sits in the assigned seat, FALSE otherwise)
  assign <- rep(TRUE,n)
  # put the first person in a randomly chosen seat
  occupied[sample.int(n,1)] <- TRUE
  assign[1] <- occupied[1]
  # now determine the seats for the rest of people
  for (i in 2:n) {
    if(occupied[i]) {
      # the seat assigned to the i-th person is occupied, so the i-th person randomly chooses an empty seat
      assign[i] <- MISSING CODE 1
      empty_seats <- MISSING CODE 2
      random_seat <- sample(empty_seats,1)
      occupied[random_seat] <- MISSING CODE 3
    } else {
      occupied[i] <- TRUE
    }
  }
  # return the logical vector 'assign'
  assign
}
  1. What should MISSING CODE 1, MISSING CODE 2 and MISSING CODE 3 be?

    MISSING CODE 1 should be

    MISSING CODE 2 should be

    MISSING CODE 3 should be


    The function seat() performs one experiment and returns the logical vector assign. To perform a simulation with 10^5 experiments, we can use the following commands.

    N <- 100 # 100 seats total
    RNGversion("3.5.0") 
    set.seed(41467541)
    result <- replicate(1e5, seat(N))

    Run the code above. You should see that result is a 100×100000 matrix consisting of logical (TRUE/FALSE) values.
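To see why replicate() produces a matrix here, the sketch below uses a hypothetical toy function, toy_experiment(), in place of seat(). It is not part of the assignment; it only illustrates that when each call returns a vector of the same length, replicate() binds the results as columns, one per experiment.

```r
# Hypothetical stand-in for seat(): returns a logical vector of length n.
# (Illustration only; it does not simulate the airplane problem.)
toy_experiment <- function(n) {
  runif(n) < 0.5
}

set.seed(1)
toy_result <- replicate(6, toy_experiment(4))

dim(toy_result)         # rows = entries of each returned vector, columns = experiments
is.logical(toy_result)  # the matrix keeps the logical type of the vectors
```

By the same logic, each column of result holds the assign vector from one experiment, and each row follows one person across all experiments.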

  3. Suppose I want to know if the 54th person occupies his/her assigned seat in the 7865th experiment. What command should I type?



  5. Use the matrix result to calculate the approximate probability that the 54th person occupies a wrong seat. (Sanity check: the number should have no more than 5 decimal places.)
  6. Probability ≈


    Now do the same calculation for the other people. That is, calculate the approximate probability that the i-th person occupies a wrong seat, where i = 2, 3, ..., 100. Store the result in a numeric vector psim of length 99.

  7. Make a scatter plot of 1/psim versus i for i from 2 to 100. What do you see?




  9. Fit a linear model predicting 1/psim from i for i from 2 to 100. Fill in the following table. Give your answers to 4 significant figures.
  10. Coefficients   Estimate   Standard Error
      Intercept
      Slope

    The analytic solution is p(i) = 1/(102-i) (the calculation is explained in this pdf file). Hence 1/p(i) = 102-i, so the expected intercept is 102 and the expected slope is -1. Is the simulation result consistent with the analytic solution? To test that, we consider the following null and alternative hypotheses.

    H0: The simulation result is consistent with the analytic solution. The observed difference is caused by chance variation.

    HA: The simulation result is inconsistent with the analytic solution. The observed difference is too large to be explained by chance variation.
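As a quick sanity check on the analytic formula quoted above, the sketch below tabulates p(i) = 1/(102-i) for i from 2 to 100 and fits a line to 1/p(i). The variable names (i, p, inv_p, fit) are illustrative and not part of the assignment. The input is the exact formula, not simulation output, so the fit only confirms the intercept and slope that the hypotheses refer to.

```r
# Exact analytic probabilities for persons 2..100 (no simulation involved)
i <- 2:100
p <- 1 / (102 - i)

# The last person (i = 100) ends up in a wrong seat with probability 1/2
p[i == 100]

# 1/p(i) = 102 - i is exactly linear in i, so the fitted coefficients
# should be 102 (intercept) and -1 (slope) up to rounding error
inv_p <- 1 / p
fit <- lm(inv_p ~ i)
coef(fit)
```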

    You may be tempted to test the hypotheses using the standard method: first compute the t statistic for the intercept and for the slope using the formula t = (observed value - expected value)/(standard error), then calculate the p-value and see if it is smaller than a prescribed significance level. However, as you learned in Stat 200, the method applies only when certain assumptions are satisfied. One of these assumptions is homoscedasticity.
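The t statistic formula above can be sketched in R as follows. The numbers for estimate and std_error are hypothetical placeholders, not output from the simulation, and the two-sided p-value assumes the usual t distribution with 97 degrees of freedom (99 data points minus 2 fitted coefficients). As the paragraph above warns, this calculation is only trustworthy when the model assumptions hold.

```r
# Hypothetical values for illustration only (not simulation results)
estimate  <- 101.2  # hypothetical fitted intercept
expected  <- 102    # intercept predicted by the analytic solution
std_error <- 0.5    # hypothetical standard error of the intercept
df        <- 97     # 99 observations (i = 2..100) minus 2 coefficients

# t = (observed value - expected value) / (standard error)
t_stat <- (estimate - expected) / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

t_stat
p_value
```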


  11. Make a scatter plot of the residuals vs i of the linear model above. What do you see?






    Transformation of Variable

    It can be shown that under H0, the expected value of psim[i] is E(psim[i]) = p(i) and the standard deviation is SD(psim[i]) = σ(i) = √( p(i) [1-p(i)] / Nsim ), where Nsim = 10^5 is the number of repeated experiments in the simulation and p(i) = 1/(102-i). Introduce a new variable y:

    y(i) = [psim[i] - p(i)] / σ(i)

    Calculate y(i) for i from 2 to 100 and store the result in a numeric vector of length 99. By construction, under H0 the expected value of y(i) is 0 and the SD is 1, independent of i. Under H0, y(i) should be just random noise.
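The claims about the mean and SD can be sketched as follows, assuming that under H0 the experiments are independent and that in each one the i-th person ends up in a wrong seat with probability p(i). The number of wrong-seat outcomes is then binomial, and psim[i] is that count divided by Nsim:

```latex
\begin{align*}
N_{\mathrm{sim}}\, p_{\mathrm{sim}}[i] &\sim \mathrm{Binomial}\bigl(N_{\mathrm{sim}},\, p(i)\bigr), \\
E\bigl(p_{\mathrm{sim}}[i]\bigr) &= \frac{N_{\mathrm{sim}}\, p(i)}{N_{\mathrm{sim}}} = p(i), \\
\mathrm{SD}\bigl(p_{\mathrm{sim}}[i]\bigr) &= \frac{\sqrt{N_{\mathrm{sim}}\, p(i)\,[1-p(i)]}}{N_{\mathrm{sim}}}
  = \sqrt{\frac{p(i)\,[1-p(i)]}{N_{\mathrm{sim}}}} = \sigma(i), \\
E\bigl(y(i)\bigr) &= \frac{E\bigl(p_{\mathrm{sim}}[i]\bigr) - p(i)}{\sigma(i)} = 0, \qquad
\mathrm{SD}\bigl(y(i)\bigr) = \frac{\mathrm{SD}\bigl(p_{\mathrm{sim}}[i]\bigr)}{\sigma(i)} = 1 .
\end{align*}
```

Dividing by σ(i) is what removes the dependence of the spread on i, which is the point of the transformation.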

  13. Make a scatter plot of y(i) versus i. What do you see?






  15. Fit a linear model predicting y from i. Then make a residual plot. What do you see?






  17. What are the p-values of the intercept and slope (rounded to 2 decimal places)? Use them to determine whether the result is consistent with H0. Recall that the expected values of the intercept and slope are 0 under H0.
  18. P-value of the intercept =

    P-value of the slope =

    Conclusion: reject H0 at 5% significance level?

