
Soda

A soda company wanted to improve its soda drink and decided to add a secret ingredient to it. To find the amount that would appeal to the most people, the company ran an experiment in which 10 different amounts of the ingredient were added, and 200 randomly selected people were each asked to taste one of the 10 versions of the drink. Each person then reported whether it tasted better than, worse than, or the same as the original drink. The resulting data can be downloaded here and then loaded into R using the command

soda <- read.csv("soda094.csv")

The data frame has two columns. The first column, named 'x', is the amount of the secret ingredient added to the drink (from 1 to 10). The second column, named 'y', is a 0/1 integer vector indicating if the person said the drink tasted better (y=1) or not better (y=0) than the original drink.

a. Fit a 10-box model predicting the probability P(y=1) from x (treating x as a factor variable). This is the fraction of people who said that the drink tasted better when x amount of the secret ingredient was added. What is the predicted probability P(y=1) for x = 5? Give your answer to 2 decimal places.
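One way the 10-box fit could be set up is sketched below. So that the sketch runs on its own, it uses a made-up data frame `soda_demo` with the same two columns as the real file (the counts in `better` are hypothetical, not the real data); in the exercise you would use `soda` from `read.csv()` instead, and your numbers will differ.

```r
# Hypothetical stand-in for soda094.csv: 20 tasters per amount x = 1..10.
# 'better' is an invented count of y = 1 answers per amount (NOT the real data).
better <- c(2, 4, 7, 10, 13, 14, 12, 9, 5, 3)
soda_demo <- data.frame(
  x = rep(1:10, each = 20),
  y = unlist(lapply(better, function(k) rep(c(1L, 0L), times = c(k, 20 - k))))
)

# 10-box model: treating x as a factor gives one parameter per amount.
fit_box <- glm(y ~ factor(x), data = soda_demo, family = binomial)

# Predicted P(y = 1) for x = 5 (type = "response" returns a probability).
p5 <- predict(fit_box, newdata = data.frame(x = 5), type = "response")

# For a box model this matches the raw fraction of y = 1 in that box.
frac5 <- mean(soda_demo$y[soda_demo$x == 5])

round(c(model = unname(p5), fraction = frac5), 2)
```

Comparing the model prediction with the plain fraction is a quick check that the factor model was specified as intended.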


b. If you make a plot of the predicted probability from (a) as a function of x, you will see that the data cannot be fit nicely by an S-shaped curve. The probability is small for both small x and large x. It has a peak/peaks somewhere in the middle. Let's fit the ln(odds) by a quadratic function. Fit a logistic regression model predicting the probability P(y=1) from x and x² treating x as a continuous variable. Give your answers to 3 decimal places.

ln(odds) = ( ) + ( )x + ( )x²
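A minimal sketch of a quadratic logistic fit, again on the made-up `soda_demo` data frame (hypothetical counts, not the real data). The `I(x^2)` term is how a squared predictor is usually written inside an R formula; with the real data you would replace `soda_demo` by `soda`.

```r
# Hypothetical stand-in for soda094.csv (invented counts, NOT the real data).
better <- c(2, 4, 7, 10, 13, 14, 12, 9, 5, 3)
soda_demo <- data.frame(
  x = rep(1:10, each = 20),
  y = unlist(lapply(better, function(k) rep(c(1L, 0L), times = c(k, 20 - k))))
)

# Logistic regression with x treated as continuous:
# ln(odds) = b0 + b1 * x + b2 * x^2
fit_quad <- glm(y ~ x + I(x^2), data = soda_demo, family = binomial)

# The three coefficients, in the order intercept, x, x^2.
b <- coef(fit_quad)
round(b, 3)
```

A negative coefficient on `I(x^2)` corresponds to a curve that rises and then falls, which is the shape described in (b).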


c. Use the model in (b) to predict P(y=1) for x=5. Give your answer to 2 decimal places.
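The prediction step might look like the sketch below, which refits the quadratic model on the made-up `soda_demo` data (hypothetical counts) so that it runs on its own. It also recomputes the probability by hand from the coefficients, assuming the usual logistic relation P = 1 / (1 + exp(-ln(odds))), which `plogis()` implements.

```r
# Hypothetical stand-in for soda094.csv (invented counts, NOT the real data).
better <- c(2, 4, 7, 10, 13, 14, 12, 9, 5, 3)
soda_demo <- data.frame(
  x = rep(1:10, each = 20),
  y = unlist(lapply(better, function(k) rep(c(1L, 0L), times = c(k, 20 - k))))
)
fit_quad <- glm(y ~ x + I(x^2), data = soda_demo, family = binomial)

# Predicted probability at x = 5 straight from predict().
p5 <- predict(fit_quad, newdata = data.frame(x = 5), type = "response")

# Same number by hand: evaluate ln(odds) at x = 5, then convert to a probability.
b <- unname(coef(fit_quad))
log_odds5 <- b[1] + b[2] * 5 + b[3] * 5^2
p5_manual <- plogis(log_odds5)

round(c(predict = unname(p5), manual = p5_manual), 2)
```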


d. The model in (b) predicts that P(y=1) increases from x=1, reaches a peak in the middle at x = x_optimal and then drops for x > x_optimal. This means that adding a small amount of the secret ingredient to the drink doesn't improve the taste, whereas adding too much of it doesn't help either. The optimal amount is given by x_optimal, at which many people agree that the drink tastes better. Calculate x_optimal, which maximizes ln(odds) for the model in (b). Note that maximizing ln(odds) is the same as maximizing the probability P(y=1). Give your answer to 1 decimal place.
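For the analytic route, the calculation can be sketched as follows, writing the three fitted coefficients from (b) as b_0, b_1 and b_2 (these symbol names are an assumption made here for illustration). Setting the derivative of ln(odds) to zero gives the stationary point, which is a maximum when b_2 is negative:

```latex
\ln(\text{odds}) = b_0 + b_1 x + b_2 x^2,
\qquad
\frac{d}{dx}\,\ln(\text{odds}) = b_1 + 2 b_2 x = 0
\;\Longrightarrow\;
x_{\text{optimal}} = -\frac{b_1}{2 b_2}
\quad (b_2 < 0).
```

Because P(y=1) = 1 / (1 + e^{-ln(odds)}) is an increasing function of ln(odds), the same x also maximizes the probability.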

You can do the calculation analytically (using algebra or calculus) or numerically. The numerical calculation can be done using the brute-force approach from Week 2's optimization problem. From the plot of P(y=1) vs x you should see that the maximum is somewhere between 5 and 6. To leave some wiggle room, search between 4 and 7 instead. Since you are asked to find the solution to 1 decimal place, you can simply use the predict() function to calculate ln(odds) for x = 4, 4.1, 4.2, ..., 7. The predict() function will return a vector of length 31 containing the ln(odds) for the 31 values of x. Search for the maximum value of the returned ln(odds) and find the corresponding value of x. (Alternatively, you can use the optimize() function to find the optimal x, but you will need to learn how to use it first.) Even if you find the analytic method easier, you are still encouraged to try the numerical approach and compare it with your analytic result, since the numerical approach gives you another chance to practice R commands.
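The brute-force search and the `optimize()` alternative can be sketched as below. The sketch runs on the made-up `soda_demo` data (hypothetical counts, not the real data), so the value it finds is not the answer to the exercise; the variable names are also just illustrative.

```r
# Hypothetical stand-in for soda094.csv (invented counts, NOT the real data).
better <- c(2, 4, 7, 10, 13, 14, 12, 9, 5, 3)
soda_demo <- data.frame(
  x = rep(1:10, each = 20),
  y = unlist(lapply(better, function(k) rep(c(1L, 0L), times = c(k, 20 - k))))
)
fit_quad <- glm(y ~ x + I(x^2), data = soda_demo, family = binomial)

# Brute force: ln(odds) on the grid 4, 4.1, ..., 7.
# predict() on a glm returns the linear predictor (ln(odds)) by default.
x_grid <- seq(4, 7, by = 0.1)
log_odds <- predict(fit_quad, newdata = data.frame(x = x_grid))
x_best <- x_grid[which.max(log_odds)]

# Alternative: let optimize() search the interval [4, 7] for the maximum.
opt <- optimize(
  function(x) predict(fit_quad, newdata = data.frame(x = x)),
  interval = c(4, 7), maximum = TRUE
)

# Cross-check against the vertex of the fitted parabola, -b1 / (2 * b2).
b <- unname(coef(fit_quad))
x_analytic <- -b[2] / (2 * b[3])

round(c(grid = x_best, optimize = opt$maximum, analytic = x_analytic), 1)
```

The grid answer can differ from the other two by up to half a grid step (0.05) before rounding, which is why a step of 0.1 is enough for a 1-decimal answer.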

x_optimal =
