Enterprises often promote new products and services to their customers through marketing campaigns. A commonly used type of campaign is the phone call. Naturally, entrepreneurs want to know how effective these campaigns are.
In this problem, you will analyze a data set collected from a Portuguese bank. The data relate to phone campaigns that ran between May 2008 and November 2010. During these campaigns, clients were offered an attractive long-term deposit with good interest rates.
The bank data were downloaded from the UC Irvine Machine Learning Repository. A copy of the data is also uploaded to our website here. Note that in this csv file, columns are separated by semicolons (;) instead of commas. You need to read the help page ?read.table to figure out how to load this file properly into R.
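As a rough illustration of the separator issue, the sketch below writes a tiny semicolon-separated file to a temporary location and reads it back. The rows are made up and the column names only mimic the bank data; the real file name and its full set of columns are not assumed here.

```r
# Toy stand-in for the real file: a few semicolon-separated rows written to a
# temporary file (the column names mimic the bank data; the values are made up).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  'age;job;campaign;y',
  '30;"admin.";1;"no"',
  '45;"retired";2;"yes"',
  '38;"technician";4;"no"'
), toy_path)

# sep = ";" tells read.table that fields are separated by semicolons;
# header = TRUE takes the column names from the first line.
bank_toy <- read.table(toy_path, header = TRUE, sep = ";",
                       stringsAsFactors = TRUE)
str(bank_toy)
```

Without sep = ";", read.table would split fields on whitespace and each row would end up in a single column.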
A description of the data and column variables can be found on this webpage. The last column, named y, is a binary factor variable (yes/no) indicating whether the client subscribed to the long-term deposit. For convenience in later calculations, you may want to follow the same procedure as in this week's notes to convert "yes" to 1 and "no" to 0.
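One possible way to do the conversion is sketched below on a made-up vector. This is not necessarily the procedure used in the course notes, and the values are not taken from the bank data.

```r
# Made-up response values standing in for the real column y.
y <- factor(c("no", "yes", "no", "no", "yes"))

# The comparison gives TRUE/FALSE; as.integer() maps TRUE to 1 and FALSE to 0.
y01 <- as.integer(y == "yes")
y01
```

Comparing against the label "yes" avoids relying on the internal ordering of the factor levels.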
a. The column 'job' encodes the job types of the clients. Use the table() function to create a contingency table of the two variables 'job' and 'y'. Perform a chi-square test of independence to determine whether a client's decision to subscribe depends on job category. The warning message in the output arises because some cells in the contingency table have small counts, and the approximation underlying the test breaks down there. We ignore the warning here since we just want to get a sense of the possible dependence of y on job type.
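The mechanics of building a contingency table and testing independence can be sketched on made-up data. The three job types and all counts below are invented for illustration, and they are large enough that the small-count warning does not appear.

```r
# Made-up data: 3 job types, 120 clients (not the bank data).
toy <- data.frame(
  job = rep(c("admin.", "retired", "student"), times = c(60, 30, 30)),
  y   = c(rep(c("no", "yes"), times = c(54, 6)),
          rep(c("no", "yes"), times = c(20, 10)),
          rep(c("no", "yes"), times = c(18, 12)))
)

# Rows are job types, columns are the two outcomes.
tab <- table(toy$job, toy$y)
tab

# Chi-square test of independence between the row and column variables.
test <- chisq.test(tab)
test$p.value
```

The returned object also stores the test statistic (test$statistic) and the expected counts under independence (test$expected), which help in judging whether any cells are too small.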
The p-value of the chi-square test is (less than / greater than) 5%.
What do you conclude?
That all groups are significantly different from each other.
That at least one group is significantly different from the others.
That none of the groups is significantly different from each other.
b. Which two job types have the highest proportions of customers subscribing to the long-term deposit? (select two)
technician, retired, management, student, blue-collar, services, unemployed, unknown, self-employed, entrepreneur, admin., housemaid
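Row-wise proportions can be computed from a contingency table with prop.table(). The sketch below uses made-up counts for three job types, not the bank data.

```r
# Made-up counts (not the bank data): rows are job types, columns are outcomes.
tab <- matrix(c(54, 6,
                20, 10,
                18, 12),
              nrow = 3, byrow = TRUE,
              dimnames = list(job = c("admin.", "retired", "student"),
                              y   = c("no", "yes")))

# margin = 1 divides each row by its row total, so every row sums to 1.
row_props <- prop.table(tab, margin = 1)

# Proportion of "yes" per job type, largest first.
yes_rate <- sort(row_props[, "yes"], decreasing = TRUE)
yes_rate
```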
c. The column 'campaign' records the number of contacts (phone calls) performed for the client. Fit a logistic regression model predicting y from the number of contacts. Enter the intercept and slope to 4 significant figures.
Intercept = ____, slope = ____
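A logistic regression with a single predictor can be fitted with glm(). The sketch below uses simulated data, so the variable names x and y and the resulting coefficients are illustrative only and say nothing about the bank data.

```r
# Simulated data (not the bank data): the chance of y = 1 changes with x.
set.seed(1)
x <- rpois(500, lambda = 3) + 1
y <- rbinom(500, size = 1, prob = plogis(-1 - 0.2 * x))

# family = binomial fits a logistic regression (logit link by default).
fit <- glm(y ~ x, family = binomial)

# Named vector: "(Intercept)" and the slope for x.
coef(fit)

# signif() rounds to a given number of significant figures.
signif(coef(fit), 4)
```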
d. Use the result in (c) to predict the probability that clients will subscribe to the long-term deposit if they have been contacted 26 times and 48 times. Enter the answers to 2 significant figures.
P(26 times) = ____, P(48 times) = ____
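Predicted probabilities from a fitted logistic model can be obtained with predict() or computed by hand from the coefficients. The sketch below again uses simulated data and arbitrary predictor values (2 and 10), so none of the numbers correspond to the bank data.

```r
# Simulated data again (not the bank data).
set.seed(1)
x <- rpois(500, lambda = 3) + 1
y <- rbinom(500, size = 1, prob = plogis(-1 - 0.2 * x))
fit <- glm(y ~ x, family = binomial)

# type = "response" returns probabilities rather than log-odds.
new_x <- data.frame(x = c(2, 10))
p_hat <- predict(fit, newdata = new_x, type = "response")

# The same numbers by hand: p = 1 / (1 + exp(-(b0 + b1 * x))).
b <- coef(fit)
p_manual <- 1 / (1 + exp(-(b[1] + b[2] * new_x$x)))

signif(p_hat, 2)
```

The column name in newdata must match the predictor name used in the model formula; otherwise predict() cannot find the variable.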
e. From the coefficients obtained in (c), what can you say about the predicted probability?
The more times a client has been contacted, the less likely the client will subscribe to the long-term deposit.
The more times a client has been contacted, the more likely the client will subscribe to the long-term deposit.
The predicted probability can increase or decrease with the number of contacts, so it is impossible to tell from the coefficients alone.