


Bank Marketing

Enterprises often promote new products and services to their customers through campaigns. A commonly used form of marketing campaign is the phone call. Naturally, entrepreneurs want to know how effective these campaigns are.

In this problem, you will analyze a data set collected from a Portuguese bank. It is related to phone campaigns that occurred between May 2008 and November 2010. During these phone campaigns, an attractive long-term deposit application, with good interest rates, was offered.

The bank data were downloaded from the UC Irvine Machine Learning Repository. A copy of the data is also available on our website here. Note that in this csv file the columns are separated by semicolons (;) instead of commas. Read the help page ?read.table to figure out how to load this file into R properly.
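As a minimal sketch of reading semicolon-separated data, the example below parses a tiny made-up table passed through the text argument. The three rows and the object names csv_text and bank are assumptions for illustration only; they are not the bank file.

```r
# Toy stand-in for a semicolon-separated file (hypothetical rows, not the bank data).
csv_text <- "age;job;y
30;admin.;no
45;retired;yes
28;student;yes"

# sep = ";" splits fields on semicolons; header = TRUE takes the
# column names from the first line.
bank <- read.table(text = csv_text, sep = ";", header = TRUE,
                   stringsAsFactors = TRUE)
str(bank)
```

For a file on disk, the text argument would be replaced by the file path. read.csv2() also defaults to sep = ";", but it additionally assumes a decimal comma (dec = ","), so the decimal setting should be checked against the file before relying on it.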

A description of the data and column variables can be found on this webpage. The last column, named y, is a binary factor variable (yes/no) indicating whether the client subscribed to the long-term deposit. For convenience in later calculations, you may want to follow the same procedure as in this week's notes to convert "yes" to 1 and "no" to 0.
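One common way to recode a yes/no factor as 1/0 is sketched below. It may differ from the procedure in the course notes, and the vector y here is made-up toy data.

```r
# Hypothetical yes/no factor, not the bank data.
y <- factor(c("no", "yes", "no", "yes", "no"))

# ifelse() returns 1 where the condition holds and 0 elsewhere.
y01 <- ifelse(y == "yes", 1, 0)
y01

# The mean of a 0/1 vector is the proportion of "yes" values.
mean(y01)
```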

a. The column 'job' encodes the job types of the clients. Use the table() function to create a contingency table of the two variables 'job' and 'y'. Perform a chi-square test of independence to determine whether opening an account depends on job category. The warning message in the output arises because some cells of the contingency table have small counts, and the large-sample approximation behind the test breaks down there. We ignore the warning here since we only want to get a sense of the possible dependence of y on job type.

P-value of the chi-square test is 5%.

What do you conclude?
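The mechanics of part (a) can be sketched on a small made-up sample. The vectors job and y below are hypothetical, so the resulting p-value says nothing about the bank data.

```r
# Hypothetical toy data, not the bank data.
job <- c("admin.", "admin.", "student", "student", "retired", "retired",
         "admin.", "student", "retired", "admin.")
y   <- c("no", "yes", "yes", "yes", "no", "yes", "no", "no", "yes", "no")

# Contingency table: rows are job types, columns are no/yes.
tab <- table(job, y)
print(tab)

# Pearson chi-square test of independence. With counts this small,
# R warns that the chi-squared approximation may be incorrect.
res <- chisq.test(tab)
res$p.value

# Expected counts under independence; small values here trigger the warning.
res$expected
```

A p-value below the chosen significance level would be read as evidence against independence of the two variables.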




b. Which two job types have the highest proportions of customers subscribing to the long-term deposit? (select two)
services
blue-collar
student
entrepreneur
technician
unemployed
housemaid
self-employed
admin.
management
unknown
retired
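A sketch of how the subscription proportion per job type might be computed, again on hypothetical toy vectors (job and y01 are made up and only three job types appear):

```r
# Hypothetical toy data, not the bank data.
job <- c("admin.", "admin.", "student", "student", "retired", "retired",
         "admin.", "student", "retired", "admin.")
y01 <- c(0, 1, 1, 1, 0, 1, 0, 0, 1, 0)

# The mean of the 0/1 outcome within each job is the proportion of "yes".
rate <- tapply(y01, job, mean)
sort(rate, decreasing = TRUE)

# Equivalent route: row proportions of the contingency table,
# keeping the column for outcome 1.
rate2 <- prop.table(table(job, y01), margin = 1)[, "1"]
rate2
```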


c. The column 'campaign' records the number of contacts (phone calls) performed for the client. Fit a logistic regression model predicting y from the number of contacts. Enter the intercept and slope to 4 significant figures.

Intercept =           slope =
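A minimal sketch of fitting a logistic regression with glm(), assuming a 0/1 response. The vectors campaign and y01 are invented toy values, so the coefficients below are unrelated to the answer for the bank data.

```r
# Hypothetical toy data, not the bank data.
campaign <- c(1, 1, 2, 2, 3, 3, 4, 5, 6, 8)
y01      <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)

# family = binomial fits a logistic regression (logit link by default).
fit <- glm(y01 ~ campaign, family = binomial)

# First coefficient is the intercept, second is the slope for campaign.
coef(fit)

# Round to 4 significant figures.
signif(coef(fit), 4)
```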


d. Use the result in (c) to predict the probability that clients will subscribe to the long-term deposit if they have been contacted 2 times and 35 times. Enter the answers to 2 significant figures.

P(2 times) =           P(35 times) =
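Predicted probabilities from a fitted logistic model can be obtained with predict() or by applying the inverse logit by hand. The sketch below refits a model on hypothetical toy data (campaign and y01 are made up), so the printed numbers are not the answers for the bank data.

```r
# Hypothetical toy data, not the bank data.
campaign <- c(1, 1, 2, 2, 3, 3, 4, 5, 6, 8)
y01      <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)
fit <- glm(y01 ~ campaign, family = binomial)

# type = "response" returns probabilities rather than log-odds.
new <- data.frame(campaign = c(2, 35))
p <- predict(fit, newdata = new, type = "response")

# The same calculation by hand: plogis(z) = 1 / (1 + exp(-z)).
b <- coef(fit)
p_hand <- plogis(b[1] + b[2] * c(2, 35))

signif(p, 2)
```

Note that 35 contacts lies far outside the range of this toy sample, so the second value is an extrapolation of the fitted curve.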


e. From the coefficients obtained in (c), what can you say about the predicted probability?
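To see how the coefficients of a logistic model shape the predicted probability, the sketch below uses hypothetical coefficients b0 and b1 (chosen for illustration, not fitted to any data). It shows that each additional unit of the predictor multiplies the odds by exp(b1), so the sign of the slope sets the direction of change.

```r
# Hypothetical coefficients chosen for illustration, not fitted values.
b0 <- -1.5
b1 <- -0.1

x <- 0:5
p <- plogis(b0 + b1 * x)   # predicted probabilities
odds <- p / (1 - p)        # corresponding odds

# Ratio of consecutive odds is constant and equals exp(b1).
odds[-1] / odds[-length(odds)]
exp(b1)

# With a negative slope, the probability falls as x grows.
diff(p)
```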


