

Transformation of Variables

The normal distribution is widely used in statistics to construct confidence intervals and calculate p-values. However, not all distributions are normal. You saw in one of Week 4's exercises that the distribution of the correlation coefficient r is not normal. It is possible to transform the variable so that its distribution is closer to a normal curve. To demonstrate this, we have created a data set for the distribution of the correlation coefficient r from that Week 4 problem.

The data were generated by a method similar to the one you used in Week 4: perform 10,000 simulations, each choosing 25 students from the Stat 100 final score-ExamAve data, and calculate the sample r for each. We use a sample size of 25 instead of 100 so that the resulting distribution of the sample r is further from the normal curve. The data can be loaded into R (after saving the file in R's working directory) by the command

rdat <- read.csv("rdensity.csv")

There is only one column, labelled "r". It is more convenient to copy the column to a numeric vector 'r' using the command

r <- rdat$r
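
If you do not have rdensity.csv at hand, a vector with a similar shape can be simulated. The following is only a sketch of the procedure described above: the population is made up (1,000 pairs of scores with a correlation of about 0.8, which is an assumption), not the actual Stat 100 data, and the names final, examave, sample_r and r_sim are hypothetical.

```r
set.seed(1)  # make the simulation reproducible

# Made-up population (assumption): 1,000 pairs of positively correlated scores.
n_pop   <- 1000
final   <- rnorm(n_pop)
examave <- 0.8 * final + 0.6 * rnorm(n_pop)

# Choose `size` students at random and return their sample correlation.
sample_r <- function(size = 25) {
  idx <- sample(n_pop, size)
  cor(final[idx], examave[idx])
}

# Repeat 10,000 times to get the sampling distribution of r.
r_sim <- replicate(10000, sample_r())
```

Under these assumptions, r_sim can stand in for r in the commands that follow, although its exact shape will differ from the course data.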

You can make a density plot by the command

plot(density(r))

You should see a plot similar to the following. It is clear that the distribution of r is left-skewed.

[Figure: density plot of r]
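
The direction of the skew can also be checked numerically. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper skewness() as the third standardized moment and tries it on made-up left-skewed data (not the r values).

```r
# Sample skewness: the mean of the cubed standardized values.
# Negative values indicate a longer left tail.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Stand-in data (assumption): a clearly left-skewed sample.
set.seed(2)
y <- 1 - rexp(5000)

skewness(y)    # negative for a left-skewed sample
skewness(-y)   # flipping the sign flips the skew
```

With the course data loaded, the call would be skewness(r).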

We would like to transform r to make the distribution closer to a normal curve. There are several candidates: r^2, r^3, r^4, e^r, √r, 1/r, log(r), ... Here log means the natural log (base e). Try the following plots:

plot(density(r^2)) 
plot(density(r^3)) 
plot(density(r^4)) 
plot(density(sqrt(r)))
plot(density(exp(r)))
plot(density(log(r)))
plot(density(1/r))
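
When judging by eye whether a density is close to a normal curve, it can help to overlay the normal curve that has the same mean and standard deviation. A minimal sketch, using a hypothetical helper compare_to_normal() and made-up data rather than the r values:

```r
# Draw the density of y together with the normal curve that has
# the same mean and standard deviation, for a visual comparison.
compare_to_normal <- function(y) {
  d <- density(y)
  plot(d, main = "Density (solid) vs. normal curve (dashed)")
  curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE, lty = 2)
  invisible(d)
}

# Stand-in data (assumption): a right-skewed sample and its log.
set.seed(3)
y <- rlnorm(5000)
compare_to_normal(y)       # visibly different from the dashed curve
compare_to_normal(log(y))  # close to the dashed curve
```

With the course data, a call would look like compare_to_normal(r^2).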

a. Which of the following transformations make the resulting distribution curve substantially closer to a normal curve? (select all that apply)
log(r)
1/r
r^3
r^4


A lazy dude doesn't want to type the plot() command every time he wants to explore a new transformation. He creates the following function:

t <- function(r,f,...) {
   plot(density(f(r,...)))
}

Note that ... is a special function argument: it collects any extra arguments and passes them on. If you have forgotten how it works, review Section 15.5 of the textbook.
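
As a reminder of how ... behaves, here is a small sketch that leaves out the plotting. The helper apply_f() and the vector x are hypothetical and only illustrate the forwarding of extra arguments.

```r
# apply_f() calls f on x and passes any extra arguments on to f.
apply_f <- function(x, f, ...) {
  f(x, ...)
}

x <- c(1, 10, 100)
apply_f(x, sqrt)             # same as sqrt(x)
apply_f(x, log, base = 10)   # same as log(x, base = 10)
```

Anything written after the second argument ends up in ... and is handed to f unchanged.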


b. Which of the following function calls is equivalent to the command plot(density(log(r)))?






c. In order to use the t() function to plot power-law transformations, the lazy dude creates the following function:

p <- function(x,n) {
   x^n
}
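
Because ^ works element by element, p() can be checked on a small vector before it is used for plotting. In this sketch x is made-up data, and p() is repeated only so the example runs on its own.

```r
p <- function(x, n) {
  x^n
}

x <- c(1, 4, 9)
p(x, 2)     # squares:      1 16 81
p(x, 0.5)   # square roots: 1 2 3
```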

Which of the following function calls is equivalent to plot(density(1/r))?






d. Another lazy dude scoffs at how clumsily this t() function handles power-law transformations and at its inability to deal with more general transformations such as 1/(1+r). He creates the following t() function, which is both simpler and able to handle more general transformations:

t <- function(x) {
   plot(density(x))
}

With this new t() function, which of the following function calls is equivalent to plot(density(1/(1+r)))?




