Correlation Analysis
When looking at the relationship between two variables measured on the objects in a sample, correlation is the appropriate approach when we are interested in the strength of the association between the variables but cannot assume causality (that is, we have two potentially interdependent variables, not one independent and one dependent variable). The question addressed by correlation analysis is the extent to which two variables covary. In graphical terms, this amounts to asking how closely the points on a scatterplot fall to an imaginary line drawn through the long axis of the point cloud. It is important to remember that we are not saying anything about the line itself (slope, intercept, etc.), only about where the points lie in relation to that line.
Pearson Correlation Coefficient

The Pearson correlation coefficient is a parametric statistic, which assumes (1) a random sample, (2) that both variables are interval or ratio, (3) that both variables are more or less normally distributed, and (4) that any relationship that exists is linear. To calculate the Pearson correlation, we must first calculate the covariance, which is the sum of the products of the deviations of the two variables from their respective means, divided by n - 1. The covariance (cov) is calculated as

$$\mathrm{cov}(X_1, X_2) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)$$

While the covariance shows the same tendencies as the correlation, its actual value depends on the original units (so cov ranges from negative to positive infinity). We would like to standardize the covariance so that we can compare associations among pairs of variables measured on different scales. To do this, we divide the covariance by the product of the standard deviations of the two variables to generate the Pearson correlation coefficient ($r_p$), as

$$r_p = \frac{\mathrm{cov}(X_1, X_2)}{S_{X_1} S_{X_2}}$$

Because of this standardization, $r_p$ ranges from -1 to 1. It is important to remember that r is not a test of significance, just a measure of the degree of association.
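The two steps above (covariance first, then standardization by the standard deviations) can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names `covariance` and `pearson_r` and the small data set are made up for this example and do not come from the original page.

```python
import math

def covariance(x1, x2):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    return sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)

def pearson_r(x1, x2):
    """Pearson r: covariance divided by the product of the standard deviations."""
    s1 = math.sqrt(covariance(x1, x1))  # cov(X, X) is the sample variance
    s2 = math.sqrt(covariance(x2, x2))
    return covariance(x1, x2) / (s1 * s2)

# Made-up illustrative measurements on five objects (an assumption, not real data)
length_cm = [1.0, 2.0, 3.0, 4.0, 5.0]
mass_g = [2.0, 4.0, 5.0, 4.0, 5.0]
mass_mg = [m * 1000 for m in mass_g]  # same masses expressed in different units

print(covariance(length_cm, mass_g))   # 1.5
print(covariance(length_cm, mass_mg))  # 1500.0, the covariance changes with the units
print(pearson_r(length_cm, mass_g))    # about 0.775
print(pearson_r(length_cm, mass_mg))   # about 0.775, the correlation does not
```

Changing the units of one variable rescales the covariance by the same factor, while the correlation stays the same, which is the reason for standardizing.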
Spearman Correlation Coefficient

The Spearman correlation is nonparametric, and is also known as a rank correlation, because it is computed on the ranks of the observations for data that are at least ordinal. Specifically, this correlation evaluates the differences in the ranks of each object on two different variables. The sample of objects is ranked twice (once for each of the variables for which the correlation is to be assessed), and the difference in ranks is calculated for each object. The Spearman correlation from these data is given by

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where $d^2 = (R_{X_1} - R_{X_2})^2$ is the squared difference between an object's ranks on the two variables. If the rank order is the same for both variables, the correlation is perfectly positive (1.0); if the rank order is exactly reversed, it is perfectly negative (-1.0).

As with the Pearson correlation, we do not know from the value of r alone whether the observed correlation is significant. For either type of correlation, we can test the null hypothesis that the true correlation is zero by calculating

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$

and comparing this to a critical t at the 0.05 level with n - 2 degrees of freedom (n - 2 since one degree of freedom is lost for each variable).
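The ranking procedure for the Spearman correlation can be sketched as follows. This is a minimal illustration that assumes there are no tied values, since the shortcut formula based on squared rank differences is exact only in that case. The helper names `ranks` and `spearman_rs` and the scores are hypothetical.

```python
def ranks(values):
    """Rank observations from 1 (smallest) to n; assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: this sketch assumes distinct ranks")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x1, x2):
    """Spearman rank correlation from the sum of squared rank differences."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    r1, r2 = ranks(x1), ranks(x2)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Made-up illustrative scores for five objects (an assumption, not real data)
score_a = [10, 20, 30, 40, 50]
score_b = [1, 3, 2, 5, 4]

print(ranks(score_b))                       # [1, 3, 2, 5, 4]
print(spearman_rs(score_a, score_b))        # 0.8
print(spearman_rs(score_a, score_a))        # 1.0, identical rank order
print(spearman_rs(score_a, score_a[::-1]))  # -1.0, exactly reversed rank order
```

With tied observations, a common alternative is to assign average ranks and compute the Pearson correlation on the ranks instead of using the shortcut formula.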
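The significance test for a correlation coefficient can also be sketched in Python. The function name `correlation_t`, the values r = 0.8 and n = 10, and the choice of a two-tailed test are assumptions made for illustration. The critical value 2.306 is the two-tailed 0.05 entry for 8 degrees of freedom in a standard t table.

```python
import math

def correlation_t(r, n):
    """t statistic for a correlation r from n objects, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.8 observed on n = 10 objects
r, n = 0.8, 10
t = correlation_t(r, n)
critical_t = 2.306  # two-tailed, 0.05 level, n - 2 = 8 degrees of freedom (t table)

print(round(t, 3))          # 3.771
print(abs(t) > critical_t)  # True, so the null hypothesis is rejected at the 0.05 level
```

The same correlation from a smaller sample gives a smaller t and is compared against a larger critical value, so it may not be significant even though r is unchanged.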