Correlation is a method for testing whether a statistically significant relationship exists between two continuous variables. It is helpful when you want to know whether a relationship exists and whether it is worth investigating further with other statistical tests. There are many tests for correlating continuous variables; this guide focuses on the sample Pearson correlation, which measures the strength and direction of a linear relationship and is the most commonly used correlation test.
Below is the formula for the sample Pearson correlation coefficient.
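For n paired observations (x_i, y_i) with sample means x̄ and ȳ, the standard definition of the sample Pearson correlation coefficient is:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator sums the cross-products of each variable's deviations from its mean, and the denominator rescales that sum by the spread of each variable, which is what keeps r between -1 and 1.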
The code above contains two lines. The first calls cor() to correlate the variables wages and age from the SLID dataset. It also sets use = "complete.obs" (complete observations), so any row with a missing value in either variable is dropped before the coefficient is computed.
The second line runs cor.test() (a correlation test) on wages and age from the same SLID dataset, setting method = "pearson" to request the sample Pearson correlation test.
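A minimal, self-contained sketch of those two calls follows. So that it runs without extra packages, it uses a small made-up data frame named toy (hypothetical values, with NA entries to mimic missing wages) in place of SLID. The commented lines at the end show one way to make the same calls on SLID itself, assuming the dataset comes from the carData package.

```r
# Hypothetical toy data standing in for SLID; NA marks a missing wage
toy <- data.frame(
  wages = c(10.5, 12.0, NA, 15.2, 18.9, 20.1, NA, 25.0),
  age   = c(21, 25, 30, 34, 40, 45, 50, 55)
)

# Line 1: correlation matrix of wages and age, keeping only rows
# where both values are present (use = "complete.obs")
m <- cor(toy[, c("wages", "age")], use = "complete.obs")
print(m)

# Line 2: Pearson correlation test; cor.test() drops incomplete pairs itself
ct <- cor.test(toy$wages, toy$age, method = "pearson")
print(ct)

# The same two calls on the real data (assumes carData is installed):
# library(carData)
# cor(SLID[, c("wages", "age")], use = "complete.obs")
# cor.test(SLID$wages, SLID$age, method = "pearson")
```

Because cor() receives a two-column data frame, it returns a 2 x 2 matrix shaped like Output A, while cor.test() prints a report shaped like Output B. Both calls end up using the same six complete rows here, so the coefficient in the matrix matches the test's sample estimate.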
Output
A
wages age
wages 1.0000000 0.3614635
age 0.3614635 1.0000000
B
Pearson's product-moment correlation
data: SLID$wages and SLID$age
t = 24.959, df = 4145, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3347088 0.3876359
sample estimates:
cor
0.3614635
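The figures in Output B can be reproduced from r and the degrees of freedom alone. The sketch below assumes the standard formulas used for the Pearson method: the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and a 95% confidence interval built with the Fisher z-transformation. Here n = 4147 follows from df = 4145.

```r
# Values read off Output B
r  <- 0.3614635   # sample correlation
df <- 4145        # degrees of freedom, n - 2
n  <- df + 2      # complete wage/age pairs: 4147

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 3)   # 24.959
round(ci, 4)       # 0.3347 0.3876
p_value < 2.2e-16  # TRUE
```

The interval is computed on the z scale and transformed back with tanh(), which is why it is not symmetric around r.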
A
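Output A is the correlation matrix returned by cor() for the variables wages and age. Each row and each column corresponds to one variable, and each cell holds the sample Pearson correlation coefficient between its row variable and its column variable. The diagonal entries are 1 because every variable is perfectly correlated with itself, and the matrix is symmetric, so the coefficient between wages and age appears twice. The matrix contains coefficients only; the significance test for the relationship between wages and age is reported in Output B.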
Let's focus on the variable wages. Its sample Pearson correlation coefficient with age is 0.3615, which indicates a positive relationship of moderate strength: higher wages tend to go with higher age. The coefficient ranges from -1 to 1. Its sign gives the direction of the relationship and its magnitude gives the strength: 0 means no linear relationship exists, while -1 or 1 means a perfect linear relationship (this is rare in real data and often a cause for concern). The p-value in Output B is < 2.2e-16, far below our 0.05 threshold, and the 95% confidence interval (0.3347 to 0.3876) does not include 0. This indicates there is a statistically significant positive relationship between wages and age in the dataset.
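To see why the coefficient runs from -1 to 1, and why 0 means no linear relationship rather than no relationship at all, here is a base-R sketch. pearson_r is a hypothetical helper that applies the sample formula directly, and the vectors are made up for illustration.

```r
# Hypothetical helper: sample Pearson correlation straight from the formula
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x       <- c(1, 2, 3, 4, 5, 6)  # made-up values
y_noisy <- c(2, 1, 4, 3, 6, 5)  # rising trend with some scatter

r_pos  <- pearson_r(x, 2 * x + 3)    # perfect positive line: 1 (up to rounding)
r_neg  <- pearson_r(x, 10 - x)       # perfect negative line: -1 (up to rounding)
r_quad <- pearson_r(x, (x - 3.5)^2)  # exact U-shaped (non-linear) link: 0
r_mid  <- pearson_r(x, y_noisy)      # between 0 and 1

# The hand-rolled value agrees with R's built-in cor()
all.equal(r_mid, cor(x, y_noisy))
```

The U-shaped case is worth noting: y is completely determined by x, yet r is 0, because Pearson correlation only captures linear trends.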