Computing Correlation with R
In this tutorial, we will learn how to compute the correlation between two numerical variables in R using the cor() function available in base R.
The correlation between two numerical variables ranges from -1 to +1. Negative values indicate that the two variables are negatively correlated, and positive values indicate that they are positively correlated. When there is no linear relationship between the two variables, the correlation will be close to zero.
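To make the two ends of that range concrete, here is a small sketch with made-up vectors. The names x, y_pos and y_neg are illustrative only; they are not part of the examples that follow.

```r
# Toy vectors, chosen only for illustration
x <- 1:10
y_pos <- 2 * x + 3      # increases exactly linearly with x
y_neg <- -0.5 * x + 1   # decreases exactly linearly with x

r_pos <- cor(x, y_pos)  # perfect positive linear relationship
r_neg <- cor(x, y_neg)  # perfect negative linear relationship
print(r_pos)            # 1, up to floating-point rounding
print(r_neg)            # -1, up to floating-point rounding
```

Real data rarely sits exactly on a line, so correlations computed from measurements usually fall strictly between these two extremes.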
First, we will compute correlation between two numerical vectors. Next, we will see two examples of how to compute correlation between two numerical variables present in a dataframe.
How to compute correlation between two numerical vectors
First, let us generate two numerical variables, x and y, using random numbers from a normal distribution.
set.seed(21)
# generate x variable: random numbers from normal distribution
x <- rnorm(…)
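A minimal sketch of the full vector example might look like the following. The sample size of 100 and the construction of y as x plus independent noise are assumptions made here for illustration; only set.seed(21) and the use of a normal distribution come from the text.

```r
set.seed(21)
n <- 100                      # assumed sample size
x <- rnorm(n)                 # random numbers from a standard normal distribution
y <- x + rnorm(n, sd = 0.5)   # assumed: y follows x, plus some random noise

r_xy <- cor(x, y)             # cor() computes the Pearson correlation by default
print(r_xy)
```

Because y is built from x, the result should be clearly positive. Increasing the noise (a larger sd) would pull it toward zero.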
df <- … %>%
  drop_na()
df %>% head()
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
1 Adelie Torge… 39.1 18.7 181 3750 male
2 Adelie Torge… 39.5 17.4 186 3800 fema…
3 Adelie Torge… 40.3 18 195 3250 fema…
4 Adelie Torge… 36.7 19.3 193 3450 fema…
5 Adelie Torge… 39.3 20.6 190 3650 male
6 Adelie Torge… 38.9 17.8 181 3625 fema…
# … with 1 more variable: year
To compute the correlation between body mass and flipper length, we will extract those two variables from the dataframe and save them as new variables.
body_mass <- df %>%
  pull(body_mass_g)
flipper_length <- df %>%
  pull(flipper_length_mm)
Now we can compute the correlation as before using the cor() function. In this example, the two variables are highly correlated, with a Pearson correlation of about 0.87.
cor(body_mass, flipper_length)
[1] 0.8729789
Using base R notation, we can access a variable in a dataframe directly with the $ operator. In this second approach, we pass the columns to cor() directly, as shown below.
cor(df$body_mass_g, df$flipper_length_mm)
[1] 0.8729789
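The dataframe used here had rows with missing values removed with drop_na() before cor() was called. As a hedged alternative, cor() also accepts a use argument that controls how missing values are handled. The sketch below uses a small made-up data frame (toy, with columns mass and flipper) rather than the penguin data.

```r
# Toy data frame with one missing value; the values are for illustration only
toy <- data.frame(
  mass    = c(3750, 3800, NA, 3450, 3650),
  flipper = c(181, 186, 195, 193, 190)
)

# With the default use = "everything", a missing value makes the result NA
r_default <- cor(toy$mass, toy$flipper)
print(r_default)

# use = "complete.obs" drops rows where either variable is missing
r_complete <- cor(toy$mass, toy$flipper, use = "complete.obs")
print(r_complete)
```

Dropping incomplete rows up front, as done with drop_na(), and passing use = "complete.obs" give the same correlation for a single pair of variables. The first approach also keeps the dataframe consistent for any later steps.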