How to Compute Pearson Correlation in R

statistics

Pearson correlation

Learn how to compute Pearson correlation in R. A step-by-step statistical tutorial with examples.

Published

February 21, 2026

Introduction

Pearson correlation measures the linear relationship between two continuous variables, producing a value between -1 and 1. Use it when you want to understand how strongly two numeric variables move together in a linear fashion.
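
To make the definition concrete, here is a minimal sketch that computes the coefficient by hand: the sum of the cross-products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. It uses the built-in mtcars data as a stand-in (an assumption, chosen so the snippet runs without extra packages), and the variable names are illustrative only.

```r
# Stand-in data: two numeric columns from the built-in mtcars dataset.
x <- mtcars$wt   # car weight
y <- mtcars$mpg  # fuel economy

# Deviations of each value from its variable's mean.
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: sum of cross-products divided by the square root of
# the product of the two sums of squared deviations.
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
cor(x, y)  # the built-in function should agree with the manual value
```

A value near 1 or -1 means the points lie close to a straight line; a value near 0 means there is little linear association.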

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to examine whether penguin flipper length is related to body mass, that is, whether penguins with longer flippers also tend to be heavier.

Step 1: Load and examine the data

First, let’s look at our penguin dataset structure.

data(penguins)
penguins |>
  select(flipper_length_mm, body_mass_g) |>
  head()

This shows us the first few rows of our two variables of interest.

Step 2: Calculate basic correlation

Now we’ll compute the Pearson correlation coefficient.

correlation <- cor(penguins$flipper_length_mm, 
                   penguins$body_mass_g, 
                   use = "complete.obs")
print(correlation)

The use = "complete.obs" argument drops every pair in which either value is missing. Without it, cor() returns NA whenever the data contain missing values.
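
The effect of the use argument is easiest to see on a small example. The sketch below uses made-up toy vectors (hypothetical values, not taken from the penguins data) and compares the default behaviour with "complete.obs" and with manual filtering.

```r
# Hypothetical toy vectors, each with one missing value.
flipper <- c(181, 186, NA, 193, 190, 195)
mass    <- c(3750, 3800, 3250, NA, 3650, 4675)

# Default behaviour (use = "everything"): any NA makes the result NA.
r_default <- cor(flipper, mass)

# "complete.obs" keeps only the pairs where both values are present.
r_complete <- cor(flipper, mass, use = "complete.obs")

# The same result, obtained by filtering manually with complete.cases().
keep <- complete.cases(flipper, mass)
r_filtered <- cor(flipper[keep], mass[keep])

r_default
r_complete
sum(keep)  # number of pairs actually used in the calculation
```

Checking how many pairs remain is worthwhile, since a coefficient based on only a handful of complete pairs is not very informative.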

Step 3: Create a visualization

Let’s visualize this relationship with a scatter plot.

penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm") +
  labs(title = "Penguin Flipper Length vs Body Mass")

Scatter plot in R showing Pearson correlation between penguin flipper length and body mass with a fitted linear regression line created in ggplot2

This plot confirms the strong positive correlation we calculated numerically.

Example 2: Practical Application

The Problem

A marine biologist wants to analyze correlations between multiple penguin measurements across different species. They need to test statistical significance and handle different groups properly.

Step 1: Perform correlation test

We’ll use cor.test() to get statistical significance along with the correlation.

correlation_test <- cor.test(penguins$flipper_length_mm,
                            penguins$body_mass_g,
                            method = "pearson")
print(correlation_test)

This provides the correlation coefficient, confidence interval, and p-value for hypothesis testing. Unlike cor(), cor.test() drops incomplete pairs automatically, so no use argument is needed.
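
The printed summary is convenient for reading, but the individual numbers are often needed for reporting. The sketch below shows one way to pull them out of the object that cor.test() returns. It uses the built-in mtcars data as a stand-in (an assumption, so the snippet runs without extra packages), and the variable names are illustrative.

```r
# Stand-in data: built-in mtcars, so no extra packages are needed.
ct <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

r_estimate <- unname(ct$estimate)   # correlation coefficient
p_value    <- ct$p.value            # p-value for H0: true correlation is 0
ci         <- ct$conf.int           # confidence interval (95% by default)
df         <- unname(ct$parameter)  # degrees of freedom, n - 2

# Collect the pieces in one named vector for reporting.
c(r = r_estimate, lower = ci[1], upper = ci[2], p = p_value, df = df)
```

The same pattern should apply to the penguin test object above, for example correlation_test$estimate and correlation_test$conf.int.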

Step 2: Calculate correlation matrix

Let’s examine correlations between multiple numeric variables simultaneously.

numeric_vars <- penguins |>
  select(bill_length_mm, bill_depth_mm, 
         flipper_length_mm, body_mass_g)

correlation_matrix <- cor(numeric_vars, use = "complete.obs")
round(correlation_matrix, 3)

The correlation matrix shows relationships between all pairs of variables in a single table.
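
With a matrix, the choice of use value matters more than it does for two vectors. The sketch below, built on a made-up data frame (hypothetical values), contrasts "complete.obs", which drops a row if any column is missing, with "pairwise.complete.obs", which uses every row available for each pair of columns.

```r
# Hypothetical data: only column c has missing values.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(1, 3, NA, 7, 9, NA)
)

# Listwise deletion: every cell is computed from the same complete rows.
m_listwise <- cor(toy, use = "complete.obs")

# Pairwise deletion: each cell uses all rows complete for that pair.
m_pairwise <- cor(toy, use = "pairwise.complete.obs")

# The a-b cell differs because pairwise keeps the rows where only c is NA.
m_listwise["a", "b"]
m_pairwise["a", "b"]
```

Listwise deletion keeps all cells comparable because they share one sample, while pairwise deletion retains more data per cell. Which is preferable depends on how much data is missing and where.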

Step 3: Group by species

Now we’ll calculate correlations separately for each penguin species.

species_correlations <- penguins |>
  group_by(species) |>
  summarise(
    correlation = cor(flipper_length_mm, body_mass_g, 
                     use = "complete.obs"),
    .groups = "drop"
  )
print(species_correlations)

This reveals how the flipper-mass relationship varies across different penguin species.
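
Calling cor() inside summarise() returns only the coefficient. If group sizes and p-values are also wanted, one option is to run cor.test() once per group. The sketch below does this in base R with the built-in iris data as a stand-in (an assumption, so it runs without extra packages); the column and object names are illustrative, not part of the penguin analysis.

```r
# Split the stand-in data into one data frame per species.
by_species <- split(iris, iris$Species)

# Run cor.test() within each group and collect one row per species.
group_results <- do.call(rbind, lapply(names(by_species), function(sp) {
  d  <- by_species[[sp]]
  ct <- cor.test(d$Sepal.Length, d$Petal.Length)
  data.frame(
    species = sp,
    n       = nrow(d),             # group size
    r       = unname(ct$estimate), # within-group correlation
    p_value = ct$p.value           # p-value for H0: true correlation is 0
  )
}))

group_results
```

Reporting n alongside each coefficient helps readers judge how much weight a group's estimate can bear.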

Step 4: Visualize by groups

Finally, let’s create a grouped visualization to see these relationships.

penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g), !is.na(species)) |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g,
             color = species)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Flipper Length vs Body Mass by Species")

Grouped scatter plot in R visualizing Pearson correlation between flipper length and body mass for Adelie, Chinstrap, and Gentoo penguin species with per-group regression lines in ggplot2

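The colored trend lines show how the slope of the flipper-mass relationship differs between Adelie, Chinstrap, and Gentoo penguins. The strength of each correlation is reflected in how tightly the points cluster around their line, not in the slope itself.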

Summary

• Use cor() for simple correlation coefficients and cor.test() when you need statistical significance testing
• Pass use = "complete.obs" to cor() when your data contain missing values; by default cor() returns NA if any value is missing
• Correlation matrices with cor() efficiently compare multiple variables simultaneously
• Group-wise correlations using group_by() reveal how relationships differ across categories
• Scatter plots with trend lines provide a visual check on your correlation calculations
