How to reshape data from wide to long with pivot_longer() in R
Introduction
The pivot_longer() function from the tidyr package transforms wide data into long format by gathering multiple columns into key-value pairs. This is essential for data analysis and visualization, because many tidyverse tools, such as ggplot2 and dplyr's grouped operations, work most naturally with data in “tidy” format, where each variable forms a column and each observation forms a row. Use pivot_longer() when values of a single variable are spread across multiple columns and should be gathered into one.
Basic Setup
First, let’s load the required packages for data manipulation:
library(tidyverse)
Simple Example: Basic Pivot
Let’s start with a simple dataset containing patient measurements at two time points:
df <- tibble(pt_id = 1:4,
             h1 = c(80, 82, 85, 88),
             h2 = c(100, 102, 105, 110))
df
This creates a wide-format dataset where each patient has measurements in separate columns (h1 and h2).
Now we’ll pivot this data from wide to long format using pivot_longer():
df |>
  pivot_longer(cols = h1:h2,
               names_to = "measurement",
               values_to = "value")
The result is an 8-row dataset (4 patients, 2 measurements each) in which each measurement is a separate observation. The cols argument specifies which columns to pivot, names_to names the new column that holds the original column names, and values_to names the new column that stores the values.
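The cols argument accepts any tidyselect expression, so the same pivot can be written in more than one way. The sketch below rebuilds the example tibble so it runs on its own, shows two equivalent selections, and then uses the names_prefix and names_transform arguments of pivot_longer() to turn the h1/h2 labels into an integer time point. The object names long_a, long_b, and long_time and the column name time are illustrative choices, not part of the original example.

```r
library(tidyverse)

df <- tibble(pt_id = 1:4,
             h1 = c(80, 82, 85, 88),
             h2 = c(100, 102, 105, 110))

# Select the measurement columns by their shared prefix
long_a <- df |>
  pivot_longer(cols = starts_with("h"),
               names_to = "measurement",
               values_to = "value")

# Or select every column except the identifier
long_b <- df |>
  pivot_longer(cols = !pt_id,
               names_to = "measurement",
               values_to = "value")

# Compare the two results; both selections pick h1 and h2
all.equal(long_a, long_b)

# Strip the "h" prefix and store the time point as an integer
long_time <- df |>
  pivot_longer(cols = h1:h2,
               names_to = "time",
               names_prefix = "h",
               names_transform = list(time = as.integer),
               values_to = "value")
long_time
```

Selecting with !pt_id tends to be the more robust choice when further measurement columns may be added later, since it does not depend on column order or naming.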
Advanced Example: Separating Column Names
Often, column names contain multiple pieces of information. Here’s a dataset where column names include both measurement time and sex:
df <- tibble(pt_id = 1:4,
             h1_f = c(80, 82, 85, 88),
             h2_m = c(100, 102, 105, 110))
df
This dataset has column names that combine measurement time (h1, h2) with sex (f, m).
We can separate these components during the pivot operation:
df |>
  pivot_longer(cols = h1_f:h2_m,
               names_sep = "_",
               names_to = c("measurement", "sex"),
               values_to = "value")
The names_sep argument tells pivot_longer() to split column names at the underscore, creating two new columns: “measurement” and “sex”. This creates a fully tidy dataset where each piece of information has its own column.
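When a single separator is not enough, pivot_longer() also offers names_pattern, which takes a regular expression with one capture group per new column. The following is a minimal sketch, assuming the same h<time>_<sex> naming scheme as the example above; the regular expression, the object name long_df, and the column name time are illustrative choices.

```r
library(tidyverse)

df <- tibble(pt_id = 1:4,
             h1_f = c(80, 82, 85, 88),
             h2_m = c(100, 102, 105, 110))

# Each pair of parentheses captures one piece of the column name:
# the digits after "h" go to "time", the letter after the underscore
# goes to "sex".
long_df <- df |>
  pivot_longer(cols = h1_f:h2_m,
               names_to = c("time", "sex"),
               names_pattern = "h(\\d+)_([a-z])",
               values_to = "value")
long_df
```

Compared with names_sep, this drops the h prefix in the same step. The time column is still character here; adding names_transform would convert it to a number.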
Summary
The pivot_longer() function is crucial for reshaping data from wide to long format. Use the basic form with cols, names_to, and values_to for simple pivoting, and add names_sep when column names contain multiple variables separated by a delimiter. This transformation is often the first step in creating tidy datasets ready for analysis and visualization.
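As a brief illustration of why the long format is convenient for analysis, the sketch below reuses the data from the first example and computes the mean per measurement with a single grouped summary. The object name means and the column name mean_value are illustrative.

```r
library(tidyverse)

df <- tibble(pt_id = 1:4,
             h1 = c(80, 82, 85, 88),
             h2 = c(100, 102, 105, 110))

# Pivot to long format, then summarise each measurement in one step
means <- df |>
  pivot_longer(cols = h1:h2,
               names_to = "measurement",
               values_to = "value") |>
  group_by(measurement) |>
  summarise(mean_value = mean(value))
means
```

With the wide layout, the same result would need one mean() call per column; in long format the code stays the same however many measurement columns there are.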