How to Use map() in R
Introduction
The map() function from the purrr package applies a function to each element of a list or vector and returns the results as a list. It’s the tidyverse replacement for lapply() with a more consistent syntax and better integration with pipes.
When to use map():
- Applying the same operation to multiple elements
- Reading multiple files into a list
- Running the same model on different subsets of data
- Extracting elements from nested lists
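Conceptually, map() does what a hand-written for loop would do. It calls the function on each element, collects the results in a list of the same length, and keeps the input's names. The sketch below uses a hypothetical helper, my_map(), purely for illustration. The real purrr::map() adds type checks, clearer error messages, and support for the shorthand syntaxes shown later.

```r
# my_map() is a hypothetical, simplified stand-in for purrr::map()
my_map <- function(.x, .f, ...) {
  out <- vector("list", length(.x))     # pre-allocate one slot per element
  for (i in seq_along(.x)) {
    # out[i] <- list(...) keeps the slot even when .f returns NULL
    out[i] <- list(.f(.x[[i]], ...))
  }
  names(out) <- names(.x)               # keep the input's names, if any
  out
}

my_map(list(a = 1, b = 4, c = 9), sqrt)
# returns list(a = 1, b = 2, c = 3)
```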
Getting Started
```r
library(tidyverse)      # attaches purrr, dplyr, tidyr, readr, and more
library(palmerpenguins) # provides the penguins dataset
```
Example 1: Basic Usage
The Problem
We want to calculate summary statistics for multiple numeric columns in our dataset without writing repetitive code.
Apply a function to each element
Use map() to apply a function to each element of a list:
```r
# Create a list of numeric vectors
numbers <- list(
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30),
  c = c(100, 200, 300, 400)
)

# Calculate the mean of each vector
map(numbers, mean)
# returns list(a = 3, b = 20, c = 250)
```
The result is a list with the mean of each vector. It keeps the names a, b, and c from the input.
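A data frame is a list of its columns, so map() also works through a dataset column by column. The sketch below uses R's built-in mtcars data, chosen here only for illustration:

```r
library(purrr)

# A data frame is a list of equal-length columns
is.list(mtcars)   # TRUE

# So map() applies the function to each selected column in turn
column_means <- map(mtcars[c("mpg", "hp", "wt")], mean)
names(column_means)   # "mpg" "hp" "wt"
```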
Using anonymous functions
For custom operations, use anonymous functions with the \(x) syntax. This is base R's shorthand for function(x) and is available since R 4.1.0:
```r
# Calculate the range (max - min) of each vector
map(numbers, \(x) max(x) - min(x))
# returns list(a = 4, b = 20, c = 300)
```
The tilde shorthand
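purrr also supports the formula shorthand ~, where .x stands for the current element. The purrr documentation now recommends this form only when you need compatibility with R versions older than 4.1. You will still see it often in existing code: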
```r
# Same as above using the formula syntax
map(numbers, ~ max(.x) - min(.x))
```
Example 2: Practical Application
The Problem
We have the penguins dataset and want to calculate summary statistics for each numeric column, then fit a linear model for each species.
Summarize multiple columns
Select the numeric columns and apply a summary function to each one:
```r
# Get the numeric columns, dropping year
penguin_nums <- penguins |>
  select(where(is.numeric)) |>
  select(-year)

# Calculate the mean of each column (removing NAs)
map(penguin_nums, \(x) mean(x, na.rm = TRUE))
```
Split-apply-combine with map
Split data by groups and apply a function to each:
```r
# Split penguins by species
by_species <- penguins |>
  drop_na() |>
  split(~species)

# Fit a linear model for each species
models <- map(by_species, \(df) {
  lm(body_mass_g ~ flipper_length_mm, data = df)
})

# Extract R-squared from each model
map(models, \(m) summary(m)$r.squared)
```
Read multiple files
A common use case is reading multiple CSV files:
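```r
# Example: read all CSV files in a directory
# (pattern is a regular expression, not a glob, so "\\.csv$" matches names ending in .csv)
# files <- list.files("data/", pattern = "\\.csv$", full.names = TRUE)
# all_data <- map(files, read_csv)
```
Example 3: Working with Nested Data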
Extract elements from nested lists
Use map() to work with each tibble stored in a list-column:
```r
# Create nested data: one row per species
nested_penguins <- penguins |>
  drop_na() |>
  group_by(species) |>
  nest()
# nested_penguins now has a 'data' column containing tibbles

# Count rows in each nested tibble
# (map_int() gives an integer column instead of a list-column)
nested_penguins |>
  mutate(n_rows = map_int(data, nrow))
```
Combine with mutate for powerful workflows
```r
# Fit a model per species and extract the slope as a numeric column
nested_penguins |>
  mutate(
    model = map(data, \(df) lm(body_mass_g ~ flipper_length_mm, data = df)),
    slope = map_dbl(model, \(m) coef(m)[["flipper_length_mm"]])
  )
```
The map() Family
map() always returns a list. Use typed variants for specific output types:
| Function | Returns | Example |
|---|---|---|
| map() | list | map(x, mean) |
| map_dbl() | numeric (double) vector | map_dbl(x, mean) |
| map_chr() | character vector | map_chr(x, class) |
| map_int() | integer vector | map_int(x, length) |
| map_lgl() | logical vector | map_lgl(x, is.numeric) |
| map_dfr() | data frame (row-bind) | map_dfr(x, as_tibble) |
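Each typed variant also checks its results. The function must return a single value of the stated type for every element. Otherwise the variant throws an error rather than quietly returning something else. As of purrr 1.0.0, map_dfr() is superseded, and the documented replacement is map() followed by list_rbind(). A short sketch, using a toy list x assumed purely for illustration:

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

map_dbl(x, mean)        # named double vector: a = 2, b = 5
map_int(x, length)      # named integer vector: a = 3, b = 3
map_lgl(x, is.numeric)  # named logical vector: a = TRUE, b = TRUE

# range() returns two values per element, so map_dbl() raises an error
too_long <- tryCatch(map_dbl(x, range), error = function(e) e)
inherits(too_long, "error")   # TRUE

# purrr >= 1.0.0: map() + list_rbind() in place of map_dfr()
rows <- map(1:2, \(i) data.frame(id = i, square = i^2)) |> list_rbind()
rows
```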
Base R Comparison
Why use map() instead of lapply()?
| Feature | lapply() | map() |
|---|---|---|
| Syntax | lapply(x, f) | map(x, f) |
| Shorthand | \(x) x + 1 (base R 4.1+) | ~ .x + 1 or \(x) x + 1 |
| Typed variants | Only via vapply() | map_dbl(), map_chr(), etc. |
| Pipe-friendly | Less | Yes |
| Extract by name | lapply(x, "[[", "name") | map(x, "name") |
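Base R does have one typed apply function, vapply(). It takes a template (FUN.VALUE) that describes the type and length of each result. The sketch below contrasts it with map_dbl(); the scores list is made up for illustration:

```r
library(purrr)

scores <- list(a = c(90, 80), b = c(70, 100))

# Base R: the template numeric(1) means "one double per element"
vapply(scores, mean, FUN.VALUE = numeric(1))

# purrr: the output type is encoded in the function name instead
map_dbl(scores, mean)
```

Both calls return the same named numeric vector. vapply() states the type through an argument, while purrr states it through the function name.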
```r
# Base R approach
lapply(1:3, function(x) x^2)

# purrr: cleaner syntax
map(1:3, \(x) x^2)
```
Extracting elements by name
One of purrr’s most convenient features is simple element extraction:
```r
nested <- list(
  a = list(name = "Alice", score = 95),
  b = list(name = "Bob", score = 87),
  c = list(name = "Carol", score = 92)
)

# Base R: verbose
lapply(nested, function(x) x$name)

# purrr: just pass the name
map(nested, "name")

# A list of names/positions plucks deeper: x[["name"]][[1]] for each element.
# This returns the whole string ("Alice", ...), not its first character,
# because each name is a length-1 character vector
map(nested, list("name", 1))
```
Performance Considerations
For simple operations on large vectors, vectorized base R is faster:
```r
x <- 1:1e6

# Vectorized: fastest (use this!)
x^2

# map: adds per-element function-call overhead
map_dbl(x, \(i) i^2)
```
When to use map():
- Working with lists (not atomic vectors)
- Complex operations that aren’t vectorized
- When readability matters more than microseconds
- Chaining with other tidyverse functions
- Operations that return different types or lengths per element

When NOT to use map():
- Simple math on vectors (use vectorized operations)
- When lapply() is already working fine
- Tight loops where nanoseconds matter
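To see the overhead on your own machine, you can time both approaches with base R's system.time(). Timings depend on hardware, so no numbers are shown here. The sketch only checks that both approaches produce identical values:

```r
library(purrr)

x <- 1:1e6

# Time each approach; absolute numbers vary from machine to machine
vectorized_time <- system.time(squares_vec <- x^2)
map_time <- system.time(squares_map <- map_dbl(x, \(i) i^2))

vectorized_time[["elapsed"]]
map_time[["elapsed"]]

# Same values either way; only the speed differs
identical(squares_vec, squares_map)
```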
Common Mistakes
1. Forgetting that map() returns a list
```r
# This returns a list, not a vector
result <- map(1:3, \(x) x^2)
class(result) # "list"

# Use map_dbl() for a numeric vector
result <- map_dbl(1:3, \(x) x^2)
class(result) # "numeric"
```
2. Not handling errors in iteration
If one element causes an error, the whole map() call fails and no results are returned. Wrap the function with possibly() or safely():
```r
# log() errors on a character input, so this call fails as a whole:
# map(list(1, "a", 3), log)

# possibly() returns the `otherwise` value whenever the function errors
safe_log <- possibly(log, otherwise = NA_real_)
map_dbl(list(1, "a", 3), safe_log)
# returns c(0, NA, log(3))
```
3. Using map() for side effects
map() is for transformations that return values. For side effects (like printing or writing files), use walk():
```r
# For side effects, use walk() instead
walk(1:3, print)
```
Summary
- Use map() to apply a function to each element of a list or vector
- Use \(x) or ~ syntax for anonymous functions
- Use typed variants (map_dbl(), map_chr(), etc.) for specific output types
- Combine with nest() and mutate() for powerful grouped operations
- Use possibly() or safely() for error handling