How to Use reduce() in R
Introduction
The reduce() function combines elements of a list into a single value by repeatedly applying a binary function. Starting with the first two elements, it applies the function, then applies the function to that result and the third element, and so on.
When to use reduce():
- Joining multiple data frames together
- Combining multiple vectors with set operations
- Building up complex results incrementally
- Implementing cumulative operations
Getting Started
library(tidyverse)
library(palmerpenguins) # provides the penguins data used in Example 2
How reduce() Works
Basic concept
# reduce(list(a, b, c, d), f) is equivalent to:
# f(f(f(a, b), c), d)
# Example: sum a list of numbers
numbers <- list(1, 2, 3, 4, 5)
reduce(numbers, `+`) # ((((1+2)+3)+4)+5) = 15
Step by step visualization
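To see these steps without tracing them by hand, you can pass reduce() a function that reports each call before returning its result. `traced_add()` is a hypothetical helper written for this sketch, not part of purrr:

```r
library(purrr)

# Hypothetical helper: adds two numbers and reports the step via message()
traced_add <- function(acc, x) {
  result <- acc + x
  message(acc, " + ", x, " = ", result)
  result
}

total <- reduce(1:4, traced_add, .init = 0)
# Messages: 0 + 1 = 1, then 1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10
total # 10
```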
# Trace what happens
reduce(1:4, `+`, .init = 0)
# Step 1: 0 + 1 = 1
# Step 2: 1 + 2 = 3
# Step 3: 3 + 3 = 6
# Step 4: 6 + 4 = 10
Example 1: Basic Operations
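In all of the examples below, .f can be any function that takes two arguments. An operator in backticks, an anonymous function, and the `\(acc, x)` shorthand (R 4.1 and later) are interchangeable, as this small sketch shows:

```r
library(purrr)

# Three equivalent ways to pass the same binary function
by_operator <- reduce(1:5, `*`)                      # operator in backticks
by_function <- reduce(1:5, function(acc, x) acc * x) # anonymous function
by_lambda   <- reduce(1:5, \(acc, x) acc * x)        # R >= 4.1 shorthand
c(by_operator, by_function, by_lambda) # 120 120 120
```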
Multiply all elements
reduce(1:5, `*`) # 1*2*3*4*5 = 120
Find the maximum
numbers <- list(3, 1, 4, 1, 5, 9, 2, 6)
reduce(numbers, max) # 9
Concatenate strings
words <- list("Hello", " ", "World", "!")
reduce(words, paste0) # "Hello World!"
Example 2: Join Multiple Data Frames
The Problem
You have multiple data frames that share a common key and need to join them all together.
# Create sample data frames
df1 <- tibble(id = 1:3, value1 = c("a", "b", "c"))
df2 <- tibble(id = 1:3, value2 = c(10, 20, 30))
df3 <- tibble(id = 1:3, value3 = c(TRUE, FALSE, TRUE))
# List of data frames
dfs <- list(df1, df2, df3)
# Join all together
# Extra arguments such as by are passed on to left_join() at every step
reduce(dfs, left_join, by = "id")
Practical example with real data
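Before moving on, one caveat about the join above: because each step is a left_join(), only the ids in the first data frame survive. When the keys differ across tables, the join function you pass decides what is kept. A sketch with three hypothetical tables whose ids only partly overlap:

```r
library(dplyr)
library(purrr)

# Hypothetical tables with partly overlapping ids
a <- tibble(id = c(1, 2, 3), x = c("a", "b", "c"))
b <- tibble(id = c(2, 3, 4), y = c(20, 30, 40))
c_tbl <- tibble(id = c(3, 4, 5), z = c(TRUE, FALSE, TRUE))
tables <- list(a, b, c_tbl)

# left_join keeps only the ids of the first table: 1, 2, 3
kept_left <- reduce(tables, left_join, by = "id")
# full_join keeps every id that appears in any table: 1 to 5
kept_all <- reduce(tables, full_join, by = "id")
nrow(kept_left) # 3
nrow(kept_all)  # 5
```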
# Split data by species, process each piece, and rejoin
penguins_split <- penguins |>
  drop_na() |>
  split(~species) |>
  map(\(df) df |> mutate(processed = TRUE))
# Combine back into one data frame
# (bind_rows(penguins_split) also accepts the list directly)
reduce(penguins_split, bind_rows)
Example 3: Set Operations
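The examples that follow use small numeric vectors, but the same pattern works for any vectors, such as IDs collected in different periods. A sketch with hypothetical monthly customer lists:

```r
library(purrr)

# Hypothetical customer IDs seen in each month
monthly_customers <- list(
  jan = c("ann", "bob", "cara", "dev"),
  feb = c("bob", "cara", "dev", "eli"),
  mar = c("cara", "dev", "eli", "fay")
)

# Customers present in every month
loyal <- reduce(monthly_customers, intersect)
# Customers seen in at least one month
everyone <- reduce(monthly_customers, union)
loyal    # "cara" "dev"
everyone # "ann" "bob" "cara" "dev" "eli" "fay"
```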
Find intersection of multiple vectors
sets <- list(
c(1, 2, 3, 4, 5),
c(2, 3, 4, 5, 6),
c(3, 4, 5, 6, 7)
)
# Find elements common to ALL sets
reduce(sets, intersect) # 3, 4, 5
Find union of multiple vectors
reduce(sets, union) # 1, 2, 3, 4, 5, 6, 7
Using .init for Starting Values
Why use .init?
The .init argument provides a starting value, which is useful when:
- The list might be empty
- You need a specific starting point
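A specific starting point matters most when the accumulator has a different type from the elements. Here the elements are strings but the running value is a number, so the first call has to be f(0, "reduce") rather than f("reduce", "with"). The word list is made up for this sketch:

```r
library(purrr)

words <- c("reduce", "with", "a", "starting", "value")

# The accumulator is a number while the elements are strings,
# so .init = 0 supplies the numeric starting value
total_chars <- reduce(words, \(n, w) n + nchar(w), .init = 0)
total_chars # 24
```

Without .init, the first call would try to add a number to the string "reduce" and fail.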
# Without .init, empty list causes error
# reduce(list(), `+`) # Error!
# With .init, empty list returns the initial value
reduce(list(), `+`, .init = 0) # Returns 0
Building a data frame row by row
# Start with empty tibble and add rows
rows <- list(
tibble(x = 1, y = "a"),
tibble(x = 2, y = "b"),
tibble(x = 3, y = "c")
)
reduce(rows, bind_rows, .init = tibble())
accumulate(): Keep Intermediate Results
While reduce() returns only the final result, accumulate() keeps all intermediate values:
# Sum with reduce - only final result
reduce(1:5, `+`) # 15
# Sum with accumulate - all intermediate results
accumulate(1:5, `+`) # 1, 3, 6, 10, 15
Running totals
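Running totals are the classic use of accumulate(). It is most useful when each step follows a rule that no built-in cumulative function covers, for example a stock level that can never drop below zero. The daily changes here are hypothetical:

```r
library(purrr)

# Hypothetical daily stock changes (deliveries positive, sales negative)
changes <- c(5, -3, -4, 6, -10, 2)

# Each step adds the change but floors the stock level at zero;
# with .init, the starting value is the first element of the result
stock <- accumulate(changes, \(level, change) max(level + change, 0), .init = 0)
stock # 0 5 2 0 6 0 2
```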
daily_sales <- c(100, 150, 200, 125, 175)
accumulate(daily_sales, `+`) # Cumulative sum: 100, 250, 450, 575, 750
Build strings incrementally
words <- c("one", "two", "three")
accumulate(words, paste, sep = ", ")
# "one"
# "one, two"
# "one, two, three"
reduce2(): Reduce with a Secondary Vector
When the combining function needs a third argument that changes at each step:
# Paste with different separators
reduce2(
.x = c("a", "b", "c"),
.y = c("-", "_"), # One fewer than .x
.f = \(acc, x, sep) paste0(acc, sep, x)
)
# "a-b_c"
Practical example: weighted accumulation
values <- c(100, 50, 75)
weights <- c(0.8, 0.9) # Decay factors
reduce2(values, weights, \(acc, val, w) acc * w + val)
# Step 1: 100
# Step 2: 100 * 0.8 + 50 = 130
# Step 3: 130 * 0.9 + 75 = 192
Base R Comparison
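# Base R Reduce(): function first, data second
Reduce(`+`, 1:5) # 15
# purrr reduce(): data first, function second (pipe-friendly)
reduce(1:5, `+`) # 15
# Both support intermediate results and a starting value
Reduce(`+`, 1:5, accumulate = TRUE) # 1 3 6 10 15
accumulate(1:5, `+`) # 1 3 6 10 15
Reduce(`+`, 1:5, 100) # 115 (third argument is init)
reduce(1:5, `+`, .init = 100) # 115
Base R's Reduce() also has init, right, and accumulate arguments, so the differences are mostly in the interface: purrr puts the data first, passes extra arguments through ..., raises an error on empty input without .init (Reduce() returns NULL), and adds companions such as reduce2(). It also shares a consistent API with the other purrr functions.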
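Because Reduce() works on a single list, base R has no direct counterpart to reduce2(). One workaround, sketched here, is to reduce over positions and look up both vectors inside the function; it reproduces the weighted accumulation from the reduce2() section:

```r
values <- c(100, 50, 75)
weights <- c(0.8, 0.9) # decay factors, one fewer than values

# Reduce over positions; init is the first value, as in reduce2() without .init
weighted <- Reduce(
  function(acc, i) acc * weights[i] + values[i + 1],
  seq_along(weights),
  values[1]
)
weighted # 192
```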
When NOT to Use reduce()
Don’t use reduce for operations that have vectorized alternatives:
# Bad: one R function call per element, unnecessarily slow
reduce(1:1000, `+`) # Don't do this
reduce(1:1000, `*`) # Or this (integer overflow: returns NA with a warning)
reduce(1:1000, max) # Or this
# Good: use vectorized functions
sum(1:1000) # Fast
prod(1:1000) # Fast (computed in double precision, so 1000! overflows to Inf)
max(1:1000) # Fast
Use reduce() when:
- Joining multiple data frames
- Set operations across multiple vectors
- No vectorized alternative exists
- Building complex structures incrementally
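The same advice applies to accumulate(): running sums, products, maxima, and minima already have vectorized base R versions (cumsum(), cumprod(), cummax(), cummin()). A quick check on a small vector:

```r
library(purrr)

x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# accumulate() and the vectorized versions give the same running values
running_sum <- accumulate(x, `+`) # 3 4 8 9 14 23 25 31
running_max <- accumulate(x, max) # 3 3 4 4 5 9 9 9
all.equal(running_sum, cumsum(x)) # TRUE
all.equal(running_max, cummax(x)) # TRUE
```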
reduce_right(): Start from the End
Process from right to left instead of left to right:
# Left-associative (default)
reduce(c("a", "b", "c"), paste) # "a b c"
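# Right to left: reverses the input, then reduces
reduce_right(c("a", "b", "c"), paste) # "c b a"
# reduce_right() is deprecated in recent purrr releases; this gives the same result
reduce(rev(c("a", "b", "c")), paste) # "c b a"
Practical use: nested function calls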
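The .dir argument that replaces reduce_right() builds right-nested calls directly: with .dir = "backward", reduce(x, f) computes f(x1, f(x2, f(x3, ...))), and .f receives the element first and the accumulator second. That lets you list functions in the order they appear when the nested call is written out. A sketch:

```r
library(purrr)

# Functions in the order they appear when written out: sqrt(log(abs(x)))
fns <- list(sqrt, log, abs)

# With .dir = "backward" the calls nest from the right and .init sits innermost:
# g(sqrt, g(log, g(abs, -4))), where g = \(f, x) f(x)
nested <- reduce(fns, \(f, x) f(x), .init = -4, .dir = "backward")
nested # 1.17741, i.e. sqrt(log(4))
```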
# Apply functions in sequence to build a nested call
fns <- list(abs, log, sqrt) # listed in the order they are applied
reduce(fns, \(x, f) f(x), .init = -4) # sqrt(log(abs(-4))) = 1.17741
Common Mistakes
1. Forgetting that reduce() needs at least one element (without .init)
# Single element returns that element
reduce(list(5), `+`) # 5
# Empty list errors without .init
# reduce(list(), `+`) # Error
# Solution: use .init
reduce(list(), `+`, .init = 0) # 0
2. Wrong argument order in the function
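A quick way to check which argument is which is to have the function spell out its own calls. The labels x1, x2, x3 are placeholders for this sketch:

```r
library(purrr)

# Each call wraps its two arguments in a label: accumulator first, current element second
call_shape <- reduce(c("x1", "x2", "x3"), \(acc, x) paste0("f(", acc, ", ", x, ")"))
call_shape # "f(f(x1, x2), x3)"
```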
# The accumulator is the first argument, the current element the second
reduce(1:3, \(acc, x) paste(acc, x)) # "1 2 3"
# Swapping the parameter names silently reverses their roles
reduce(1:3, \(x, acc) paste(acc, x)) # "3 2 1" - wrong order!
3. Using reduce when map is simpler
# If you don't need to combine results, use map
# Don't do this:
# reduce(1:3, \(acc, x) c(acc, x^2), .init = c())
# Do this instead:
map_dbl(1:3, \(x) x^2) # 1 4 9
Summary
| Function | Returns | Use Case |
|---|---|---|
| reduce() | single value | Combine all elements into one |
| reduce_right() | single value | Combine from right to left |
| accumulate() | vector | Keep all intermediate results |
| accumulate_right() | vector | Intermediate results, right to left |
- reduce() combines list elements into a single value
- Use .init to handle empty lists and set starting values
- Use accumulate() when you need the intermediate steps
- Common uses: joining data frames, set operations, building strings