How to use expand() in R

tidyr

expand()

Learn how to use expand() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The tidyr::expand() function creates a data frame containing every combination of the variables you specify. For each column or vector it takes the unique values (for a factor, all of its levels, including levels that never appear in the data), then crosses them to build a complete grid. Only the variables you name appear in the output. This is particularly useful when your data must include every theoretical combination of categorical variables, even combinations that don't exist in your original dataset.
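Here is a minimal sketch of that behaviour on a tiny hypothetical tibble. The toy data and its column names are invented for illustration. The character column contributes its unique values, while the factor column contributes every level, including the unused "medium":

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: a character column and a factor with an unused level
toy <- tibble(
  shape = c("circle", "square", "circle"),
  size  = factor(c("small", "large", "small"),
                 levels = c("small", "medium", "large"))
)

# 2 unique shapes x 3 factor levels = 6 rows; "medium" appears despite being unused
toy_grid <- expand(toy, shape, size)
toy_grid
```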

You would use expand() to find combinations that are implicitly missing from your data, to fill gaps in time series, to create lookup tables, or to prepare data for modeling when every factor combination needs to be represented.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset to create all combinations of species and islands:

# Basic expansion of species and island combinations
penguins |>
  expand(species, island)

Because species and island are both factors with three levels each, this returns all 9 combinations of the 3 penguin species (Adelie, Chinstrap, Gentoo) and the 3 islands (Biscoe, Dream, Torgersen). Several of these, such as Chinstrap penguins on Biscoe island, never occur in the original data, but expand() includes them anyway.
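A common follow-up is to list exactly which combinations are absent. The sketch below pairs expand() with dplyr's anti_join(), which keeps the grid rows that have no match in the original data:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Every species-island pair implied by the factor levels (9 rows)
all_pairs <- penguins |>
  expand(species, island)

# Keep only the pairs that never occur in the original data
missing_pairs <- all_pairs |>
  anti_join(penguins, by = c("species", "island"))

missing_pairs
```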

You can also expand with specific values rather than existing columns:

# Expand with custom values
expand(tibble(), 
       year = 2007:2009,
       month = 1:12)

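This generates all 36 combinations (3 years × 12 months) of years 2007-2009 and months 1-12, creating a complete time grid. Because tibble() contributes no columns, the grid comes entirely from the supplied vectors.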
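tidyr also provides expand_grid(), which crosses vectors directly without needing a placeholder data frame. The sketch below compares the two. expand() additionally de-duplicates and sorts its inputs, but these inputs are already unique and sorted, so both calls should give the same 36 rows:

```r
library(tibble)
library(tidyr)

# expand() on an empty tibble, as in the example above
via_expand <- expand(tibble(), year = 2007:2009, month = 1:12)

# expand_grid() crosses the vectors directly
via_grid <- expand_grid(year = 2007:2009, month = 1:12)

nrow(via_expand)  # 36
nrow(via_grid)    # 36
```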

Example 2: Practical Application

A common real-world scenario is summarizing data so that every combination is represented, including those with no observations. Let's say we want to analyze penguin body mass across all species-island-year combinations while making the missing combinations explicit:

# Build the full species x island x year grid, then attach summary statistics
complete_penguin_grid <- penguins |>
  expand(species, island, year) |>
  left_join(
    penguins |>
      group_by(species, island, year) |>
      summarise(
        avg_body_mass = mean(body_mass_g, na.rm = TRUE),
        n_penguins = n(),
        .groups = "drop"
      ),
    by = c("species", "island", "year")
  ) |>
  mutate(
    # mean(na.rm = TRUE) returns NaN for a group whose values are all NA; store it as NA
    avg_body_mass = if_else(is.nan(avg_body_mass), NA_real_, avg_body_mass),
    # Combinations absent from the data had no rows, so their count is 0
    n_penguins = replace_na(n_penguins, 0L)
  )

This workflow expands all 27 combinations of species, island, and year (3 × 3 × 3), then joins the summarized data onto that grid. Combinations with no observations get NA for average body mass and 0 for the count, which makes gaps in data coverage explicit.
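tidyr's complete() is documented as a wrapper around expand(), dplyr::full_join() and replace_na(), so the counting part of this workflow can be written more compactly. The sketch below assumes only the per-combination counts are needed; the column name n is the one dplyr::count() creates:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Count observed combinations; empty combinations are simply absent here
observed_counts <- penguins |>
  count(species, island, year)

# complete() adds the missing combinations and fills their count with 0
all_counts <- observed_counts |>
  complete(species, island, year, fill = list(n = 0L))

all_counts |> filter(n == 0)
```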

Another practical use is expanding within groups:

# Expand within groups
penguins |>
  group_by(species) |>
  expand(island, year = full_seq(year, 1)) |>
  ungroup()

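With grouped data, expand() runs separately within each species group. full_seq(year, 1) fills in every year between the earliest and latest year observed for that species, so years missing from the middle of the range are included. Because island is a factor, each species receives all three island levels, including islands where it was never observed.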
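To see the effect of the factor levels, the sketch below compares the grouped expansion with a version where island is first converted to character. The conversion is our addition, not part of the original example. After the conversion, expand() uses only the islands seen within each species group:

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# island is a factor: every species gets all three island levels
factor_grid <- penguins |>
  group_by(species) |>
  expand(island, year = full_seq(year, 1)) |>
  ungroup()

# As character, expand() uses only the islands observed for each species
observed_grid <- penguins |>
  mutate(island = as.character(island)) |>
  group_by(species) |>
  expand(island, year = full_seq(year, 1)) |>
  ungroup()

nrow(factor_grid)    # 27
nrow(observed_grid)  # 15
```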

You can also combine expand() with nesting() to preserve existing combinations while expanding others:

penguins |>
  expand(nesting(species, island), year = 2007:2009)

This keeps only the 5 species-island combinations that actually occur in the data and crosses each of them with the three specified years, giving 15 rows.
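A quick way to confirm what nesting() contributes is to count rows. Used on its own, nesting() returns only the observed species-island pairs, and crossing them with three years triples that count:

```r
library(tidyr)
library(palmerpenguins)

# Only species-island pairs that occur in the data
observed_pairs <- penguins |>
  expand(nesting(species, island))

# Each observed pair crossed with every year in 2007:2009
pairs_by_year <- penguins |>
  expand(nesting(species, island), year = 2007:2009)

nrow(observed_pairs)  # 5
nrow(pairs_by_year)   # 15
```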

Summary

- expand() generates every combination of the specified variables, creating a complete grid that includes combinations not present in your original data
- It is useful for identifying implicitly missing combinations, filling gaps in time series, and preparing data for modeling
- Combine expand() with left_join() to attach observed values to the full grid, or use nesting() to restrict the grid to combinations that actually occur
