How to use complete() in R

tidyr
complete()
Learn how to use complete() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 21, 2026

Introduction

The complete() function from tidyr turns implicit missing values into explicit ones. Implicit missing values are combinations that should exist but have no row in your data, such as missing dates in a time series or missing factor combinations in grouped data. complete() adds a row for each absent combination of the columns you name and sets the remaining columns to NA, or to a default you supply.

Getting Started

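library(tidyverse)       # loads tidyr (complete, expand), dplyr, and tibble
library(palmerpenguins)  # provides the penguins dataset used in Example 2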

Example 1: Basic Usage

The Problem

Imagine you have survey data where not every participant answered every question, creating gaps in your dataset. You need to explicitly show these missing combinations to properly analyze response patterns.

Step 1: Create sample data with missing combinations

Let’s create a simple dataset that’s missing some obvious combinations.

survey_data <- tibble(
  participant = c(1, 1, 2, 2, 3),
  question = c("A", "B", "A", "C", "B"),
  response = c(4, 3, 5, 2, 4)
)
print(survey_data)

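Notice that participant 1 never answered question C, participant 2 never answered question B, and participant 3 answered only question B. Four of the nine possible participant-question pairs are missing.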

Step 2: Identify the complete structure

First, let’s see what combinations should exist in our data.

survey_data |>
  expand(participant, question)

This returns all 9 combinations (3 participants by 3 questions), whether or not they appear in the data.
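To list only the combinations that are absent, one option is to subtract the observed pairs from the expanded grid with dplyr's anti_join(). This is a minimal sketch; the name missing_pairs is just an illustrative choice, and the data is repeated so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same survey data as in Step 1
survey_data <- tibble(
  participant = c(1, 1, 2, 2, 3),
  question = c("A", "B", "A", "C", "B"),
  response = c(4, 3, 5, 2, 4)
)

# All possible pairs, minus the pairs that were actually observed
missing_pairs <- survey_data |>
  expand(participant, question) |>
  anti_join(survey_data, by = c("participant", "question"))

print(missing_pairs)
```

The result should contain exactly the rows that complete() would add.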

Step 3: Fill in missing combinations

Now we’ll use complete() to add the missing combinations to our original data.

survey_data |>
  complete(participant, question)

The four missing combinations now appear as new rows with NA for response, making the gaps in our data explicit.

Step 4: Fill missing values with defaults

We can provide default values for the missing combinations.

complete_survey <- survey_data |>
  complete(participant, question, fill = list(response = 0))
print(complete_survey)

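Now all missing responses are filled with 0. This works as a "no response" marker only when 0 is not a valid answer; if it could be, keep NA so the gaps are not mistaken for real responses.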
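One caveat with fill: by default it also replaces NA values that were already present in the data, not only the ones in newly added rows. The sketch below assumes tidyr 1.2.0 or later, where complete() accepts explicit = FALSE to limit the fill to new rows. The survey_with_na data is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: participant 1 was asked question B but the answer is NA
survey_with_na <- tibble(
  participant = c(1, 1, 2),
  question = c("A", "B", "A"),
  response = c(4, NA, 5)
)

# explicit = FALSE fills only the rows that complete() adds,
# so the pre-existing NA for participant 1, question B is kept
filled_new_only <- survey_with_na |>
  complete(participant, question,
           fill = list(response = 0),
           explicit = FALSE)

print(filled_new_only)
```

With the default explicit = TRUE, both the recorded NA and the newly added row would be set to 0.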

Example 2: Practical Application

The Problem

You’re analyzing penguin data and want an entry for every species on every island, even if no penguins of that species were observed there. Without those rows, zero counts silently drop out of summary tables and plots.

Step 1: Examine the current data structure

Let’s look at the species-island combinations in our penguin data.

penguin_summary <- penguins |>
  filter(!is.na(species), !is.na(island)) |>
  count(species, island, name = "count") |>
  arrange(species, island)
print(penguin_summary)

We can see that not all species appear on all islands: Adelie penguins were recorded on all three islands, Chinstrap only on Dream, and Gentoo only on Biscoe.

Step 2: Complete all species-island combinations

Now let’s ensure every species has an entry for every island.

complete_penguins <- penguin_summary |>
  complete(species, island, fill = list(count = 0))
print(complete_penguins)

Missing combinations now show 0 count, indicating no observations of that species on that island.
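Both species and island are factors in the penguins data, and complete() uses every declared factor level, not only the values that occur in the rows. The sketch below shows the difference between a factor and a character column, using a made-up shirt_orders table.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: size "M" is a declared level but never appears in a row
shirt_orders <- tibble(
  size = factor(c("S", "L"), levels = c("S", "M", "L")),
  n = c(3, 5)
)

# Factor column: every declared level is used, so a row for "M" is added
by_factor <- shirt_orders |>
  complete(size, fill = list(n = 0))

# Character column: only values present in the data are known,
# so no row is added
by_character <- shirt_orders |>
  mutate(size = as.character(size)) |>
  complete(size, fill = list(n = 0))

print(by_factor)
print(by_character)
```

If a category can be entirely absent from the data, storing the column as a factor with all levels declared is one way to make complete() include it.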

Step 3: Create a time series with gaps

Let’s create a monthly observation dataset with some missing months.

observations <- tibble(
  date = as.Date(c("2023-01-01", "2023-03-01", "2023-05-01", "2023-07-01")),
  species = c("Adelie", "Chinstrap", "Gentoo", "Adelie"),
  count = c(15, 8, 12, 20)
)

Notice we’re missing February, April, and June observations.

Step 4: Fill in missing months

We’ll complete the time series to include all months.

complete_observations <- observations |>
  complete(date = seq(min(date), max(date), by = "month"),
           fill = list(count = 0, species = "Unknown"))
print(complete_observations)

Now there is a row for every month from January to July. The three added months have a count of 0 and the species "Unknown".
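If you would rather have a row for each species in each month than a single "Unknown" placeholder, you can pass species to complete() as well. This is a sketch of that variation, not a replacement for the step above; monthly_by_species is an illustrative name, and the data is repeated so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same observations as in Step 3
observations <- tibble(
  date = as.Date(c("2023-01-01", "2023-03-01", "2023-05-01", "2023-07-01")),
  species = c("Adelie", "Chinstrap", "Gentoo", "Adelie"),
  count = c(15, 8, 12, 20)
)

# Cross every month in the range with every species seen in the data:
# 7 months x 3 species = 21 rows, with count = 0 where nothing was recorded
monthly_by_species <- observations |>
  complete(
    date = seq(min(date), max(date), by = "month"),
    species,
    fill = list(count = 0)
  )

print(monthly_by_species)
```

This layout is usually easier to plot or summarise per species, at the cost of many more zero rows.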

Summary

  • complete() makes implicit missing values explicit by adding rows for missing combinations
  • Use expand() first to preview what the complete structure would look like
  • The fill parameter lets you specify default values for missing combinations
  • For time series, pass a full date sequence to complete() to get a row for every period in the range
  • For grouped or categorical data, it gives every combination of factor levels a row, so zero counts are not lost