How to use fill() in R

tidyr
fill()
Learn how to use fill() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published February 21, 2026

Introduction

The tidyr::fill() function fills missing values (NA) in the columns you select. By default it carries forward the last non-missing value above each gap; it can also carry backward the next non-missing value below it. It is particularly useful when a missing value really means "same as the previous entry", that is, when an observation should keep the most recent recorded value until a new one appears.
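The two basic behaviors can be seen on a tiny data frame. This is a minimal sketch. It assumes only that tidyr is installed, and the column `x` is hypothetical illustration data rather than anything from penguins.

```r
library(tidyr)

# Hypothetical data frame with gaps in column x
df <- data.frame(x = c(1, NA, NA, 4, NA))

# Default (.direction = "down"): each NA takes the last value above it
down <- fill(df, x)$x
down
# 1 1 1 4 4

# .direction = "up": each NA takes the next value below it;
# the final NA has nothing below it, so it stays NA
up <- fill(df, x, .direction = "up")$x
up
# 1 4 4 4 NA
```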

You would typically use fill() when dealing with data entry formats where values are only recorded when they change, survey data with repeated measures, or time series data where you need to propagate values forward or backward to handle gaps.
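The data-entry case is where skipping fill() causes real problems. As a sketch, assume a hypothetical spreadsheet export in which the region is typed only on its first row. Grouped summaries then silently drop the rows with a missing region. The example uses base tapply() for the totals so that only tidyr is required.

```r
library(tidyr)

# Hypothetical export: region is entered only when it changes
entries <- data.frame(
  region = c("North", NA, NA, "South", NA),
  units  = c(10, 5, 7, 3, 4)
)

# Without fill(), rows with a missing region are dropped from the totals
raw_totals <- tapply(entries$units, entries$region, sum)
raw_totals
# North 10, South 3

# After fill(), every row is attributed to its region
filled_entries <- fill(entries, region)
totals <- tapply(filled_entries$units, filled_entries$region, sum)
totals
# North 22, South 7
```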

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple dataset that has missing values to demonstrate the basic functionality:

# Create sample data with missing values
# (replace() keeps species and island as factors; ifelse() would
# silently turn them into integer codes)
penguin_summary <- penguins |>
  select(species, island, body_mass_g) |>
  slice(1:8) |>
  mutate(
    species = replace(species, row_number() %in% c(2, 3, 5), NA),
    island = replace(island, row_number() %in% c(3, 4), NA)
  )

# Fill missing values with previous non-missing values
penguin_summary |>
  fill(species, island)

By default (.direction = "down"), fill() carries the last observation forward (LOCF). Each missing species or island value is replaced with the nearest non-missing value above it in the current row order. The first eight rows of penguins are all Adelie penguins from Torgersen, so every gap here is filled with those values. This is the behavior you want when values are only recorded when they change.
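Because "above" means the current row order, it is worth sorting before filling. Here is a small sketch using a hypothetical event log whose rows arrived out of time order. It uses base order(); arrange() in a dplyr pipeline serves the same purpose.

```r
library(tidyr)

# Hypothetical event log, rows not in time order
log_data <- data.frame(
  time   = c(3, 1, 2, 4),
  status = c(NA, "on", NA, "off")
)

# Filling in arrival order: the first row has nothing above it,
# and time 2 borrows from time 1 only by accident of row order
unsorted_status <- fill(log_data, status)$status
unsorted_status
# NA "on" "on" "off"

# Sorting by time first gives the intended carry-forward
sorted_log <- log_data[order(log_data$time), ]
sorted_status <- fill(sorted_log, status)$status
sorted_status
# "on" "on" "on" "off"
```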

You can also choose the direction with the .direction argument. Use .direction = "up" to fill each gap from the next non-missing value below it (NOCB, Next Observation Carried Backward):

penguin_summary |>
  fill(species, island, .direction = "up")
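Besides "down" and "up", .direction also accepts "downup" and "updown". These run one pass and then the other, and they differ only in which neighbor fills an interior gap. The sketch below assumes a hypothetical single-column data frame with gaps at both ends and in the middle.

```r
library(tidyr)

# Hypothetical column with leading, interior, and trailing gaps
df <- data.frame(x = c(NA, 2, NA, NA, 5, NA))

# "downup": fill down first, then fill up whatever is still missing
downup <- fill(df, x, .direction = "downup")$x
downup
# 2 2 2 2 5 5

# "updown": fill up first, then fill down whatever is still missing
updown <- fill(df, x, .direction = "updown")$x
updown
# 2 2 5 5 5 5
```

Both variants leave no NA behind as long as the column has at least one observed value. Choose "downup" when an earlier value should take precedence and "updown" when a later one should.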

Example 2: Practical Application

A common real-world scenario involves time series or grouped data where missing values should be filled only within each group. Let's simulate a small monitoring dataset with gaps in the penguin measurements:

# Simulate a monitoring dataset with missing values
penguin_monitoring <- penguins |>
  filter(!is.na(body_mass_g)) |>
  select(species, island, year, body_mass_g, bill_length_mm) |>
  group_by(species, island) |>
  slice_head(n = 4) |>
  ungroup() |>
  mutate(
    # Simulate missing measurements
    body_mass_g = ifelse(row_number() %% 3 == 0, NA, body_mass_g),
    bill_length_mm = ifelse(row_number() %% 4 == 0, NA, bill_length_mm)
  ) |>
  arrange(species, island, year)

# Fill missing values within each species-island group
penguin_monitoring |>
  group_by(species, island) |>
  fill(body_mass_g, bill_length_mm, .direction = "downup") |>
  ungroup()

In this example, we use .direction = "downup" which first fills down (forward), then fills up (backward) to handle missing values at the beginning of groups. The group_by() ensures that filling only occurs within each species-island combination, preventing values from bleeding across different groups.
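The bleeding that group_by() prevents is easy to reproduce. This sketch assumes dplyr and tidyr are installed. It uses a hypothetical two-site readings table instead of the penguin data so that the result is easy to check by eye.

```r
library(dplyr)
library(tidyr)

# Hypothetical readings from two sites
readings <- data.frame(
  site  = c("A", "A", "A", "B", "B"),
  value = c(5, NA, NA, NA, 8)
)

# Ungrouped: site B's first row borrows site A's last value
ungrouped <- fill(readings, value)$value
ungrouped
# 5 5 5 5 8

# Grouped: each site is filled only from its own values
grouped <- readings |>
  group_by(site) |>
  fill(value, .direction = "downup") |>
  ungroup()
grouped$value
# 5 5 5 8 8
```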

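After a "downup" fill within groups, the only NAs left are in groups with no observed values at all. In those groups the group's own mean is NaN, so it cannot serve as a fallback. Instead, you can combine fill() with dplyr's coalesce() and fall back to a coarser summary, for example the species mean:

# Advanced example: fill within species-island groups, then fall back
# to species-level means for anything fill() could not reach
penguin_monitoring |>
  group_by(species, island) |>
  fill(body_mass_g, bill_length_mm, .direction = "downup") |>
  # Regroup by species only (group_by() replaces the existing grouping)
  group_by(species) |>
  mutate(
    # as.numeric(): body_mass_g is an integer column, its mean is a double
    body_mass_g = coalesce(as.numeric(body_mass_g), mean(body_mass_g, na.rm = TRUE)),
    bill_length_mm = coalesce(bill_length_mm, mean(bill_length_mm, na.rm = TRUE))
  ) |>
  ungroup()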
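To see why a fallback from a wider scope is needed, consider a toy case where one group has no observed values. fill() leaves that group untouched, and coalesce() then supplies a value from outside the group. The sketch assumes dplyr and tidyr, a hypothetical readings table, and the overall mean of the observed values as the fallback. Substitute whatever fallback suits your data.

```r
library(dplyr)
library(tidyr)

# Hypothetical readings: site C has no observed values at all
readings <- data.frame(
  site  = c("A", "A", "A", "C", "C"),
  value = c(4, NA, 6, NA, NA)
)

# Grouped fill handles site A but cannot help site C
filled <- readings |>
  group_by(site) |>
  fill(value, .direction = "downup") |>
  ungroup()
filled$value
# 4 4 6 NA NA

# Fall back to the mean of all observed values (here (4 + 6) / 2 = 5)
overall_mean <- mean(readings$value, na.rm = TRUE)
completed <- mutate(filled, value = coalesce(value, overall_mean))
completed$value
# 4 4 6 5 5
```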

Summary

  • tidyr::fill() propagates non-missing values to fill gaps in your data, filling down by default, up with .direction = "up", or both ways with "downup" or "updown"
  • Always consider using group_by() when you want to fill within specific categories to prevent values from bleeding across different groups
  • Combine fill() with dplyr::coalesce() and a fallback from a wider scope (such as a species-level mean) to cover gaps fill() cannot reach, such as groups with no observed values at all