How to use fill() in R

tidyr

fill()

Learn how to use fill() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The tidyr::fill() function replaces missing values (NA) in the selected columns with the last non-missing value above them or, optionally, with the next non-missing value below them. It is most useful when a value is written once and is meant to persist until a new value is recorded. Note that fill() only touches explicit NA values in existing rows; it does not create rows that are absent from the data.

You would typically use fill() when dealing with data entry formats where values are only recorded when they change, survey data with repeated measures, or time series data where you need to propagate values forward or backward to handle gaps.
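
As a minimal sketch of the "recorded only when it changes" case, the toy table below leaves `site` blank on repeated rows and lets `fill()` carry each name down. The `readings` tibble and its values are invented for illustration and are not part of the penguins data used later.

```r
library(tidyr)
library(tibble)

# Hypothetical data: `site` is written only on the first row of each block
readings <- tibble(
  site  = c("A", NA, NA, "B", NA),
  value = c(10, 12, 11, 20, 21)
)

# Carry each recorded site name down into the blank rows below it
filled <- fill(readings, site)
filled$site
# "A" "A" "A" "B" "B"
```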

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple dataset that has missing values to demonstrate the basic functionality:

# Create sample data with missing values
penguin_summary <- penguins |>
  select(species, island, body_mass_g) |>
  slice(1:8) |>
  mutate(
    # replace() keeps the factor type; ifelse() would return integer codes
    species = replace(species, row_number() %in% c(2, 3, 5), NA),
    island = replace(island, row_number() %in% c(3, 4), NA)
  )

# Fill missing values with previous non-missing values
penguin_summary |>
  fill(species, island)

By default, fill() fills downward, a strategy often called last observation carried forward (LOCF): each NA in the species and island columns is replaced with the nearest non-missing value above it. An NA in the first row has nothing above it and stays missing. Because the first eight penguins are all Adelie penguins from Torgersen, every gap here is filled with the same value.

You can change the direction with the .direction argument, which accepts "down" (the default), "up", "downup", and "updown". Use .direction = "up" to fill each gap from the next non-missing value below it (NOCB, next observation carried backward):

penguin_summary |>
  fill(species, island, .direction = "up")
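
To make the four directions easier to compare, here is a small sketch on an invented single-column tibble (`df` and its values are hypothetical) with a leading NA, an interior gap, and a trailing NA. The comments show what each call returns for that input.

```r
library(tidyr)
library(tibble)

# Hypothetical column with a leading NA, an interior gap, and a trailing NA
df <- tibble(x = c(NA, "a", NA, "b", NA))

down   <- fill(df, x, .direction = "down")$x    # NA  "a" "a" "b" "b"
up     <- fill(df, x, .direction = "up")$x      # "a" "a" "b" "b" NA
downup <- fill(df, x, .direction = "downup")$x  # "a" "a" "a" "b" "b"
updown <- fill(df, x, .direction = "updown")$x  # "a" "a" "b" "b" "b"
```

The two combined directions differ only in which neighbour wins for interior gaps: "downup" prefers the value above, "updown" the value below.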

Example 2: Practical Application

A common real-world scenario involves time series or grouped data where you want to fill missing values within specific groups. Let’s create a more complex example using penguin measurements over time:

# Simulate a monitoring dataset with missing values
penguin_monitoring <- penguins |>
  filter(!is.na(body_mass_g)) |>
  select(species, island, year, body_mass_g, bill_length_mm) |>
  group_by(species, island) |>
  slice_head(n = 4) |>
  ungroup() |>
  mutate(
    # Simulate missing measurements
    body_mass_g = ifelse(row_number() %% 3 == 0, NA, body_mass_g),
    bill_length_mm = ifelse(row_number() %% 4 == 0, NA, bill_length_mm)
  ) |>
  arrange(species, island, year)

# Fill missing values within each species-island group
penguin_monitoring |>
  group_by(species, island) |>
  fill(body_mass_g, bill_length_mm, .direction = "downup") |>
  ungroup()

Here .direction = "downup" fills down first and then up, so a missing value at the start of a group, which a downward fill cannot reach, is filled from the first non-missing value below it. The group_by() call restricts filling to each species-island combination, so values never carry over from one group into the next.
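
The effect of grouping is easiest to see on a tiny table. The sketch below uses invented data (`obs`, `group`, and `value` are hypothetical names) in which the second group starts with a missing value, and compares an ungrouped fill with a grouped one.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical data: group "b" starts with a missing value
obs <- tibble(
  group = c("a", "a", "b", "b"),
  value = c(1, NA, NA, 4)
)

# Ungrouped: the 1 from group "a" leaks into the first row of group "b"
ungrouped <- obs |> fill(value)
ungrouped$value
# 1 1 1 4

# Grouped: each group is filled on its own, so the leading NA in "b" stays
grouped <- obs |> group_by(group) |> fill(value) |> ungroup()
grouped$value
# 1 1 NA 4
```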

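fill() can also be combined with dplyr functions such as coalesce() to supply a fallback for gaps it cannot reach. With a downward-only fill, missing values at the start of a group remain NA; the example below replaces them with the group mean:

# Advanced example: fill down, then handle the NAs that remain
penguin_monitoring |>
  group_by(species, island) |>
  fill(body_mass_g, bill_length_mm, .direction = "down") |>
  mutate(
    # Replace leading NAs, which a downward fill cannot reach, with group means
    # (the means are computed after filling, so carried-forward values count)
    body_mass_g = coalesce(body_mass_g, mean(body_mass_g, na.rm = TRUE)),
    bill_length_mm = coalesce(bill_length_mm, mean(bill_length_mm, na.rm = TRUE))
  ) |>
  ungroup()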
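
If carried-forward values should not influence the fallback, one option is to compute the group mean before filling. This is a minimal sketch on invented data, not part of the penguins workflow above; the `masses` tibble and the `group_mean` helper column are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical single-group data with a leading NA that a downward fill cannot reach
masses <- tibble(
  group = "a",
  mass  = c(NA, 10, NA, 40)
)

result <- masses |>
  group_by(group) |>
  mutate(group_mean = mean(mass, na.rm = TRUE)) |>  # mean of the observed values only: 25
  fill(mass, .direction = "down") |>                # interior gap takes the 10 above it
  mutate(mass = coalesce(mass, group_mean)) |>      # leading NA falls back to the mean
  ungroup()

result$mass
# 25 10 10 40
```

Computing the mean after the fill would instead average 10, 10, and 40, so the order of the two steps is a deliberate choice.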

Summary

tidyr::fill() propagates non-missing values to fill gaps in your data, with options to fill down (default), up, or both directions
Always consider using group_by() when you want to fill within specific categories to prevent values from bleeding across different groups
Combine fill() with other functions like coalesce() or summary statistics to create robust missing value handling strategies for your data analysis workflow
