How to use replace_na() in R

tidyr
replace_na()
Learn how to use replace_na() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published February 21, 2026

Introduction

The replace_na() function from the tidyr package replaces missing (NA) values with values you specify. Applied to a vector, it fills every NA with a single value. Applied to a data frame, it takes a named list, so each column can get its own replacement. This makes it a routine step when cleaning and preparing data for analysis.
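Here is a minimal sketch of the two calling forms. It uses small made-up data rather than the penguins dataset, and the names x, df, score and grade are illustrative only. On a vector, replace_na() takes one replacement value. On a data frame, it takes a named list with one value per column.

```r
library(tidyr)

# Vector form: every NA in the vector gets the same single value
x <- c(1.5, NA, 3, NA)
x_filled <- replace_na(x, 0)
print(x_filled)  # [1] 1.5 0.0 3.0 0.0

# Data frame form: a named list maps each column to its own replacement
df <- data.frame(score = c(10, NA, 7), grade = c("A", NA, "B"),
                 stringsAsFactors = FALSE)
df_filled <- replace_na(df, list(score = 0, grade = "missing"))
print(df_filled)
```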

Getting Started

library(tidyverse)       # loads tidyr (replace_na) and dplyr (mutate, filter, select, ...)
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Usage

The Problem

Let’s work with the penguins dataset, which contains some missing values in the bill_length_mm column. We need to replace these NA values with a meaningful substitute like the mean of the available data.

Step 1: Examine the data

First, let’s look at our dataset to identify missing values.

penguins |>
  select(species, bill_length_mm, bill_depth_mm) |>
  head(10)

This prints the first 10 rows of three key columns. Row 4, an Adelie penguin, has NA in both bill measurements.
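Scanning printed rows only catches the NAs that happen to appear on screen. A quick way to count missing values in every column is colSums(is.na(...)). The sketch below runs it on a small made-up data frame (toy is a hypothetical name), but the same call works on penguins.

```r
# Made-up data with gaps in two of its three columns
toy <- data.frame(
  species = c("Adelie", "Gentoo", NA, "Adelie"),
  bill_length_mm = c(39.1, NA, 40.3, NA),
  bill_depth_mm = c(18.7, 17.4, 18.0, 19.3),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```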

Step 2: Replace NA values in a single column

We’ll replace missing bill lengths with the mean value.

penguins_clean <- penguins |>
  mutate(bill_length_mm = replace_na(bill_length_mm, 
                                   mean(bill_length_mm, na.rm = TRUE)))

This replaces all NA values in bill_length_mm with the calculated mean of the non-missing values.
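To make the arithmetic concrete: mean(..., na.rm = TRUE) drops the NAs before averaging, so the fill value is the mean of the observed values only. Here is a small sketch on made-up numbers that uses the same mutate() and replace_na() pattern as the step above. The tibble name lengths is illustrative.

```r
library(dplyr)
library(tidyr)

lengths <- tibble(bill_length_mm = c(40, NA, 44, 39, NA))

filled <- lengths |>
  mutate(bill_length_mm = replace_na(bill_length_mm,
                                     mean(bill_length_mm, na.rm = TRUE)))

# Observed values are 40, 44 and 39, so each NA becomes (40 + 44 + 39) / 3 = 41
print(filled$bill_length_mm)
```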

Step 3: Verify the replacement

Let’s check that our replacement worked correctly.

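# penguins_clean has the same rows in the same order as penguins,
# so the NA positions from the original data pick out the filled rows
penguins_clean |>
  filter(is.na(penguins$bill_length_mm)) |>
  select(species, bill_length_mm) |>
  head(5)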

This shows the two rows that originally had a missing bill length, now filled with the mean (about 43.92 mm).
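A visual check can be backed up with assertions. The minimal sketch below assumes the before and after vectors are in the same order, as they are after mutate(). It checks three things: no NA remains, every originally observed value is unchanged, and every former NA now holds the fill value. The vector names are made up.

```r
library(tidyr)

before <- c(39.1, NA, 40.3, NA, 36.7)
fill_value <- mean(before, na.rm = TRUE)
after <- replace_na(before, fill_value)

was_na <- is.na(before)

# 1. Nothing is missing any more
stopifnot(!anyNA(after))
# 2. Observed values were left untouched
stopifnot(identical(after[!was_na], before[!was_na]))
# 3. Every former NA now holds the fill value
stopifnot(all(after[was_na] == fill_value))
```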

Example 2: Practical Application

The Problem

In practice, you often need to replace NA values across multiple columns with different strategies. For instance, you might use the mean for numeric columns and “Unknown” for categorical columns as part of a broader data cleaning process.

Step 1: Create a dataset with multiple NA types

Let’s introduce some missing values to demonstrate multiple column replacement.

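messy_penguins <- penguins |>
  mutate(
    # species is a factor; make it character so it can later hold "Unknown"
    species = if_else(row_number() %in% c(5, 15, 25),
                      NA_character_, as.character(species)),
    # if_else() keeps body_mass_g as an integer column
    body_mass_g = if_else(row_number() %in% c(3, 13, 23), NA_integer_, body_mass_g)
  )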

This creates artificial missing values in both categorical and numeric columns for demonstration.
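One detail matters for categorical columns. In penguins, species is a factor. In tidyr 1.2.0 and later, replace_na() casts the replacement to the column's type, so it cannot insert a value such as "Unknown" that is not already one of the factor's levels. The minimal sketch below uses a made-up factor f and shows two workarounds: convert to character first, or add the level before replacing.

```r
library(tidyr)

f <- factor(c("Adelie", NA, "Gentoo"))

# Option 1: work with a character vector, where any string is allowed
as_chr <- replace_na(as.character(f), "Unknown")

# Option 2: keep the factor, but add "Unknown" to its levels first
with_level <- factor(f, levels = c(levels(f), "Unknown"))
as_fct <- replace_na(with_level, "Unknown")

print(as_chr)
print(levels(as_fct))
```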

Step 2: Replace multiple columns with different strategies

Now we pass replace_na() a named list. Each name is a column, and each value is the replacement for that column’s NAs. Columns not named in the list are left unchanged.

clean_penguins <- messy_penguins |>
  replace_na(list(
    species = "Unknown",
    body_mass_g = 4200,
    bill_length_mm = 43.9
  ))

This fills species with “Unknown”. It fills body_mass_g and bill_length_mm with fixed values close to their means in the full dataset (about 4202 g and 43.92 mm).
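The values 4200 and 43.9 are typed in by hand. The minimal sketch below derives them from the data instead, using a made-up data frame: it builds the named list from column means first, then passes that list to replace_na(). There is one caveat, which assumes tidyr 1.2.0 or later. replace_na() errors rather than truncating a fractional value into an integer column, so for an integer column like body_mass_g the mean is rounded first. The names toy, fills and toy_clean are hypothetical.

```r
library(tidyr)

toy <- data.frame(
  species = c("Adelie", NA, "Gentoo", "Gentoo"),
  body_mass_g = c(3750L, 3800L, NA, 5000L),   # integer, like penguins$body_mass_g
  bill_length_mm = c(39.1, NA, 46.1, 50.0),
  stringsAsFactors = FALSE
)

# Build the replacement list from the observed data
fills <- list(
  species = "Unknown",
  # round() keeps the value whole so it can go into an integer column
  body_mass_g = round(mean(toy$body_mass_g, na.rm = TRUE)),
  bill_length_mm = mean(toy$bill_length_mm, na.rm = TRUE)
)

toy_clean <- replace_na(toy, fills)
print(toy_clean)
```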

Step 3: Verify multiple replacements

Let’s confirm our replacements worked across all specified columns.

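# Row order is unchanged, so the NA positions in messy_penguins
# pick out exactly the rows that replace_na() filled
clean_penguins |>
  filter(is.na(messy_penguins$species) |
         is.na(messy_penguins$body_mass_g) |
         is.na(messy_penguins$bill_length_mm)) |>
  select(species, bill_length_mm, body_mass_g)

This shows every row that had a missing species, body_mass_g, or bill_length_mm before cleaning, now holding the replacement values. Filtering on the replacement values themselves (for example body_mass_g == 4200) would be less reliable, because an observed value can coincide with a replacement value.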

Step 4: Compare before and after

Finally, let’s see the improvement in data completeness.

# Check NA counts before and after
messy_penguins |> summarise(across(everything(), ~sum(is.na(.))))
clean_penguins |> summarise(across(everything(), ~sum(is.na(.))))

In this comparison, species, body_mass_g and bill_length_mm drop to zero missing values. Columns not listed in replace_na(), such as sex, keep their NAs.
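You can also subtract the two summaries to get the number of values filled in each column directly. Here is a minimal sketch using base R's colSums() on made-up before and after data frames (messy, clean and replaced are hypothetical names):

```r
library(tidyr)

messy <- data.frame(
  species = c("Adelie", NA, "Gentoo"),
  body_mass_g = c(NA, 3800, NA),
  sex = c("male", NA, "female"),
  stringsAsFactors = FALSE
)

# Only species and body_mass_g are listed, so sex keeps its NA
clean <- replace_na(messy, list(species = "Unknown", body_mass_g = 4200))

# Missing values per column before minus after = values replaced
replaced <- colSums(is.na(messy)) - colSums(is.na(clean))
print(replaced)
```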

Summary

  • replace_na() is tidyr’s function for replacing missing values, and it fits naturally into dplyr pipelines
  • Use it on a vector with a single value (typically inside mutate()) to fill one column, or on a data frame with a named list to fill several columns at once
  • Common replacement strategies include using means/medians for numeric data and “Unknown” or “Other” for categorical data
  • Always verify your replacements worked correctly by checking the modified dataset
  • It works with both the native pipe |> and magrittr’s %>%, so it slots into existing data cleaning workflows
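The summary above mentions medians. The sketch below fills every numeric column with its own median in one step, combining replace_na() with dplyr's across() and where(). The data frame is made up, and its numeric columns are doubles so that a fractional median could not clash with an integer column type. Character columns get "Unknown" in a second across() call.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  island = c("Biscoe", NA, "Dream", "Dream"),
  flipper_length_mm = c(181, 186, NA, 195),
  body_mass_g = c(3750, NA, 3250, 4000)
)

toy_filled <- toy |>
  # Each numeric column is filled with the median of its own observed values
  mutate(across(where(is.numeric), \(col) replace_na(col, median(col, na.rm = TRUE)))) |>
  # Character columns get a label instead
  mutate(across(where(is.character), \(col) replace_na(col, "Unknown")))

print(toy_filled)
```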