How to use replace_na() in R

tidyr

replace_na()

Learn how to use replace_na() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The replace_na() function from the tidyr package replaces NA values with a value you specify, either in a single vector or across several columns of a data frame at once. It is most useful during the data cleaning and preparation stages of an analysis.
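To make the two calling styles concrete, here is a minimal sketch on made-up data (the vector `x` and the data frame `df` are illustrative, not part of the penguins dataset). It assumes a reasonably recent tidyr.

```r
library(tidyr)

# A plain vector: every NA is replaced by the single value 0
x <- c(1, NA, 3)
filled_vec <- replace_na(x, 0)

# A data frame: a named list maps each column to its own replacement
df <- data.frame(score = c(10, NA, 30), grade = c("a", NA, "c"))
filled_df <- replace_na(df, list(score = 0, grade = "unknown"))
```

Columns not named in the list are left untouched, so any NAs they contain remain.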

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

Let’s work with the penguins dataset, which contains some missing values in the bill_length_mm column. We need to replace these NA values with a meaningful substitute like the mean of the available data.

Step 1: Examine the data

First, let’s look at our dataset to identify missing values.

penguins |>
  select(species, bill_length_mm, bill_depth_mm) |>
  head(10)

This shows us the first 10 rows with some key columns, including any NA values present.

Step 2: Replace NA values in a single column

We’ll replace missing bill lengths with the mean value.

penguins_clean <- penguins |>
  mutate(bill_length_mm = replace_na(bill_length_mm, 
                                   mean(bill_length_mm, na.rm = TRUE)))

This replaces all NA values in bill_length_mm with the calculated mean of the non-missing values.
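A single overall mean can blur differences between groups. As a possible variation, the sketch below fills each NA with the mean of its own group. The `toy` tibble is a hypothetical stand-in for penguins, so the species labels and values are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the penguins dataset
toy <- tibble(
  species = c("A", "A", "A", "B", "B", "B"),
  bill_length_mm = c(38, NA, 40, 46, 48, NA)
)

toy_filled <- toy |>
  group_by(species) |>
  # Inside a grouped mutate(), mean() is computed per species
  mutate(bill_length_mm = replace_na(bill_length_mm,
                                     mean(bill_length_mm, na.rm = TRUE))) |>
  ungroup()
```

Here the missing value in group A is filled from the other A rows only, and likewise for B.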

Step 3: Verify the replacement

Let’s check that our replacement worked correctly.

penguins_clean |>
  filter(is.na(penguins$bill_length_mm)) |>
  select(species, bill_length_mm) |>
  head(5)

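Because penguins_clean keeps the same rows in the same order as penguins, filtering on is.na(penguins$bill_length_mm) picks out the rows that originally had NA values, which now hold the mean instead.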
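If the original data frame may not be around later, one option is to record which rows were missing before filling them. The sketch below uses a hypothetical `was_missing` flag column and invented toy values.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two missing bill lengths
toy <- tibble(id = 1:4, bill_length_mm = c(39.1, NA, 40.3, NA))

toy_clean <- toy |>
  mutate(
    # Flag the missing rows first; mutate() evaluates arguments in order
    was_missing = is.na(bill_length_mm),
    bill_length_mm = replace_na(bill_length_mm,
                                mean(bill_length_mm, na.rm = TRUE))
  )

# Rows whose value was imputed rather than observed
imputed_rows <- toy_clean |> filter(was_missing)
```

Keeping the flag also makes it possible to exclude or mark imputed values in later analysis steps.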

Example 2: Practical Application

The Problem

In real-world scenarios, you often need to replace NA values across multiple columns with different replacement strategies. For instance, you might want to use mean for numeric columns and “Unknown” for categorical columns in a comprehensive data cleaning process.

Step 1: Create a dataset with multiple NA types

Let’s introduce some missing values to demonstrate multiple column replacement.

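messy_penguins <- penguins |>
  mutate(
    # species is a factor: convert it to character so the labels survive
    # and a text value such as "Unknown" can be filled in later
    species = if_else(row_number() %in% c(5, 15, 25),
                      NA_character_, as.character(species)),
    body_mass_g = ifelse(row_number() %in% c(3, 13, 23), NA, body_mass_g)
  )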

This creates artificial missing values in both categorical and numeric columns for demonstration.
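One thing to watch when injecting NA into a factor column such as species: base ifelse() drops the factor attributes and returns the underlying integer codes. The small sketch below, on a made-up two-level factor `f`, contrasts that with converting to character first.

```r
library(dplyr)

f <- factor(c("a", "b"))

# Base ifelse() loses the labels and returns the integer codes
codes <- ifelse(c(TRUE, FALSE), NA, f)

# Converting to character first keeps the labels intact
kept <- if_else(c(TRUE, FALSE), NA_character_, as.character(f))
```

With the labels preserved as text, a character replacement such as "Unknown" has a compatible type to fill into.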

Step 2: Replace multiple columns with different strategies

Now we’ll use replace_na() with a list to handle different columns appropriately.

clean_penguins <- messy_penguins |>
  replace_na(list(
    species = "Unknown",
    body_mass_g = 4200,
    bill_length_mm = 43.9
  ))

This replaces NA values with a default per column: “Unknown” for species, and fixed values close to the column means for body_mass_g and bill_length_mm. Columns not named in the list keep their NA values.
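Hard-coded numbers go stale when the data changes. As one alternative, the replacement list can be computed from the data itself, as sketched here on a hypothetical `toy` tibble (the `fills` name is also just illustrative).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one NA per column
toy <- tibble(
  species = c("Adelie", NA, "Gentoo"),
  body_mass_g = c(3000, NA, 5000)
)

# Build the replacement list from the data instead of typing constants
fills <- list(
  species = "Unknown",
  body_mass_g = mean(toy$body_mass_g, na.rm = TRUE)
)

toy_filled <- toy |> replace_na(fills)
```

Storing the list in a variable also makes it easy to reuse the same replacements on another data frame with the same columns.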

Step 3: Verify multiple replacements

Let’s confirm our replacements worked across all specified columns.

clean_penguins |>
  filter(species == "Unknown" | 
         body_mass_g == 4200 | 
         bill_length_mm == 43.9) |>
  select(species, bill_length_mm, body_mass_g)

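This shows all rows where our replacement values appear. Note that it matches on value, so any row whose original measurement happened to equal 4200 or 43.9 will also show up alongside the rows that were actually filled.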
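Matching on the replacement value cannot distinguish filled rows from rows that already held that value. A stricter check compares the NA positions before and after, sketched here on invented `before` and `after` tibbles.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: row 1 genuinely measured 4200, row 2 is missing
before <- tibble(body_mass_g = c(4200, NA, 3800))
after <- before |> replace_na(list(body_mass_g = 4200))

# Rows that were NA before and are filled now
changed <- which(is.na(before$body_mass_g) & !is.na(after$body_mass_g))

# Rows matched by value alone, which also catches the genuine 4200
value_match <- which(after$body_mass_g == 4200)
```

In this toy case only row 2 was changed, while the value-based match returns rows 1 and 2.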

Step 4: Compare before and after

Finally, let’s see the improvement in data completeness.

# Check NA counts before and after
messy_penguins |> summarise(across(everything(), ~sum(is.na(.))))
clean_penguins |> summarise(across(everything(), ~sum(is.na(.))))

This comparison reveals how many missing values were successfully replaced in each column.
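The two summaries can also be collapsed into a single per-column count of filled values. The sketch below uses base colSums() on hypothetical `before` and `after` tibbles, and assumes both have the same columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with NAs in both columns
before <- tibble(
  species = c("Adelie", NA, NA),
  body_mass_g = c(NA, 3800, 4100)
)
after <- before |>
  replace_na(list(species = "Unknown", body_mass_g = 4200))

# NA count before minus NA count after = number of values filled per column
filled_counts <- colSums(is.na(before)) - colSums(is.na(after))
```

A zero for a column means nothing was replaced there, which can flag a column that was left out of the replacement list.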

Summary

replace_na() is the go-to function for replacing missing values in tidyr, working seamlessly with dplyr pipelines
Use it with a single value to replace NAs in one column, or with a named list for multiple columns simultaneously
Common replacement strategies include using means/medians for numeric data and “Unknown” or “Other” for categorical data
Always verify your replacements worked correctly by checking the modified dataset
The function works with the native pipe |>, so replacements fit into a readable, step-by-step data cleaning workflow
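As a closing sketch of the median and “Unknown” strategies mentioned above, replace_na() can be combined with dplyr's across() to treat every column of a given type at once. The `toy` tibble and its column names are invented, and the numeric columns are doubles so the median always fits the column type.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two numeric columns and one character column
toy <- tibble(
  a = c(1, NA, 3, 10),
  b = c(NA, 2, 4, 6),
  label = c("x", NA, "z", "w")
)

toy_filled <- toy |>
  mutate(
    # Each numeric column gets its own median
    across(where(is.numeric), \(x) replace_na(x, median(x, na.rm = TRUE))),
    # Each character column gets a fixed label
    across(where(is.character), \(x) replace_na(x, "Unknown"))
  )
```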
