How to remove rows with all NAs

dplyr if_any
Learn how to remove rows with all nas with this comprehensive R tutorial. Includes practical examples and code snippets.
Published

August 29, 2023

Introduction

Removing rows where all columns contain NA values is a common data cleaning task in R. This operation helps eliminate completely empty observations that provide no useful information for analysis. The dplyr package provides several efficient methods to accomplish this task.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We need to identify and remove rows where every column contains NA values. These completely empty rows often result from data import issues or merging operations gone wrong.

Step 1: Create sample data with all-NA rows

We’ll start by creating a dataset that includes some rows with all NA values to demonstrate the removal process.

# Create sample data with all-NA rows
sample_data <- tibble(
  name = c("Alice", NA, "Bob", NA, "Charlie"),
  age = c(25, NA, 30, NA, 35),
  score = c(85, NA, 92, NA, 78)
)
print(sample_data)

This creates a dataset where rows 2 and 4 contain all NA values across every column.

Step 2: Remove rows with all NAs using filter()

The most straightforward approach uses filter() with if_any() to keep rows that have at least one non-NA value.

# Remove rows where all values are NA
clean_data <- sample_data |>
  filter(if_any(everything(), ~ !is.na(.)))

print(clean_data)

This filter keeps only rows where at least one column contains a non-NA value, effectively removing all-NA rows.

Step 3: Alternative approach with rowSums()

Another method uses rowSums() to count non-NA values per row and filters based on that count.

# Alternative method using rowSums
clean_data_alt <- sample_data |>
  filter(rowSums(!is.na(across(everything()))) > 0)

print(clean_data_alt)

This approach counts non-NA values in each row and keeps rows with at least one non-NA value.

Example 2: Practical Application

The Problem

Let’s work with a real scenario using the penguins dataset. We’ll artificially introduce some all-NA rows to simulate a messy dataset that needs cleaning. This represents a common situation when combining multiple data sources or dealing with incomplete data collection.

Step 1: Create messy penguin data

We’ll add some completely empty rows to the penguins dataset to simulate real-world data issues.

# Create messy data with all-NA rows
messy_penguins <- penguins |>
  slice(1:6) |>
  add_row(.after = 2) |>
  add_row(.after = 5)

print(messy_penguins)

Now we have a dataset with empty rows mixed in with valid penguin observations.

Step 2: Remove all-NA rows from specific columns

Sometimes you only want to check for all-NA values in specific columns rather than the entire row.

# Remove rows with all NAs in specific columns
clean_penguins <- messy_penguins |>
  filter(if_any(c(bill_length_mm, bill_depth_mm, 
                  flipper_length_mm, body_mass_g), 
                ~ !is.na(.)))

print(clean_penguins)

This removes rows where all the specified measurement columns are NA, but keeps rows with valid measurements even if other columns are missing.

Step 3: Count removed rows

It’s helpful to track how many rows were removed during the cleaning process for documentation purposes.

# Count original and cleaned rows
original_count <- nrow(messy_penguins)
clean_count <- nrow(clean_penguins)
removed_count <- original_count - clean_count

cat("Original rows:", original_count, 
    "\nCleaned rows:", clean_count,
    "\nRows removed:", removed_count)

This provides a summary of the cleaning operation, showing exactly how many all-NA rows were eliminated.

Step 4: Verify the cleaning

Let’s confirm our cleaning worked by checking for any remaining all-NA rows in the cleaned dataset.

# Verify no all-NA rows remain
verification <- clean_penguins |>
  mutate(all_na = rowSums(is.na(across(everything()))) == ncol(.)) |>
  summarise(remaining_all_na_rows = sum(all_na))

print(verification)

This verification step ensures our cleaning process successfully removed all rows containing only NA values.

Summary

  • Use filter(if_any(everything(), ~ !is.na(.))) for the most readable approach to remove all-NA rows
  • The rowSums(!is.na(across(everything()))) > 0 method offers an alternative counting-based approach
  • Target specific columns with if_any(c(col1, col2), ~ !is.na(.)) when you only care about certain variables
  • Always verify your cleaning results and document how many rows were removed
  • These techniques work efficiently with both small samples and large datasets using dplyr’s optimized functions