How to use anti_join() to find non-matching rows in R

Learn how to use anti_join() to find non-matching rows in R, with clear examples and explanations.
Published: March 26, 2026

Introduction

The anti_join() function in R’s dplyr package helps you find rows that exist in one dataset but not in another. This filtering join is particularly useful for identifying missing records, finding unique observations, or cleaning data by removing unwanted matches. Use anti_join() when you want to exclude rows based on matching keys between two datasets.

Setting Up the Data

Let’s start by loading the tidyverse package and creating two sample datasets to demonstrate anti_join().

library(tidyverse)

We’ll create our first dataset containing R packages with their corresponding IDs:

set.seed(123)
id <- sample(1:5, 3, replace = FALSE)
df1 <- tibble(
  id = id,
  packages = c("dplyr", "tidyr", "tibble")
)
df1

This creates a tibble with three R packages, each assigned a randomly drawn ID. Calling set.seed() first makes the random draw reproducible.

Now let’s create a second dataset with some overlapping packages and IDs drawn from the range 1 to 6, so a package’s ID in df2 is not guaranteed to equal its ID in df1:

set.seed(123)
id <- sample(1:6, 3, replace = FALSE)
df2 <- tibble(
  id = id,
  packages = c("dplyr", "ggplot2", "tidyr")
)
df2

Notice that df2 contains “dplyr” and “tidyr” (also in df1) plus “ggplot2” (unique to df2).

Finding Rows in df1 but Not in df2

To find packages that exist in df1 but not in df2, we use anti_join() with df1 as the first argument:

anti_join(df1, df2, by = "packages")

This returns rows from df1 where the package name doesn’t have a match in df2. Since “dplyr” and “tidyr” appear in both datasets, only “tibble” (unique to df1) will be returned.
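One way to see what anti_join() does here is to compare it with a manual filter. The sketch below rebuilds the two tables with fixed IDs (an assumption made so the example does not depend on sample()) and checks that filtering df1 on package names absent from df2 keeps the same row. The names via_join and via_filter are illustrative only.

```r
library(dplyr)

# Fixed IDs are an assumption for illustration; the article draws them with sample()
df1 <- tibble(id = c(3, 2, 5), packages = c("dplyr", "tidyr", "tibble"))
df2 <- tibble(id = c(3, 6, 2), packages = c("dplyr", "ggplot2", "tidyr"))

# Filtering join: keep rows of df1 that have no match in df2 on `packages`
via_join <- anti_join(df1, df2, by = "packages")

# Manual equivalent for a single key column
via_filter <- filter(df1, !packages %in% df2$packages)

via_join$packages    # "tibble"
via_filter$packages  # "tibble"
```

For a single key column the two approaches agree. anti_join() tends to be the more convenient choice once the key spans several columns.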

Finding Rows in df2 but Not in df1

We can reverse the operation to find packages in df2 that aren’t in df1:

anti_join(df2, df1, by = "packages")

This returns “ggplot2” since it’s the only package in df2 that doesn’t appear in df1.
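If you want both directions at once, the two results can be stacked into a single table. The following is a minimal sketch, again assuming fixed IDs rather than the sampled ones. The source label column and the object names are hypothetical choices for this example.

```r
library(dplyr)

# Fixed IDs are an assumption for illustration
df1 <- tibble(id = c(3, 2, 5), packages = c("dplyr", "tidyr", "tibble"))
df2 <- tibble(id = c(3, 6, 2), packages = c("dplyr", "ggplot2", "tidyr"))

only_in_df1 <- anti_join(df1, df2, by = "packages")
only_in_df2 <- anti_join(df2, df1, by = "packages")

# Stack both results and record which table each row came from
differences <- bind_rows(
  list(only_in_df1 = only_in_df1, only_in_df2 = only_in_df2),
  .id = "source"
)

differences$packages  # "tibble" "ggplot2"

# A filtering join keeps only the columns of its first argument
names(only_in_df2)    # "id" "packages"
```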

Understanding the Join Key

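The examples above name the join key explicitly with by = "packages". If you omit by, anti_join() joins on every column name the two datasets share and prints a message listing the columns it used. Here both datasets contain id and packages, so the two calls below are not equivalent:

# Matches on the package name only
anti_join(df1, df2, by = "packages")
# Matches on all shared columns (id and packages)
anti_join(df1, df2)

The second call treats a row as matched only when both id and packages agree, so it can return more rows than the first. Omitting by is a safe shortcut only when the shared columns are exactly the keys you intend to match on.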
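To see how the choice of key changes the result, here is a small sketch with hand-picked IDs (an assumption, chosen so that "tidyr" has a different ID in each table). It compares an explicit key, the default of joining on all shared columns, and a key that has a different name in each table. The table df3 and its pkg column are hypothetical.

```r
library(dplyr)

# Hand-picked IDs: "dplyr" has the same id in both tables, "tidyr" does not
df1 <- tibble(id = c(1, 2, 3), packages = c("dplyr", "tidyr", "tibble"))
df2 <- tibble(id = c(1, 9, 7), packages = c("dplyr", "ggplot2", "tidyr"))

# Explicit key: rows match on the package name alone
by_name <- anti_join(df1, df2, by = "packages")
by_name$packages  # "tibble"

# No `by`: dplyr joins on all shared columns (id and packages) and says so
# in a message, silenced here with suppressMessages()
by_all <- suppressMessages(anti_join(df1, df2))
by_all$packages   # "tidyr" "tibble"

# Key columns with different names: map the name in df1 to the name in df3
df3 <- rename(df2, pkg = packages)
by_renamed <- anti_join(df1, df3, by = c("packages" = "pkg"))
by_renamed$packages  # "tibble"
```

Spelling out by keeps the result from changing when one of the tables later gains another column that happens to share a name with the other.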

Summary

The anti_join() function is a powerful tool for identifying non-matching records between datasets. Remember that the order matters: anti_join(A, B) returns rows from A that don’t match B, while anti_join(B, A) returns rows from B that don’t match A. This makes it invaluable for data cleaning, validation, and finding unique observations across related datasets.
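As a closing illustration of the validation use case, the sketch below looks for orders that refer to a customer missing from a customer table. Both tables and all column names are invented for this example.

```r
library(dplyr)

# Hypothetical tables, invented for illustration
customers <- tibble(customer_id = c(1, 2, 3))
orders <- tibble(
  order_id    = c(101, 102, 103, 104),
  customer_id = c(1, 4, 2, 4)
)

# Orders whose customer_id has no match in customers
orphan_orders <- anti_join(orders, customers, by = "customer_id")
orphan_orders$order_id  # 102 104
```

An empty result from a check like this suggests every order points at a known customer.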