How to use filter() in R

Learn how to use filter() in R with practical examples. Step-by-step guide with code you can copy and run immediately.
Published: February 20, 2026

Introduction

The filter() function from the dplyr package keeps the rows of a data frame for which a logical condition is TRUE. Rows where the condition evaluates to FALSE or NA are dropped. Use filter() whenever you want to narrow a dataset down to the observations that meet your criteria.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We want to explore the penguins dataset by selecting only specific rows that meet certain criteria. Let’s start with simple filtering operations to understand how filter() works.

Step 1: Filter by a single condition

First, let’s filter penguins to show only those from the Adelie species.

# Filter for Adelie penguins only
adelie_penguins <- penguins |>
  filter(species == "Adelie")

head(adelie_penguins)

This creates a new dataset containing only the 152 Adelie penguins from the original 344 observations.
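One way to check the row count is to tally the rows per species and compare the Adelie entry with the size of the filtered data. This is a minimal sketch, assuming dplyr and palmerpenguins are installed; species_counts and adelie_n are names introduced here for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Tally rows per species across the full dataset
species_counts <- penguins |>
  count(species)

species_counts

# Number of rows kept by the Adelie filter
adelie_n <- penguins |>
  filter(species == "Adelie") |>
  nrow()

adelie_n
```

The Adelie row of species_counts should match adelie_n, and the counts should add up to the size of the full dataset.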

Step 2: Filter with numeric conditions

Now let’s filter penguins with body mass greater than 4000 grams.

# Filter for heavy penguins
heavy_penguins <- penguins |>
  filter(body_mass_g > 4000)

nrow(heavy_penguins)

nrow() reports how many penguins weigh more than 4000 grams: roughly half of the dataset, since the median body mass is 4050 g. This shows how numeric comparisons work in filter().
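Numeric comparisons against a missing value return NA, and filter() drops those rows without warning. The sketch below illustrates this on a small made-up data frame (toy and its values are hypothetical, not taken from penguins) and shows one way to keep the unknown rows if you need them.

```r
library(dplyr)

# Toy data (hypothetical values) with one missing body mass
toy <- data.frame(
  id = 1:4,
  body_mass_g = c(3500, 4200, NA, 5000)
)

# NA > 4000 is NA, so the row with the missing mass is dropped
heavy_toy <- toy |>
  filter(body_mass_g > 4000)

heavy_toy$id

# To keep rows with an unknown mass as well, ask for them explicitly
heavy_or_unknown <- toy |>
  filter(body_mass_g > 4000 | is.na(body_mass_g))

heavy_or_unknown$id
```

The same behaviour applies to penguins, where a couple of rows have no recorded body mass.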

Step 3: Filter with multiple conditions

Let’s combine conditions using the AND operator to find large male penguins.

# Filter for large male penguins
large_males <- penguins |>
  filter(sex == "male" & body_mass_g > 4500)

head(large_males)

This demonstrates how to use multiple conditions simultaneously, returning only male penguins weighing over 4500 grams.
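filter() also accepts several conditions separated by commas and combines them with AND, which many people find easier to read than a chain of & operators. A short sketch comparing the two spellings (large_males_comma and large_males_amp are illustrative names):

```r
library(dplyr)
library(palmerpenguins)

# Comma-separated conditions are combined with AND
large_males_comma <- penguins |>
  filter(sex == "male", body_mass_g > 4500)

# The same filter written with an explicit &
large_males_amp <- penguins |>
  filter(sex == "male" & body_mass_g > 4500)

# Both versions keep the same number of rows
nrow(large_males_comma) == nrow(large_males_amp)
```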

Example 2: Practical Application

The Problem

Imagine you’re a researcher studying penguin populations across different islands and years. You need to analyze specific subsets of data to understand patterns in bill dimensions and body mass. This requires more complex filtering operations that combine multiple criteria.

Step 1: Filter by multiple categories

Let’s find penguins that match a specific combination of species and island.

# Filter for Gentoo penguins from Biscoe island
gentoo_biscoe <- penguins |>
  filter(species == "Gentoo" & island == "Biscoe")

summary(gentoo_biscoe$bill_length_mm)

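This gives us the 124 Gentoo penguins from Biscoe island. In this dataset every Gentoo penguin was recorded on Biscoe, so the island condition does not remove any further rows here, but the pattern is the one you need whenever a species occurs on several islands.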
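Before filtering on two categorical columns, it can help to see which combinations actually occur. The sketch below uses count() for that and then lists the islands on which Gentoo penguins appear; gentoo_islands is a name introduced for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Cross-tabulate species and island to see which combinations occur
penguins |>
  count(species, island)

# Islands on which Gentoo penguins were recorded
gentoo_islands <- penguins |>
  filter(species == "Gentoo") |>
  distinct(island) |>
  pull(island) |>
  as.character()

gentoo_islands
```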

Step 2: Filter using the OR operator

Now let’s find penguins that are either very light or very heavy.

# Filter for extreme weights
extreme_weights <- penguins |>
  filter(body_mass_g < 3000 | body_mass_g > 5500) |>
  select(species, body_mass_g, sex)

extreme_weights

This identifies penguins at the extremes of the weight distribution, useful for studying outliers or exceptional cases.
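An alternative spelling of the same filter uses dplyr’s between() helper. between() includes both endpoints, so negating it keeps values strictly below 3000 or strictly above 5500. This is a sketch for comparison rather than a recommendation; extreme_between and extreme_or are illustrative names.

```r
library(dplyr)
library(palmerpenguins)

# Negating between() keeps values outside the closed range [3000, 5500]
extreme_between <- penguins |>
  filter(!between(body_mass_g, 3000, 5500)) |>
  select(species, body_mass_g, sex)

# The same filter written with | for comparison
extreme_or <- penguins |>
  filter(body_mass_g < 3000 | body_mass_g > 5500) |>
  select(species, body_mass_g, sex)

# Both versions keep the same number of rows
nrow(extreme_between) == nrow(extreme_or)
```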

Step 3: Filter with the %in% operator

Let’s filter for penguins from multiple islands at once.

# Filter for penguins from Dream or Torgersen islands
dream_torgersen <- penguins |>
  filter(island %in% c("Dream", "Torgersen"))

table(dream_torgersen$island, dream_torgersen$species)

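This creates a contingency table of island by species for the filtered rows, demonstrating efficient multi-value filtering. Because island and species are factors, filter() keeps their unused levels, so the table still includes a Biscoe row and a Gentoo column filled with zeros.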
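Filtering does not remove factor levels that no longer occur, which is why table() still lists every island and species. If you would rather see only the levels that remain, one option is base R’s droplevels(), sketched here under the assumption that dropping unused levels in every factor column is acceptable for your analysis.

```r
library(dplyr)
library(palmerpenguins)

dream_torgersen <- penguins |>
  filter(island %in% c("Dream", "Torgersen")) |>
  droplevels()  # drops unused levels in all factor columns, e.g. Biscoe

# Only the two selected islands remain as levels
levels(dream_torgersen$island)

table(dream_torgersen$island, dream_torgersen$species)
```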

Step 4: Filter and remove missing values

Finally, let’s create a clean dataset for analysis by filtering out missing values.

# Filter complete cases for bill measurements
complete_bills <- penguins |>
  filter(!is.na(bill_length_mm) & !is.na(bill_depth_mm))

nrow(complete_bills)

This removes rows with missing bill measurements, giving us 342 complete observations ready for analysis.
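If you prefer a dedicated helper, tidyr’s drop_na() removes rows with missing values in the columns you name. The sketch below is an alternative to the filter() call above, not a replacement for it; complete_bills_alt is an illustrative name, and tidyr is loaded separately here although library(tidyverse) attaches it as well.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Drop rows where either bill measurement is missing
complete_bills_alt <- penguins |>
  drop_na(bill_length_mm, bill_depth_mm)

nrow(complete_bills_alt)
```

Calling drop_na() with no column names would drop rows with a missing value in any column, including sex, so naming the columns keeps the filter as narrow as the original.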

Summary

  • Use filter() to subset rows based on logical conditions with comparison operators (==, !=, >, <, >=, <=)
  • Combine multiple conditions with AND (&) to make filtering more restrictive
  • Use OR (|) to include rows meeting any of several conditions
  • The %in% operator efficiently filters for multiple values in a single column
  • filter() drops rows where a condition evaluates to NA, so handle missing values explicitly with is.na() or !is.na() in your filter conditions