dplyr ends_with(): select columns that end with a suffix

Master dplyr ends_with() to select columns that end with a suffix. Complete R tutorial with examples using real datasets.
Published July 6, 2022

Introduction

The ends_with() function is a selection helper, provided by tidyselect and re-exported by dplyr, that picks columns whose names end with a given suffix. It is particularly useful for datasets that follow naming conventions where related variables share a common ending, such as measurements in different units, time periods, or categories.

You’ll find ends_with() invaluable when dealing with survey data (questions ending in “_score”), longitudinal studies (variables ending in “_2023”, “_2024”), or scientific measurements (variables ending in “_mm”, “_cm”). Instead of manually typing each column name, ends_with() provides a clean, efficient way to select multiple related columns at once, making your code more readable and maintainable.

Getting Started

First, let’s load the required packages for this tutorial:

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

Let’s start with a simple example using the penguins dataset. We’ll select all columns that end with “_mm” to get the physical measurements:

# View the column names to see what we're working with
colnames(penguins)

# Select columns ending with "_mm"
penguin_measurements <- penguins |>
  select(ends_with("_mm"))

# Check the selected columns
colnames(penguin_measurements)

This selects bill_length_mm, bill_depth_mm, and flipper_length_mm. You can also combine ends_with() with explicitly named columns, or pass several suffixes at once:

# Select species and all measurement columns
penguins |>
  select(species, ends_with("_mm"))

# Select multiple suffixes
penguins |>
  select(ends_with(c("_mm", "_g")))
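Selection helpers can also be combined with each other using the boolean operators `&`, `|`, and `!`. The sketch below uses a small hypothetical tibble (`toy`, with made-up values) that borrows the penguins column names, so it runs with dplyr alone:

```r
library(dplyr)

# Hypothetical toy data that mirrors the penguins column names;
# the values are made up for illustration only
toy <- tibble(
  species           = c("x", "y"),
  bill_length_mm    = c(40.1, 45.3),
  bill_depth_mm     = c(18.2, 17.5),
  flipper_length_mm = c(190, 200),
  body_mass_g       = c(3700, 4200)
)

# Intersection: columns that start with "bill" AND end with "_mm"
bill_only <- toy |>
  select(starts_with("bill") & ends_with("_mm")) |>
  colnames()

# Negation: every column that does NOT end with "_mm"
not_mm <- toy |>
  select(!ends_with("_mm")) |>
  colnames()
```

The intersection form is handy when a suffix alone is too broad, and the negated form is a quick way to drop a whole family of columns.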

Example 2: Practical Application

Let’s create a more complex example by analyzing penguin body measurements and calculating summary statistics. We’ll use ends_with() to efficiently work with measurement columns:

# Calculate mean measurements by species for all "_mm" columns
penguin_summary <- penguins |>
  group_by(species) |>
  summarise(
    across(ends_with("_mm"), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  ) |>
  # Round to one decimal place
  mutate(across(ends_with("_mm"), ~ round(.x, 1)))

print(penguin_summary)
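One thing to watch when extending this pattern: if across() receives a named list of functions, the output columns are named `{.col}_{.fn}` by default, so they no longer end with the original suffix. The following is a minimal sketch on a hypothetical tibble (`toy`, with made-up values), not the penguins data:

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only
toy <- tibble(
  group   = c("a", "a", "b", "b"),
  wing_mm = c(10, 12, 20, NA),
  tail_mm = c(5, 7, 9, 11)
)

# Two summaries per "_mm" column; the default naming appends the
# function name, e.g. wing_mm_mean and wing_mm_sd
toy_summary <- toy |>
  group_by(group) |>
  summarise(
    across(
      ends_with("_mm"),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        sd   = ~ sd(.x, na.rm = TRUE)
      )
    ),
    .groups = "drop"
  )

summary_names <- colnames(toy_summary)
```

A follow-up step on `toy_summary` would therefore need a different selector, such as ends_with(c("_mean", "_sd")), because ends_with("_mm") matches nothing there.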

Here’s another practical example where we standardize (z-score) all measurement columns:

# Standardize all measurement columns
penguins_standardized <- penguins |>
  mutate(
    # scale() returns a one-column matrix; [, 1] extracts it as a plain vector
    across(ends_with("_mm"), ~ scale(.x)[, 1])
  ) |>
  select(species, island, ends_with("_mm"))

# View the first few rows
head(penguins_standardized)
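A quick way to confirm that the standardization worked is to check that each transformed column has a mean of about 0 and a standard deviation of about 1. This is a sketch on a hypothetical tibble without missing values (`toy`, made-up numbers); with real data containing NAs, the checks would need na.rm = TRUE:

```r
library(dplyr)

# Hypothetical toy data with no missing values
toy <- tibble(
  id      = 1:4,
  wing_mm = c(10, 12, 20, 18),
  tail_mm = c(5, 7, 9, 11)
)

# Standardize every "_mm" column, leaving id untouched
toy_z <- toy |>
  mutate(across(ends_with("_mm"), ~ scale(.x)[, 1]))

# Sanity check: mean and sd of each standardized column
check <- toy_z |>
  summarise(across(ends_with("_mm"), list(mean = mean, sd = sd)))
```

Because the same ends_with("_mm") selector drives both steps, the check automatically covers any measurement column added later.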

We can also use ends_with() for data quality checks, such as finding missing values across measurement columns:

# Count missing values in measurement columns
missing_summary <- penguins |>
  summarise(
    across(ends_with("_mm"), ~ sum(is.na(.x)))
  )

print(missing_summary)
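Beyond counting, the same selector can be used to pull out the affected rows. The sketch below assumes a dplyr version that provides if_any() and if_all() (1.0.4 or later) and uses a hypothetical tibble (`toy`) rather than the penguins data:

```r
library(dplyr)

# Hypothetical toy data with a few missing measurements
toy <- tibble(
  id      = 1:4,
  wing_mm = c(10, NA, 20, 18),
  tail_mm = c(5, 7, NA, 11)
)

# Rows where at least one "_mm" column is missing
rows_with_na <- toy |>
  filter(if_any(ends_with("_mm"), is.na))

# Rows where every "_mm" column is present
complete_rows <- toy |>
  filter(if_all(ends_with("_mm"), ~ !is.na(.x)))
```

Here if_any() keeps a row when the predicate holds for any selected column, while if_all() requires it to hold for all of them.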

For a more advanced application, let’s create a correlation matrix for all measurement variables:

# Create correlation matrix for measurement columns
correlation_matrix <- penguins |>
  select(ends_with("_mm")) |>
  cor(use = "complete.obs") |>
  round(3)

print(correlation_matrix)

Summary

The ends_with() function is an essential tool for efficient column selection in dplyr. Key takeaways include:

  • Suffix matching: ends_with() selects columns whose names end with a literal suffix (not a regular expression), which suits consistently named variables
  • Flexible usage: Works inside select() and across(), so it can also drive mutate() and summarise()
  • Multiple patterns: You can specify multiple suffixes using a character vector
  • Clean code: Reduces repetitive column naming and makes your code more maintainable
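Two matching rules are worth keeping in mind alongside these points: the suffix is compared literally, and the comparison ignores case unless ignore.case = FALSE is set. A minimal sketch with a hypothetical tibble (`toy`, made-up column names):

```r
library(dplyr)

# Hypothetical toy data; column names chosen to show the matching rules
toy <- tibble(
  height_mm = 1:3,
  WIDTH_MM  = 4:6,
  total.mm  = 7:9,
  totalxmm  = 10:12
)

# Default (ignore.case = TRUE): matches both height_mm and WIDTH_MM
default_match <- toy |>
  select(ends_with("_mm")) |>
  colnames()

# Case-sensitive: only height_mm matches
strict_match <- toy |>
  select(ends_with("_mm", ignore.case = FALSE)) |>
  colnames()

# Literal suffix: the dot is not a regex wildcard, so totalxmm is excluded
literal_match <- toy |>
  select(ends_with(".mm")) |>
  colnames()
```

When a regular expression is actually needed, the matches() helper is the tool for that job.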

Whether you’re conducting exploratory data analysis, creating summary statistics, or performing data transformations, ends_with() helps you write more efficient and readable code by leveraging consistent naming patterns in your datasets.