How to use starts_with() in R

Learn how to use starts_with() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The starts_with() function is a selection helper from the tidyselect package, made available through dplyr, that selects columns whose names begin with a given prefix. It’s particularly useful when working with datasets that have systematic naming conventions, such as columns prefixed with dates, categories, or measurement types. Instead of typing each column name, you describe the shared prefix once.
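Before turning to real datasets, here is a minimal sketch of how prefix matching behaves. The `measurements` tibble and its column names are invented for illustration; `ignore.case` is an argument of `starts_with()` that defaults to `TRUE`.

```r
library(dplyr)

# Hypothetical toy data; column names and values are made up for this sketch.
measurements <- tibble(
  id        = 1:3,
  temp_min  = c(2.1, 3.4, 1.8),
  temp_max  = c(9.5, 11.2, 8.7),
  Temp_unit = c("C", "C", "C"),
  wind_avg  = c(12, 7, 15)
)

# Default behaviour: the prefix match ignores case, so "Temp_unit" is included.
any_case <- names(select(measurements, starts_with("temp")))

# Case-sensitive matching drops "Temp_unit".
exact_case <- names(select(measurements, starts_with("temp", ignore.case = FALSE)))

# A character vector of prefixes selects the union of all matches.
two_prefixes <- names(select(measurements, starts_with(c("temp", "wind"))))

print(any_case)
print(exact_case)
print(two_prefixes)
```

The case-insensitive default is easy to overlook, so it is worth setting `ignore.case = FALSE` when column names differ only by capitalization.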

Getting Started

library(tidyverse)       # loads dplyr, which provides select() and starts_with()
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Usage

The Problem

Imagine you have a dataset with multiple columns and want to select only those that start with specific letters or prefixes. Manually typing each column name would be tedious and error-prone.

Step 1: Examine the dataset structure

Let’s first look at the column names in the penguins dataset.

data(penguins)
colnames(penguins)
head(penguins, 3)

The column names follow a pattern: “bill_length_mm” and “bill_depth_mm” share the prefix “bill_”, while suffixes such as “_mm” and “_g” (as in “body_mass_g”) encode the units.

Step 2: Select columns starting with “bill”

We’ll use starts_with() to select only columns beginning with “bill”.

penguins |>
  select(starts_with("bill")) |>
  head(5)

This returns only the two bill-related columns: bill_length_mm and bill_depth_mm.
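To make the selection rule concrete, the sketch below compares `starts_with()` with base R's `startsWith()` and shows how `!` inverts the selection. It uses a small stand-in tibble (`penguin_like`, a hypothetical name with illustrative values) so that it runs without the palmerpenguins package.

```r
library(dplyr)

# Stand-in data with penguins-style column names; values are illustrative.
penguin_like <- tibble(
  species        = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm  = c(18.7, 13.2),
  body_mass_g    = c(3750, 4500)
)

# Keep the columns whose names begin with "bill".
bill_cols <- penguin_like |> select(starts_with("bill"))

# Negate the helper with ! to keep everything else.
non_bill_cols <- penguin_like |> select(!starts_with("bill"))

# Roughly equivalent base R: a logical index built from the column names.
# Note that base startsWith() is always case-sensitive.
base_cols <- penguin_like[, startsWith(names(penguin_like), "bill")]

print(names(bill_cols))
print(names(non_bill_cols))
```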

Step 3: Combine with other selection methods

You can mix starts_with() with other column selection approaches.

penguins |>
  select(species, starts_with("bill"), body_mass_g) |>
  head(4)

This selects the species column, both bill columns, and body_mass_g, giving us a focused subset of the data.
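Selection helpers can also be combined with the Boolean operators `&`, `|`, and `!` inside `select()`. The following is a small sketch under the assumption that a reasonably recent dplyr and tidyselect are installed; `penguin_like` is again a hypothetical stand-in with illustrative values.

```r
library(dplyr)

# Stand-in data with penguins-style column names; values are illustrative.
penguin_like <- tibble(
  species           = c("Adelie", "Gentoo"),
  bill_length_mm    = c(39.1, 46.1),
  bill_depth_mm     = c(18.7, 13.2),
  flipper_length_mm = c(181, 211),
  body_mass_g       = c(3750, 4500)
)

# Intersection: starts with "bill" AND ends with "length_mm".
bill_length_only <- penguin_like |>
  select(starts_with("bill") & ends_with("length_mm"))

# Union: starts with "bill" OR starts with "flipper".
bill_or_flipper <- penguin_like |>
  select(starts_with("bill") | starts_with("flipper"))

# Difference: millimetre columns that are not bill measurements.
mm_not_bill <- penguin_like |>
  select(ends_with("mm") & !starts_with("bill"))

print(names(bill_length_only))
print(names(bill_or_flipper))
print(names(mm_not_bill))
```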

Example 2: Practical Application

The Problem

You’re analyzing car performance data and need to create a summary focusing only on efficiency-related metrics. The mtcars dataset contains various measurements, but you specifically want columns related to miles per gallon and similar efficiency measures.

Step 1: Create extended dataset with prefixed columns

The built-in mtcars dataset has no prefixed columns, so let’s add derived columns with consistent “efficiency_” and “performance_” prefixes to demonstrate the concept.

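mtcars_extended <- mtcars |>
  mutate(
    efficiency_mpg = mpg,                # copy of miles per gallon
    efficiency_ratio = mpg / wt,         # mpg per 1000 lbs of weight
    performance_hp = hp,                 # copy of gross horsepower
    performance_speed = mpg * hp / 100   # illustrative composite, not a measured speed
  )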

Now we have columns with “efficiency_” and “performance_” prefixes for systematic selection.

Step 2: Analyze efficiency metrics

Select and summarize all efficiency-related columns.

efficiency_summary <- mtcars_extended |>
  select(starts_with("efficiency")) |>
  summarise(across(everything(), 
                   list(mean = mean, sd = sd)))

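This produces a one-row summary containing the mean and standard deviation of each efficiency column. By default, across() names the results by combining the column and function names, for example efficiency_mpg_mean and efficiency_mpg_sd.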
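As a variation on the step above, `starts_with()` can be passed directly to `across()`, which makes the separate `select()` call unnecessary. The sketch below rebuilds only the two efficiency columns so that it is self-contained, and spells out the `.names` argument of `across()` to show where the output names come from.

```r
library(dplyr)

# Rebuild just the efficiency columns so this block runs on its own.
mtcars_extended <- mtcars |>
  mutate(
    efficiency_mpg   = mpg,
    efficiency_ratio = mpg / wt
  )

# across() accepts the selection helper directly; .names controls output names.
# "{.col}_{.fn}" is the default pattern when a named list of functions is given.
efficiency_summary <- mtcars_extended |>
  summarise(across(
    starts_with("efficiency"),
    list(mean = mean, sd = sd),
    .names = "{.col}_{.fn}"
  ))

print(names(efficiency_summary))
```

Changing the pattern to something like `"{.fn}_{.col}"` would put the statistic first, which can be handy if you later want to select all means with `starts_with("mean")`.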

Step 3: Compare different metric categories

Isolate a different measurement category, here the performance columns, and preview the first six rows.

performance_data <- mtcars_extended |>
  select(starts_with("performance")) |>
  slice_head(n = 6)

print(performance_data)

This approach allows you to quickly isolate and analyze specific categories of measurements without manually specifying each column.
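If you repeat the same operation for several prefixes, the prefix can be passed as an ordinary character argument. The helper `mean_by_prefix()` below is a hypothetical function written for this sketch, not part of dplyr.

```r
library(dplyr)

# Rebuild the extended dataset so this block runs on its own.
mtcars_extended <- mtcars |>
  mutate(
    efficiency_mpg    = mpg,
    efficiency_ratio  = mpg / wt,
    performance_hp    = hp,
    performance_speed = mpg * hp / 100
  )

# Hypothetical helper: the mean of every column that shares a prefix.
mean_by_prefix <- function(data, prefix) {
  data |>
    summarise(across(starts_with(prefix), mean))
}

efficiency_means  <- mean_by_prefix(mtcars_extended, "efficiency")
performance_means <- mean_by_prefix(mtcars_extended, "performance")

print(efficiency_means)
print(performance_means)
```

Because `starts_with()` takes a plain string, no special quoting or tidy-evaluation syntax is needed inside the function.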

Step 4: Use in data transformation pipelines

Apply transformations to groups of similarly-named columns.

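mtcars_scaled <- mtcars_extended |>
  # scale() returns a one-column matrix; as.numeric() keeps plain numeric columns
  mutate(across(starts_with("efficiency"), \(x) as.numeric(scale(x)))) |>
  select(starts_with("efficiency"))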

head(mtcars_scaled, 4)

Used inside across(), starts_with() applies the same transformation to every column that shares a prefix, so preprocessing steps do not have to be repeated column by column.
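One detail worth knowing when passing `scale()` to `across()`: base R's `scale()` returns a matrix, so each transformed column becomes a one-column matrix rather than a plain numeric vector. The sketch below contrasts the two forms; wrapping the call in `as.numeric()` is one common way to get ordinary vectors back.

```r
library(dplyr)

# Rebuild the efficiency columns so this block runs on its own.
mtcars_extended <- mtcars |>
  mutate(
    efficiency_mpg   = mpg,
    efficiency_ratio = mpg / wt
  )

# Passing scale directly: each selected column becomes a one-column matrix.
matrix_cols <- mtcars_extended |>
  mutate(across(starts_with("efficiency"), scale))

# Wrapping in as.numeric(): each selected column stays a plain numeric vector.
vector_cols <- mtcars_extended |>
  mutate(across(starts_with("efficiency"), \(x) as.numeric(scale(x))))

print(is.matrix(matrix_cols$efficiency_mpg))
print(is.matrix(vector_cols$efficiency_mpg))
```

Both versions hold the same standardized values; the difference matters mainly for printing and for downstream functions that expect atomic vectors.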

Summary

starts_with() is a tidyselect helper, available through dplyr, that selects columns by name prefix so you don’t have to list each column
It is used inside selecting verbs such as select() and fits naturally into pipelines written with the native pipe operator |> (R 4.1 or later)
The function can be combined with other selection methods and used within across() for applying transformations to column groups
It’s particularly valuable for datasets with systematic naming conventions, making data analysis workflows more readable and maintainable
This approach reduces errors and makes code more adaptable when column names follow consistent patterns
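The adaptability point can be illustrated with a short sketch: a prefix-based selection picks up columns added later without any code change, and it returns zero columns rather than raising an error when nothing matches. The helper `pick_efficiency()` and the column `efficiency_kpl` are hypothetical names introduced only for this example.

```r
library(dplyr)

# Hypothetical helper, defined only for this sketch.
pick_efficiency <- function(data) {
  data |> select(starts_with("efficiency"))
}

base_cars <- mtcars |>
  mutate(efficiency_mpg = mpg)

# A column added later; 0.425144 is the approximate mpg (US) to km/L factor.
more_cars <- base_cars |>
  mutate(efficiency_kpl = mpg * 0.425144)

before_cols <- names(pick_efficiency(base_cars))
after_cols  <- names(pick_efficiency(more_cars))

# No column in plain mtcars has the prefix: the result has zero columns, no error.
no_match_ncol <- ncol(pick_efficiency(mtcars))

print(before_cols)
print(after_cols)
print(no_match_ncol)
```

The silent zero-column result is convenient in pipelines, but it also means a misspelled prefix will not raise an error, so checking the selected names once is a sensible habit.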
