How to use select() in R

Learn how to use select() in R with practical examples in this step-by-step guide, with code you can copy and run immediately.
Published: February 20, 2026

Introduction

The select() function from dplyr allows you to choose specific columns from a data frame, making it essential for data cleaning and analysis. Use select() when you need to work with only certain variables or want to reorder columns in your dataset.

Getting Started

library(tidyverse)       # loads dplyr, which provides select()
library(palmerpenguins)  # provides the penguins dataset

Example 1: Basic Column Selection

The Problem

You have a dataset with many columns but only need a few specific ones for your analysis. Let’s extract just the species and body mass columns from the penguins dataset.

Step 1: Select columns by name

The simplest way to select columns is by listing their names directly.

penguins |>
  select(species, body_mass_g) |>
  head()

This returns a new data frame containing only the species and body_mass_g columns.
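When the column names are stored in a character vector rather than typed directly, one option is to wrap the vector in all_of(). The sketch below assumes a vector called wanted, which is a hypothetical name used only for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Hypothetical vector of column names, e.g. built elsewhere in a script
wanted <- c("species", "body_mass_g")

# all_of() selects exactly these columns and errors if any name is missing
by_name <- penguins |>
  select(all_of(wanted))

names(by_name)
```

If some names might not exist in the data, any_of() can be used in place of all_of(); it skips missing names instead of raising an error.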

Step 2: Select multiple consecutive columns

You can select a range of columns using the colon operator.

penguins |>
  select(species:flipper_length_mm) |>
  head()

This selects all columns from species through flipper_length_mm in their original order.
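A range can also sit alongside individually named columns in the same call. As a small sketch of that idea, the following keeps the first two columns as a range and adds body_mass_g by name.

```r
library(dplyr)
library(palmerpenguins)

# A short range (species through island) plus one column named on its own
range_plus_one <- penguins |>
  select(species:island, body_mass_g)

names(range_plus_one)
```

Because a range depends on the order of columns in the data frame, it is worth checking names(penguins) first if the data may have been rearranged.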

Step 3: Select columns by position

You can also select columns using their numeric positions.

penguins |>
  select(1, 3, 5) |>
  head()

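This selects the 1st, 3rd, and 5th columns from the dataset: species, bill_length_mm, and flipper_length_mm. Positions change whenever columns are added, removed, or reordered, so selecting by name is usually safer in scripts you plan to reuse.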
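To see which columns a set of positions refers to before relying on them, you can index names() with the same numbers. The sketch below also compares the positional selection with the equivalent selection by name.

```r
library(dplyr)
library(palmerpenguins)

# Look up the names behind positions 1, 3, and 5
position_names <- names(penguins)[c(1, 3, 5)]
position_names
# "species" "bill_length_mm" "flipper_length_mm"

by_position <- penguins |> select(1, 3, 5)
by_name <- penguins |> select(species, bill_length_mm, flipper_length_mm)

# Both approaches pick the same columns in the same order
same_columns <- identical(names(by_position), names(by_name))
same_columns
```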

Example 2: Advanced Selection Techniques

The Problem

You need to exclude certain columns while keeping others, rename columns as you select them, and use pattern matching to pick up similarly named variables. Steps 1 and 2 use the built-in mtcars car performance data; Steps 3 and 4 return to the penguins dataset.

Step 1: Remove unwanted columns

Use the minus sign to exclude specific columns from your selection.

mtcars |>
  select(-am, -vs, -carb) |>
  head()

This keeps all columns except am, vs, and carb, which aren’t needed for our analysis.
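An alternative way to write the same exclusion is to negate a group of columns with ! and c(). The sketch below assumes dplyr 1.0.0 or later and checks that both spellings keep the same columns.

```r
library(dplyr)

# Minus sign on each column
dropped_minus <- mtcars |>
  select(-am, -vs, -carb)

# Negating a group of columns with !c(...)
dropped_bang <- mtcars |>
  select(!c(am, vs, carb))

names(dropped_bang)
# "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "gear"
```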

Step 2: Select and rename columns simultaneously

You can rename columns while selecting them using the new_name = old_name syntax.

mtcars |>
  select(miles_per_gallon = mpg,
         horsepower = hp,
         weight = wt) |>
  head()

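This returns a data frame containing only these three columns, now with more descriptive names. All other columns are dropped; to rename columns while keeping the rest, use rename() instead.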
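The difference between renaming inside select() and using rename() is easiest to see by comparing column counts. This is a minimal sketch on mtcars, which has 11 columns.

```r
library(dplyr)

# select() with renaming keeps only the columns listed
selected <- mtcars |>
  select(miles_per_gallon = mpg, horsepower = hp, weight = wt)

# rename() changes the names but keeps every column in place
renamed <- mtcars |>
  rename(miles_per_gallon = mpg, horsepower = hp, weight = wt)

ncol(selected)  # 3
ncol(renamed)   # 11
```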

Step 3: Use helper functions for pattern matching

Select columns that contain specific text patterns using helper functions.

penguins |>
  select(species, contains("length")) |>
  head()

This selects the species column plus every column whose name contains “length”: bill_length_mm and flipper_length_mm.
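Other helpers follow the same pattern. As a brief sketch, ends_with() matches a suffix, matches() takes a regular expression, and where() selects columns by a test on their contents, such as is.numeric.

```r
library(dplyr)
library(palmerpenguins)

# Columns whose names end in "mm"
suffix_cols <- penguins |> select(ends_with("mm"))

# Columns whose names match a regular expression
regex_cols <- penguins |> select(matches("^bill_"))

# Columns whose values are numeric (factor columns are excluded)
numeric_cols <- penguins |> select(where(is.numeric))

names(suffix_cols)
names(regex_cols)
names(numeric_cols)
```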

Step 4: Combine selection methods

You can mix different selection approaches in a single select() call.

penguins |>
  select(species, starts_with("bill"), 
         body_mass_g, everything()) |>
  head()

This puts species first, followed by bill measurements, then body mass, and finally all remaining columns.
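If the goal is only to reorder columns without dropping any, dplyr's relocate() is another option. The sketch below assumes dplyr 1.0.0 or later and checks that it produces the same column order as the select() call with everything().

```r
library(dplyr)
library(palmerpenguins)

# Reorder with select() and everything()
with_select <- penguins |>
  select(species, starts_with("bill"), body_mass_g, everything())

# relocate() moves the listed columns to the front by default
with_relocate <- penguins |>
  relocate(species, starts_with("bill"), body_mass_g)

names(with_relocate)
```

Using relocate() makes the intent explicit: every column is kept and only the order changes.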

Summary

  • Use select() to choose specific columns from your data frame, reducing clutter and focusing on relevant variables
  • Select columns by name, by position, or as a range of adjacent columns, such as species:flipper_length_mm
  • Remove unwanted columns with the minus operator (-column_name) instead of listing everything you want to keep
  • Leverage helper functions like contains(), starts_with(), and ends_with() for pattern-based selection
  • Combine selection with renaming to create cleaner, more readable datasets in a single step