How to use pick() to select columns dynamically in R

dplyr

dplyr pick()

Learn how to use pick() to select columns dynamically in R, with clear examples and explanations.

Published

March 26, 2026

Introduction

The pick() function in dplyr is a helper that selects columns with tidyselect syntax inside data-masking verbs such as mutate(), summarise(), filter(), and group_by(), and returns them as a tibble. It complements across(): across() applies a function to each selected column, while pick() hands the whole selection to a function that works on a data frame. It’s particularly useful when you need to select columns by condition (like data type) or by naming pattern, which makes your code more flexible and maintainable.

Setup

Let’s start by loading the necessary packages and checking our dplyr version to ensure pick() is available:

library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")

The pick() function was introduced in dplyr 1.1.0, so make sure you have an up-to-date version installed.
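One way to make that version check explicit is sketched below. The variable name has_pick is made up for this example; packageVersion() and its comparison against a version string are base R.

```r
# Compare the installed dplyr version against the release that added pick().
has_pick <- packageVersion("dplyr") >= "1.1.0"

if (!has_pick) {
  # Fail early with an actionable message instead of a confusing
  # "could not find function" error later on.
  stop("pick() needs dplyr >= 1.1.0; update with install.packages(\"dplyr\").")
}
```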

Basic Usage with Numeric Operations

One of the most common uses of pick() is to perform operations on columns of a specific type. Here’s how to calculate row sums across all numeric columns:

df <- tibble(x = 1:2, y = 3:4, z = 5:6)
df |>
  mutate(total = rowSums(pick(where(is.numeric))))

This creates a new column total that contains the sum of all numeric columns for each row. The predicate is wrapped in where() because tidyselect has deprecated bare predicate functions. pick(where(is.numeric)) selects only numeric columns, making the code robust even if non-numeric columns are added later.
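To illustrate the robustness claim, here is a minimal sketch that adds a character column to the same data. The names df_mixed and label are made up for this example.

```r
library(dplyr)

# Same numbers as above, plus a character column that must be ignored.
df_mixed <- tibble(x = 1:2, y = 3:4, z = 5:6, label = c("a", "b"))

out <- df_mixed |>
  # where(is.numeric) keeps x, y, and z and skips `label`
  mutate(total = rowSums(pick(where(is.numeric))))

out$total
# 9 12
```

Without the type-based selection, rowSums() would fail on the character column.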

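pick() and across()

pick() and across() are complementary rather than nested. across() expects a tidyselect expression as its first argument, so it cannot take the tibble that pick() returns. To apply a function to each numeric column, select with where() directly inside across(). Here’s how to standardize all numeric columns:

df |>
  mutate(across(where(is.numeric), \(x) (x - mean(x)) / sd(x)))

This standardizes each numeric column by subtracting its mean and dividing by its standard deviation, creating z-scores for all values.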
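The sketch below puts the two helpers side by side. It assumes a small made-up tibble called scores: across() produces one output column per selected input column, while pick() passes the selected columns to a function as a single data frame.

```r
library(dplyr)

scores <- tibble(a = c(1, 2, 3), b = c(10, 20, 30), id = c("p", "q", "r"))

result <- scores |>
  mutate(
    # across(): the function runs once per selected column
    across(where(is.numeric), \(x) x - mean(x), .names = "{.col}_centered"),
    # pick(): rowMeans() receives columns a and b together as one tibble
    row_mean = rowMeans(pick(a, b))
  )

result$row_mean
# 5.5 11.0 16.5
```

Columns a and b are named explicitly in pick() so that the freshly created *_centered columns are not swept into the row means.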

Selecting Specific Columns

pick() isn’t limited to type-based selection. You can also specify exact column names:

df <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4),
  z1 = c("a", "a", "a", "b", "a"),
  z2 = c("c", "d", "d", "a", "c")
)

df |> mutate(coords = pick(x, y))

This creates a new column, coords, that is itself a two-column tibble (a packed data frame column) holding the x and y values of each row.
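A quick way to see what kind of column this is: the sketch below, using a made-up tibble pts, inspects the result and reaches into the inner columns with `$`.

```r
library(dplyr)

pts <- tibble(x = c(3, 2, 1), y = c(0, 2, 4), z1 = c("a", "b", "a"))

packed <- pts |> mutate(coords = pick(x, y))

names(packed)                  # "x" "y" "z1" "coords"
is.data.frame(packed$coords)   # TRUE: coords is a data frame, not a list
packed$coords$y                # 0 2 4
```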

Working with Real Data

Let’s see how pick() works with the Palmer penguins dataset. First, selecting the factor columns species and sex:

penguins |> mutate(text_data = pick(species, sex))

This adds a data frame column, text_data, that holds the species and sex values for each penguin observation. Note that palmerpenguins stores both columns as factors, not character vectors.

Type-Based Column Selection

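You can select all columns of a specific type by wrapping a predicate in where(). The penguins data has no character columns, so this example selects the factor columns and examines the structure:

penguins |>
  mutate(pick(where(is.factor))) |>
  glimpse()

Because the pick() result is unnamed, mutate() unpacks it back into columns of the same names. The factor columns (species, island, and sex) are therefore kept as they are, and glimpse() shows the structure of the whole dataset.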

Summarizing with pick()

pick() works inside summarise() too, but for a per-column summary such as the mean of every numeric column, across() with where() is the right tool, because pick() returns a data frame rather than a column selection:

df <- tibble(
  id = 1:5,
  var1 = rnorm(5),
  var2 = rnorm(5),
  category = letters[1:5]
)

df |>
  summarise(across(where(is.numeric), mean))

This automatically calculates the mean for every numeric column, including the id column.
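Where pick() does earn its place in summarise() is with functions that take a whole data frame. The sketch below assumes a made-up tibble readings and uses complete.cases() and rowSums() from base R.

```r
library(dplyr)

readings <- tibble(u = c(1, 2, NA), v = c(4, 5, 6), tag = c("a", "b", "c"))

summ <- readings |>
  summarise(
    # complete.cases() needs both columns at once to find rows without NA
    n_complete = sum(complete.cases(pick(u, v))),
    # rowSums() also works on the selection as a whole
    grand_total = sum(rowSums(pick(u, v), na.rm = TRUE))
  )

summ$n_complete    # 2
summ$grand_total   # 18
```

The columns are named explicitly because a predicate such as where(is.numeric) would also match summary columns created earlier in the same summarise() call.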

Text Transformations

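Text columns can be transformed in the same way. Applying a function to each column is a job for across(), and because the text columns in penguins are factors, select them with where(is.factor) and convert them to character first:

penguins |>
  mutate(across(where(is.factor), \(x) toupper(as.character(x))))

This converts species, island, and sex to uppercase character vectors in one step, showing how type-based selection makes it easy to apply a transformation across multiple columns of the same type.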

Standardizing Numeric Data

For data analysis, you often need to standardize numeric variables. Here’s how to do it for all numeric columns at once:

penguins |>
  mutate(across(where(is.numeric), \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)))

This standardizes all numeric columns in the penguins dataset, ignoring missing values when computing each mean and standard deviation. Note that year is also numeric, so it is standardized too.
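If some numeric columns should be left alone, tidyselect lets you combine a predicate with a negation. The sketch below uses a made-up tibble toy with a year column standing in for an identifier-like variable.

```r
library(dplyr)

toy <- tibble(mass = c(10, 20, NA, 30), year = c(2007L, 2008L, 2008L, 2009L))

z <- toy |>
  mutate(across(
    where(is.numeric) & !year,   # every numeric column except `year`
    \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  ))

z$mass
# -1  0 NA  1
```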

pick() versus select()

pick() only works inside data-masking verbs; calling it directly on a data frame raises an error. To extract columns at the top level of a pipeline, use select(), which accepts the same tidyselect syntax:

penguins |>
  select(species, sex)

This returns a tibble with only the species and sex columns. Inside a verb such as mutate(), pick(species, sex) yields the same tibble.
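As a check on where pick() can and cannot be called, the following sketch uses a made-up tibble toy. It wraps a top-level pick() call in tryCatch() to capture the error that dplyr raises outside a data-masking verb, and then does the same extraction with select().

```r
library(dplyr)

toy <- tibble(
  species = c("Adelie", "Gentoo"),
  sex = c("female", "male"),
  mass = c(3700, 5000)
)

# Outside a verb there is no data mask for pick() to select from.
standalone <- tryCatch(
  toy |> pick(species, sex),
  error = function(e) "error"
)

# select() is the top-level equivalent.
via_select <- toy |> select(species, sex)

standalone          # "error"
names(via_select)   # "species" "sex"
```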

Pattern-Based Selection

pick() accepts tidyselect helpers for pattern-based column selection. Here it is used inside distinct():

penguins |>
  distinct(pick(starts_with("s")))

This selects all columns whose names start with “s”, which in the penguins dataset are species and sex, and returns their unique combinations.

Advanced Operations

You can use pick() in more complex operations like ranking based on multiple columns:

penguins |>
  mutate(rank = dense_rank(pick(ends_with("mm")))) |>
  arrange(rank)

This ranks the rows by all measurement columns ending in “mm” (bill_length_mm first, with bill_depth_mm and then flipper_length_mm breaking ties) and sorts the data by that rank.
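A small worked example makes the tie-breaking visible. The tibble birds and its column names are made up for this sketch; ranking a data frame with dense_rank() requires dplyr 1.1.0 or later.

```r
library(dplyr)

birds <- tibble(
  bill_mm = c(40, 38, 40, 38),
  wing_mm = c(190, 200, 185, 200)
)

ranked <- birds |>
  # rank by bill_mm first, then by wing_mm within equal bill_mm
  mutate(rank = dense_rank(pick(ends_with("mm"))))

ranked$rank
# 3 1 2 1
```

Rows 2 and 4 are identical on both columns and share rank 1; rows 1 and 3 tie on bill_mm and are separated by wing_mm.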

Grouping Operations

pick() also works well with grouping functions like count():

penguins |> 
  count(pick(starts_with("s")))

This counts the combinations of all columns starting with “s”, providing a frequency table for species and sex combinations.
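The same pattern works in group_by(). The sketch below uses a made-up tibble obs to show count() and a grouped summary driven by one pick() selection.

```r
library(dplyr)

obs <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  sex = c("female", "male", "male", "male", "female"),
  mass = c(3700, 3900, 5500, 5400, 4700)
)

# Frequency table over every column starting with "s"
counts <- obs |> count(pick(starts_with("s")))

# The same selection used as grouping columns
means <- obs |>
  group_by(pick(starts_with("s"))) |>
  summarise(mean_mass = mean(mass), .groups = "drop")

counts$n
# 1 1 1 2
```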

Summary

The pick() function is a versatile tool that makes column selection inside dplyr verbs more programmatic and flexible. It’s particularly powerful when combined with type predicates wrapped in where(), such as where(is.numeric) or where(is.factor), and with tidyselect helpers like starts_with(), allowing you to write code that adapts automatically to your data structure. Use pick() when a function needs the selected columns together as a data frame, and across() when a function should be applied to each column separately.
