How to calculate z-scores using tidyverse in R

statistics

z-score tidyverse

Learn how to calculate z-scores using tidyverse in R with clear examples and explanations.

Published

March 26, 2026

Introduction

The pick() function, introduced in dplyr 1.1.0, lets you select columns with tidyselect syntax from inside data-masking verbs such as mutate(), summarise(), and group_by(). Unlike select(), which is a verb in its own right, pick() must be called inside another verb and returns the selected columns as a data frame. That makes it a good fit for operations that need several columns at once, such as row-wise calculations, ranking, or counting. For applying a function to each selected column separately, such as standardizing columns to z-scores, its companion across() is the right tool.

Setup

Let’s start by loading the required packages and exploring our dataset:

library(tidyverse)
library(palmerpenguins)

penguins |> head()

The Palmer penguins dataset contains measurements for three penguin species, with both numeric and factor variables (species, island, and sex are stored as factors), which makes it a convenient dataset for demonstrating pick().

Basic Column Selection with pick()

The simplest use of pick() is selecting specific columns by name. Because pick() only works inside a data-masking verb, the call is wrapped in reframe(), which returns the picked columns as a data frame:

penguins |>
  reframe(pick(species, sex))

You can also use tidyselect helpers like starts_with() to select columns based on naming patterns:

penguins |>
  reframe(pick(starts_with("s")))

This selects all columns whose names start with “s” (species and sex in our case).
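As a rough check on what pick() returns, the sketch below compares pick() inside reframe() with a plain select(). The tibble toy and its columns a, b, and c are made up for illustration, and the code assumes dplyr 1.1.0 or later, where both pick() and reframe() are available.

```r
library(dplyr)

# Hypothetical toy data, not part of the penguins dataset
toy <- tibble(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1.5, 2.5))

# pick() has to be called inside a data-masking verb such as reframe()
picked <- toy |>
  reframe(pick(a, c))

# select() is a verb of its own and needs no wrapper
selected <- toy |>
  select(a, c)

# Both results contain the same columns
identical(names(picked), names(selected))
```

For plain column selection, select() remains the simpler choice. pick() earns its place when the selected columns feed into another function, as in the sections that follow.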

Using pick() for Row-wise Calculations

One of the most powerful uses of pick() is performing calculations across multiple columns in each row. Here’s how to sum all numeric columns:

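df <- tibble(x = 1:2, y = 3:4, z = 5:6)
df |>
  mutate(total = rowSums(pick(where(is.numeric))))

pick(where(is.numeric)) selects all numeric columns, and rowSums() calculates the sum for each row. Predicate functions such as is.numeric must be wrapped in where(); passing them bare is deprecated in tidyselect. This is much cleaner than manually specifying each column name.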
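To see the row-wise pattern with mixed column types, here is a small sketch on a made-up tibble (scores, with hypothetical columns student, test1, and test2). The predicate is wrapped in where(), the form tidyselect recommends for predicate functions, which keeps the character column out of the sum.

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns
scores <- tibble(
  student = c("a", "b", "c"),
  test1 = c(10, 20, 30),
  test2 = c(1, 2, 3)
)

result <- scores |>
  mutate(
    # Sum every numeric column in each row
    total = rowSums(pick(where(is.numeric))),
    # Columns are named explicitly here so that the new total column
    # cannot end up in the average
    average = rowMeans(pick(test1, test2))
  )

result
```

Naming the columns in the second call is a deliberate choice: once total has been created, a predicate-based selection in the same mutate() could no longer be relied on to cover only the original columns.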

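Standardizing Columns with across()

pick() returns a data frame, so it cannot be passed to across() as a column selection. across() takes the tidyselect specification directly. Here’s how to standardize all numeric columns, that is, convert them to z-scores:

df |>
  mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x)))

This applies the standardization function to each numeric column separately. As a rule of thumb, use across() when a function should run on each selected column, and pick() when a function needs the selected columns together as one data frame.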
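Standardizing a column is the same as computing its z-score, z = (x - mean(x)) / sd(x). The sketch below writes the z-scores to new columns rather than overwriting the originals. The helper z_score() and the tibble heights are made up for illustration; .names is an existing across() argument that controls the output column names.

```r
library(dplyr)

# Hypothetical helper: z-score of a numeric vector, ignoring missing values
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical data
heights <- tibble(
  id = c("p1", "p2", "p3", "p4"),
  height_cm = c(150, 160, 170, 180),
  weight_kg = c(50, 60, 70, 80)
)

# Add one "_z" column per numeric column and keep the originals
standardized <- heights |>
  mutate(across(where(is.numeric), z_score, .names = "{.col}_z"))

# Sanity check: each z-score column should have mean 0 and sd 1
standardized |>
  summarise(across(ends_with("_z"), list(mean = mean, sd = sd)))
```

Keeping the original columns next to their z-scores makes it easier to check the transformation and to report results on the original scale.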

Working with Factor and Character Columns

The same pattern applies to non-numeric columns. In penguins, species, island, and sex are stored as factors rather than character vectors, so where(is.character) would match nothing. Select them with where(is.factor) instead:

penguins |>
  mutate(across(where(is.factor), toupper))

This converts the factor columns to uppercase character vectors. where(is.factor) identifies those columns dynamically, without you needing to name them explicitly.

Advanced Selection Patterns

You can use various tidyselect helpers with pick() for sophisticated column selection:

penguins |>
  reframe(pick(ends_with("g")))

This selects columns ending with “g” (only body_mass_g in penguins). You can even use pick() within ranking functions:

penguins |>
  mutate(rank = dense_rank(pick(ends_with("g")))) |>
  arrange(rank)
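Ranking becomes more informative with more than one column. In the sketch below, on a made-up tibble, dense_rank() receives a two-column data frame from pick(), so rows are ordered by x first and ties are broken by y. This relies on the data frame support that dplyr 1.1.0 added to its ranking functions.

```r
library(dplyr)

# Hypothetical data with ties in x, and one fully duplicated (x, y) pair
ranks <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4)
) |>
  # Rank by x, then by y; identical (x, y) pairs share a rank
  mutate(rank = dense_rank(pick(x, y)))

ranks |>
  arrange(rank)
```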

Using pick() with count()

pick() is also useful for counting combinations of variables:

penguins |> 
  count(pick(starts_with("s")))

This counts unique combinations of all columns starting with “s”, providing a frequency table of species-sex combinations.
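To make the equivalence concrete, the sketch below counts the same combinations twice on a made-up tibble (pets, with hypothetical columns species, sex, and age): once with pick() and a tidyselect helper, and once with the columns named by hand.

```r
library(dplyr)

# Hypothetical data
pets <- tibble(
  species = c("cat", "cat", "dog", "dog", "dog"),
  sex = c("f", "m", "f", "f", "m"),
  age = c(2, 5, 3, 7, 1)
)

# Count by every column whose name starts with "s"
by_pick <- pets |>
  count(pick(starts_with("s")))

# The same count with the columns written out
by_name <- pets |>
  count(species, sex)

# Both approaches give the same frequencies
identical(by_pick$n, by_name$n)
```

The pick() version keeps working if more columns starting with “s” are added later, which may or may not be what you want. Naming columns explicitly is the safer choice when the grouping must stay fixed.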

Complex Data Transformations

For more complex scenarios, you can store the selected columns together in a single data frame column:

df <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4),
  z1 = c("a", "a", "a", "b", "a"),
  z2 = c("c", "d", "d", "a", "c")
)

df |> mutate(cols = pick(x, y))

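This creates a new column, cols, that is itself a tibble holding x and y (a packed data frame column rather than a list of nested tibbles), which can be useful when a group of related columns should travel together through a workflow.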
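A short sketch of how such a column behaves once it exists. The tibble below is made up for illustration, and the name packed is hypothetical.

```r
library(dplyr)

# Hypothetical data: store x and y together in one data frame column
packed <- tibble(
  x = c(3, 2, 1),
  y = c(0, 2, 4),
  label = c("a", "b", "c")
) |>
  mutate(cols = pick(x, y))

# cols is a data frame with one row per row of packed
is.data.frame(packed$cols)

# Its inner columns are reached with a second `$`
packed$cols$y
```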

Summarizing with pick()

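pick() can be used inside summarise() whenever a summary function needs a data frame. To summarise each numeric column on its own, though, pass the selection to across() directly, because across() expects a column selection rather than the data frame that pick() returns:

df <- tibble(
  id = 1:5,
  var1 = rnorm(5),
  var2 = rnorm(5),
  category = letters[1:5]
)

df |>
  summarise(across(where(is.numeric), mean))

This calculates the mean of every numeric column (id, var1, and var2) and automatically excludes the non-numeric category column.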
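pick() itself is useful in summarise() when the summary function needs several columns at once. As one possible example, the sketch below counts the rows per group that have no missing values in either measurement column, using base R’s complete.cases(). The tibble measurements and its columns are made up for illustration.

```r
library(dplyr)

# Hypothetical data with some missing values
measurements <- tibble(
  group = c("a", "a", "a", "b", "b"),
  length = c(1, NA, 3, 4, 5),
  width = c(2, 2, NA, 1, 1)
)

# complete.cases() needs both columns together, so pick() supplies
# them as one data frame for the rows of the current group
complete_rows <- measurements |>
  group_by(group) |>
  summarise(n_complete = sum(complete.cases(pick(length, width))))

complete_rows
```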

Summary

The pick() function bridges tidy selection and data masking in dplyr by returning the selected columns as a data frame rather than just column references. It pairs well with tidyselect helpers like starts_with() and with predicate functions wrapped in where(), such as where(is.numeric). Use pick() when a function needs several columns together, as in row-wise calculations, multi-column ranking, or counting, and use across() when the same function should run on each column, as in standardizing columns to z-scores. Both keep your code concise and maintainable by removing the need to name every column by hand.
