How to use where() in R

dplyr

dplyr where()

Learn how to use where() in R with practical examples. Step-by-step guide with code you can copy and run immediately.

Published

February 21, 2026

Introduction

The where() function in dplyr allows you to select columns based on their data type or other column properties. Instead of selecting columns by name, you can select all numeric columns, all character columns, or columns that meet specific criteria. This is particularly useful when working with wide datasets where manually specifying column names would be tedious.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

You have a dataset with mixed data types and want to perform operations only on specific types of columns. Let’s say you want to find the mean of all numeric columns in the penguins dataset.

Step 1: Examine the dataset structure

First, let’s look at what types of columns we’re working with.

penguins |>
  glimpse()

This shows us the data types for each column - we can see numeric columns like bill_length_mm and character columns like species.

Step 2: Select all numeric columns

Now we’ll use where() to select only numeric columns.

penguins |>
  select(where(is.numeric)) |>
  head()

The where(is.numeric) function identifies and selects all columns that contain numeric data, excluding factors and character columns.

Step 3: Calculate summary statistics

We can now easily calculate means for all numeric columns at once.

penguins |>
  select(where(is.numeric)) |>
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

This gives us the mean values for all numeric columns in one operation, automatically handling missing values.

Example 2: Practical Application

The Problem

You’re analyzing the mtcars dataset and need to standardize all numeric variables while keeping categorical variables unchanged. You also want to identify which columns have missing values across different data types.

Step 1: Standardize numeric columns

Let’s standardize (z-score) all numeric columns in mtcars.

mtcars_standardized <- mtcars |>
  mutate(across(where(is.numeric), ~ scale(.x)[,1]))

head(mtcars_standardized)

The where(is.numeric) ensures we only apply the scaling function to numeric columns, leaving any categorical data untouched.

Step 2: Check for missing values by data type

Now let’s see how many missing values exist in different types of columns.

penguins |>
  summarise(
    numeric_na = sum(is.na(select(cur_data(), where(is.numeric)))),
    character_na = sum(is.na(select(cur_data(), where(is.character))))
  )

This approach lets us quickly assess data quality across different column types without manually specifying each column.

Step 3: Apply different transformations by type

We can apply type-specific operations in a single pipeline.

penguins |>
  mutate(
    across(where(is.numeric), ~ round(.x, 1)),
    across(where(is.character), ~ str_to_upper(.x))
  ) |>
  head()

This simultaneously rounds numeric columns to one decimal place and converts character columns to uppercase.

Step 4: Filter columns by custom criteria

You can also use custom functions with where() to select columns based on specific conditions.

penguins |>
  select(where(~ n_distinct(.x, na.rm = TRUE) > 10)) |>
  head()

This selects only columns that have more than 10 distinct values, useful for identifying continuous variables or high-cardinality categorical variables.

Summary

where() selects columns based on data type or properties rather than column names
Use where(is.numeric), where(is.character), or where(is.factor) for common data types
Combine where() with across() to apply functions to columns of specific types
Create custom selection criteria by passing any function that returns TRUE/FALSE for each column
Perfect for wide datasets where manual column specification would be impractical

--- title: "How to use where() in R" description: "Learn how to use where() in R with practical examples. Step-by-step guide with code you can copy and run immediately." date: 2026-02-21 categories: ['dplyr', 'dplyr where()'] format: html: code-fold: false code-tools: true --- ## Introduction The `where()` function in dplyr allows you to select columns based on their data type or other column properties. Instead of selecting columns by name, you can select all numeric columns, all character columns, or columns that meet specific criteria. This is particularly useful when working with wide datasets where manually specifying column names would be tedious. ## Getting Started ```r library(tidyverse) library(palmerpenguins) ``` ## Example 1: Basic Usage ### The Problem You have a dataset with mixed data types and want to perform operations only on specific types of columns. Let's say you want to find the mean of all numeric columns in the penguins dataset. ### Step 1: Examine the dataset structure First, let's look at what types of columns we're working with. ```r penguins |> glimpse() ``` This shows us the data types for each column - we can see numeric columns like bill_length_mm and character columns like species. ### Step 2: Select all numeric columns Now we'll use `where()` to select only numeric columns. ```r penguins |> select(where(is.numeric)) |> head() ``` The `where(is.numeric)` function identifies and selects all columns that contain numeric data, excluding factors and character columns. ### Step 3: Calculate summary statistics We can now easily calculate means for all numeric columns at once. ```r penguins |> select(where(is.numeric)) |> summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) ``` This gives us the mean values for all numeric columns in one operation, automatically handling missing values. ## Example 2: Practical Application ### The Problem You're analyzing the mtcars dataset and need to standardize all numeric variables while keeping categorical variables unchanged. You also want to identify which columns have missing values across different data types. ### Step 1: Standardize numeric columns Let's standardize (z-score) all numeric columns in mtcars. ```r mtcars_standardized <- mtcars |> mutate(across(where(is.numeric), ~ scale(.x)[,1])) head(mtcars_standardized) ``` The `where(is.numeric)` ensures we only apply the scaling function to numeric columns, leaving any categorical data untouched. ### Step 2: Check for missing values by data type Now let's see how many missing values exist in different types of columns. ```r penguins |> summarise( numeric_na = sum(is.na(select(cur_data(), where(is.numeric)))), character_na = sum(is.na(select(cur_data(), where(is.character)))) ) ``` This approach lets us quickly assess data quality across different column types without manually specifying each column. ### Step 3: Apply different transformations by type We can apply type-specific operations in a single pipeline. ```r penguins |> mutate( across(where(is.numeric), ~ round(.x, 1)), across(where(is.character), ~ str_to_upper(.x)) ) |> head() ``` This simultaneously rounds numeric columns to one decimal place and converts character columns to uppercase. ### Step 4: Filter columns by custom criteria You can also use custom functions with `where()` to select columns based on specific conditions. ```r penguins |> select(where(~ n_distinct(.x, na.rm = TRUE) > 10)) |> head() ``` This selects only columns that have more than 10 distinct values, useful for identifying continuous variables or high-cardinality categorical variables. ## Summary - `where()` selects columns based on data type or properties rather than column names - Use `where(is.numeric)`, `where(is.character)`, or `where(is.factor)` for common data types - Combine `where()` with [`across()`](/dplyr/how-to-use-across-in-r.html) to apply functions to columns of specific types - Create custom selection criteria by passing any function that returns TRUE/FALSE for each column - Perfect for wide datasets where manual column specification would be impractical --- ## Related Posts - [How to use select() in R](/dplyr/how-to-use-select-in-r.html) - [How to use mutate() in R](/dplyr/how-to-use-mutate-in-r.html) - [How to use pull() in R](/dplyr/how-to-use-pull-in-r.html) - [How to use separate() in R](/tidyr/how-to-use-separate-in-r.html) - [How to use separate_wider_delim() in R](/tidyr/how-to-use-separatewiderdelim-in-r.html)