tidyverse all_of(): select columns from a vector

tidyselect

Published

November 2, 2022

In this tutorial, we will learn how to select multiple columns from a dataframe at once when the column names are stored in a character vector.

tidyverse’s tidyselect package has numerous options for selecting columns from a dataframe. all_of() is one of the tidyselect functions, and it helps us select multiple columns using a character vector.

Let us see an example of why we should use all_of() to select columns from a vector. First, we will load tidyverse, the meta-package for R, and look at the first few rows of the starwars dataset.

library(tidyverse)

starwars %>% head()

# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>

The names of the columns that we want to select are in a character vector, and we pass that vector directly to select().

column_name_vector <- c("name", "height", "skin_color", "gender")
starwars %>% 
  select(column_name_vector)

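This code runs, but the result may not be correct: if the dataframe itself had a column named column_name_vector, select() would pick that column instead of the names stored in the vector. We also get the following message.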

Note: Using an external vector in selections is ambiguous.
ℹ Use `all_of(column_name_vector)` instead of `column_name_vector` to silence this message.
ℹ See .
This message is displayed once per session

In our example, it gives what we needed.

# A tibble: 87 × 4
   name               height skin_color  gender   
   <chr>               <int> <chr>       <chr>    
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine 
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine 
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows
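
To see why passing the bare vector is ambiguous, here is a minimal sketch using a hypothetical toy tibble. The names df, x, y and vars are made up for illustration and are not part of the starwars example. The tibble has a column that shares its name with the external vector, so the two forms of select() return different columns.

```r
library(dplyr)

# Hypothetical toy data: one column happens to be called `vars`
df <- tibble(x = 1:3, y = 4:6, vars = 7:9)

# External character vector with the same name as that column
vars <- c("x", "y")

# Bare name: select() finds the column `vars` in the data first
by_bare_name <- names(select(df, vars))

# all_of(): the external character vector is used explicitly
by_all_of <- names(select(df, all_of(vars)))

by_bare_name  # "vars"
by_all_of     # "x" "y"
```

Wrapping the vector in all_of() makes the intent explicit, so the result does not depend on which column names happen to exist in the data.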

tidyselect’s all_of(): select columns from a vector

However, the right approach is to wrap the vector in all_of() and pass that to select(). This returns the same columns, without the ambiguity and without the message.

starwars %>% 
  select(all_of(column_name_vector))

# A tibble: 87 × 4
   name               height skin_color  gender   
   <chr>               <int> <chr>       <chr>    
 1 Luke Skywalker        172 fair        masculine
 2 C-3PO                 167 gold        masculine
 3 R2-D2                  96 white, blue masculine
 4 Darth Vader           202 white       masculine
 5 Leia Organa           150 light       feminine 
 6 Owen Lars             178 light       masculine
 7 Beru Whitesun lars    165 light       feminine 
 8 R5-D4                  97 white, red  masculine
 9 Biggs Darklighter     183 light       masculine
10 Obi-Wan Kenobi        182 fair        masculine
# … with 77 more rows

Note that all_of() is for strict selection. If any of the variables in the character vector is missing, an error is thrown.

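# a vector containing a name that is not present in the dataframe
# (assumed here: the earlier four names plus "actor")
column_name_vector <- c("name", "height", "skin_color", "gender", "actor")
starwars %>% 
  select(all_of(column_name_vector))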

Since the column actor is not present in the dataframe, all_of() throws the following error and rendering stops.

Quitting from lines 37-41 (select_columns_from_vectors.qmd) 
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `actor` doesn't exist.
Backtrace:
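
One way to find out in advance which names would trigger this error is to compare the vector against the dataframe's column names with base R's setdiff(). This is only a sketch, and the vector below (the four earlier names plus actor) is an assumed example rather than the exact vector used above.

```r
library(dplyr)

# Assumed example vector: four existing columns plus one that does not exist
column_name_vector <- c("name", "height", "skin_color", "gender", "actor")

# Names requested in the vector but absent from the dataframe
missing_columns <- setdiff(column_name_vector, names(starwars))

missing_columns  # "actor"
```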

When you do not need every column in the vector to be present, and only want whichever of them exist in the dataframe, use any_of() instead of all_of().
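
As a rough sketch of that difference, the example below passes a vector containing a missing name to any_of(). The vector (the four earlier names plus actor) is an assumption made for illustration.

```r
library(dplyr)

# Assumed example vector: "actor" is not a column of starwars
column_name_vector <- c("name", "height", "skin_color", "gender", "actor")

# any_of() keeps the names that exist and silently skips the rest
selected_names <- starwars %>% 
  select(any_of(column_name_vector)) %>% 
  names()

selected_names  # "name" "height" "skin_color" "gender"
```

The trade-off is that any_of() will not tell you about a misspelled column name, so all_of() remains the safer choice when every column is required.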
