How to select columns that start with a prefix/string in R

base R startsWith()

dplyr starts_with()

Published

June 23, 2022

In this tutorial, we will learn how to select columns that start with a prefix or string using dplyr’s starts_with() function and base R’s startsWith() function. The tidyverse R package dplyr has a number of helper functions for selecting columns of interest under different conditions. dplyr’s starts_with() is one of these select helpers; it selects columns whose names start with a given string. Similarly, we will show how to use base R’s startsWith() function to select columns with a prefix.

To get started with some examples, let us load the tidyverse and palmerpenguins packages.

library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")

## [1] '1.0.9'

Taking a quick look at the data and the column names of the dataframe, we can see that a few columns share a common prefix.

penguins %>% head(5)

## # A tibble: 5 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## # … with 1 more variable: year <int>

dplyr starts_with(): select columns starting with a character: Example 1

dplyr’s starts_with() function is one of the select helper functions that can select columns using a prefix. First, let us select columns whose names start with a single character using starts_with(). We get three columns that start with the character “b”.

penguins %>%
  select(starts_with("b"))

## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows
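
One detail worth knowing about the example above: starts_with() has an ignore.case argument that defaults to TRUE, so an upper-case prefix matches the same columns. The sketch below illustrates this on a small made-up data frame (the name toy and its values are hypothetical stand-ins that reuse a few penguins column names), so that it runs without palmerpenguins.

```r
library(dplyr)

# Hypothetical two-row stand-in reusing some penguins column names
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  body_mass_g = c(3750L, 4500L)
)

# Default ignore.case = TRUE: "B" matches the same columns as "b"
default_match <- toy %>%
  select(starts_with("B")) %>%
  names()

# With ignore.case = FALSE, the upper-case prefix matches no column here
strict_match <- toy %>%
  select(starts_with("B", ignore.case = FALSE)) %>%
  names()

default_match
strict_match
```

If the case of the prefix matters for your data, passing ignore.case = FALSE makes the match exact.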

dplyr starts_with(): select columns starting with a string: Example 2

In this example, we select columns whose names start with a longer string using starts_with(). We get two columns that start with the string “bill”.

penguins %>%
  select(starts_with("bill"))

## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
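
starts_with() also accepts a character vector of prefixes, in which case it selects every column that matches any of them. Here is a minimal sketch, again on a hypothetical data frame named toy with made-up values rather than the real penguins data.

```r
library(dplyr)

# Hypothetical stand-in reusing some penguins column names
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  flipper_length_mm = c(181L, 211L),
  body_mass_g = c(3750L, 4500L)
)

# Keep columns whose names start with either "bill" or "flipper"
multi <- toy %>%
  select(starts_with(c("bill", "flipper")))

names(multi)
```

This avoids chaining several select() calls when the columns of interest share more than one prefix.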

base R startsWith(): select columns starting with a string

Similarly, we can use base R’s startsWith() function to determine whether each element of a character vector starts with a prefix string. It returns a logical vector (TRUE/FALSE).

For example, we can determine whether the column names of a dataframe start with a character using startsWith(), as shown below.

startsWith(colnames(penguins), "b")

## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
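
To see which columns the TRUE values correspond to, the logical vector can be used to index the names themselves, or passed to which() to get positions. The sketch below types the eight penguins column names into a plain character vector (the variable names cols and is_b are illustrative), so it needs no packages.

```r
# The eight column names of penguins, written out by hand
cols <- c("species", "island", "bill_length_mm", "bill_depth_mm",
          "flipper_length_mm", "body_mass_g", "sex", "year")

# One TRUE/FALSE per name
is_b <- startsWith(cols, "b")

# Names of the matching columns
matched_names <- cols[is_b]

# Positions of the matching columns
matched_pos <- which(is_b)

matched_names
matched_pos
```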

In order to select the columns that start with a character, we use the logical vector to subset the columns of the dataframe. Here, we select columns starting with the character “b”:

penguins[, startsWith(colnames(penguins), "b")]

## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm body_mass_g
##             <dbl>         <dbl>       <int>
##  1           39.1          18.7        3750
##  2           39.5          17.4        3800
##  3           40.3          18          3250
##  4           NA            NA            NA
##  5           36.7          19.3        3450
##  6           39.3          20.6        3650
##  7           38.9          17.8        3625
##  8           39.2          19.6        4675
##  9           34.1          18.1        3475
## 10           42            20.2        4250
## # … with 334 more rows

If we are interested in selecting columns that start with a string, not just a character, the use case is very similar. We use the startsWith() function to select columns whose names begin with the string “bill”.

penguins[, startsWith(colnames(penguins), "bill")]

## # A tibble: 344 × 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # … with 334 more rows
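
One caveat with the base R approach: penguins is a tibble, and tibbles keep their data frame shape when subset with [, ]. A plain data.frame behaves differently when the prefix matches exactly one column, returning a bare vector unless drop = FALSE is given. The sketch below shows this on a hypothetical data frame named toy with made-up values.

```r
# Hypothetical plain data.frame (not a tibble)
toy <- data.frame(
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  body_mass_g = c(3750L, 4500L)
)

# Only one column name starts with "body"
keep <- startsWith(names(toy), "body")

# Matrix-style subsetting simplifies a single column to a vector
as_vector <- toy[, keep]

# drop = FALSE keeps the result as a one-column data.frame
as_frame <- toy[, keep, drop = FALSE]

# List-style subsetting (no comma) also keeps a data.frame
also_frame <- toy[keep]

str(as_vector)
str(as_frame)
```

When writing code that should work for both tibbles and plain data frames, the drop = FALSE or list-style forms are the safer choice.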
