How to select columns that start with a prefix/string in R
In this tutorial, we will learn how to select columns that start with a prefix or string using dplyr’s starts_with() function and base R’s startsWith() function. The tidyverse package dplyr has a number of helper functions for selecting columns of interest under different conditions. dplyr’s starts_with() is one of these select helpers; it selects columns whose names start with a given string. Similarly, we will show how to use base R’s startsWith() function to select columns with a prefix.
To get started with some examples, let us load the tidyverse and palmerpenguins packages.
library(tidyverse)
library(palmerpenguins)
packageVersion("dplyr")
## [1] '1.0.9'
Taking a quick look at the data and the column names of the dataframe, we can see that a few columns share a common prefix.
penguins %>% head(5)
## # A tibble: 5 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <fct> <fct> <dbl> <dbl> <int> <int> <fct>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## # … with 1 more variable: year <int>
dplyr starts_with(): select columns starting with a character: Example 1
dplyr’s starts_with() function is one of the select helper functions that can select columns using a prefix. First, let us select columns that start with a single character using starts_with(). We get three columns whose names start with the character “b”.
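One detail worth knowing before running the example: starts_with() matches case-insensitively by default, because its `ignore.case` argument defaults to TRUE. The sketch below uses a small made-up data frame (`df` and its column names are hypothetical, chosen only to show the difference) instead of penguins, whose column names are all lower case.

```r
library(dplyr)

# Hypothetical toy data frame with mixed-case column names
df <- data.frame(
  bill_length = c(39.1, 39.5),
  Body_mass   = c(3750, 3800),
  island      = c("Torgersen", "Biscoe")
)

# Default ignore.case = TRUE: "b" also matches "Body_mass"
names(select(df, starts_with("b")))
# "bill_length" "Body_mass"

# Case-sensitive match: only "bill_length" is kept
names(select(df, starts_with("b", ignore.case = FALSE)))
# "bill_length"
```

If exact-case matching matters, passing `ignore.case = FALSE` makes starts_with() behave like base R’s startsWith(), which is always case-sensitive.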
penguins %>%
  select(starts_with("b"))
## # A tibble: 344 × 3
## bill_length_mm bill_depth_mm body_mass_g
## <dbl> <dbl> <int>
## 1 39.1 18.7 3750
## 2 39.5 17.4 3800
## 3 40.3 18 3250
## 4 NA NA NA
## 5 36.7 19.3 3450
## 6 39.3 20.6 3650
## 7 38.9 17.8 3625
## 8 39.2 19.6 4675
## 9 34.1 18.1 3475
## 10 42 20.2 4250
## # … with 334 more rows
dplyr starts_with(): select columns starting with a string: Example 2
In this example, we select columns that start with a string using starts_with(). We get the two columns whose names start with “bill”.
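starts_with() also accepts a character vector of prefixes (a column is kept if it matches any of them), and the helper can be negated with `!` to drop matching columns instead. The sketch below assumes a small hypothetical data frame `df` that mimics a few penguins column names; on penguins itself the analogous call would be `penguins %>% select(starts_with(c("bill", "flipper")))`.

```r
library(dplyr)

# Hypothetical toy data frame mimicking some penguins column names
df <- data.frame(
  bill_length_mm    = c(39.1, 39.5),
  bill_depth_mm     = c(18.7, 17.4),
  flipper_length_mm = c(181L, 186L),
  sex               = c("male", "female")
)

# Several prefixes at once: keep columns matching any of them
names(select(df, starts_with(c("bill", "flipper"))))
# "bill_length_mm" "bill_depth_mm" "flipper_length_mm"

# Negate the helper with ! to drop every column starting with "bill"
names(select(df, !starts_with("bill")))
# "flipper_length_mm" "sex"
```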
penguins %>%
  select(starts_with("bill"))
## # A tibble: 344 × 2
## bill_length_mm bill_depth_mm
## <dbl> <dbl>
## 1 39.1 18.7
## 2 39.5 17.4
## 3 40.3 18
## 4 NA NA
## 5 36.7 19.3
## 6 39.3 20.6
## 7 38.9 17.8
## 8 39.2 19.6
## 9 34.1 18.1
## 10 42 20.2
## # … with 334 more rows
base R startsWith(): select columns starting with a string
Similarly, we can use base R’s startsWith() function to test whether the elements of a character vector start with a prefix string; it returns a logical vector (TRUE/FALSE).
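Unlike dplyr’s starts_with(), base R’s startsWith() is case-sensitive. A minimal sketch on a plain character vector (the vector `x` is hypothetical) shows this, along with one common way to get a case-insensitive test by lower-casing the input first.

```r
# Hypothetical character vector of column-like names
x <- c("bill_length_mm", "Bill", "body_mass_g")

# startsWith() is vectorised over x and case-sensitive
startsWith(x, "b")
# TRUE FALSE TRUE

# Case-insensitive variant: lower-case the input before testing
startsWith(tolower(x), "b")
# TRUE TRUE TRUE
```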
For example, we can check which column names of a dataframe start with a character using startsWith(), as shown below.
startsWith(colnames(penguins), "b")
## [1] FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
To select the columns that start with that character, we use this logical vector to subset the columns of the dataframe:
penguins[, startsWith(colnames(penguins), "b")]
## # A tibble: 344 × 3
## bill_length_mm bill_depth_mm body_mass_g
## <dbl> <dbl> <int>
## 1 39.1 18.7 3750
## 2 39.5 17.4 3800
## 3 40.3 18 3250
## 4 NA NA NA
## 5 36.7 19.3 3450
## 6 39.3 20.6 3650
## 7 38.9 17.8 3625
## 8 39.2 19.6 4675
## 9 34.1 18.1 3475
## 10 42 20.2 4250
## # … with 334 more rows
If we are interested in selecting columns that start with a string, not just a single character, the approach is the same. Here we use startsWith() to select columns whose names begin with the string “bill”.
penguins[, startsWith(colnames(penguins), "bill")]
## # A tibble: 344 × 2
## bill_length_mm bill_depth_mm
## <dbl> <dbl>
## 1 39.1 18.7
## 2 39.5 17.4
## 3 40.3 18
## 4 NA NA
## 5 36.7 19.3
## 6 39.3 20.6
## 7 38.9 17.8
## 8 39.2 19.6
## 9 34.1 18.1
## 10 42 20.2
## # … with 334 more rows
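Because penguins is a tibble, `[` always returns a tibble here. On a plain data.frame, `[` with a column index drops to a vector when only one column matches, so `drop = FALSE` is a useful safeguard. The sketch below also shows one way to match several prefixes in base R by OR-ing one logical vector per prefix. The data frame `df` and the `prefixes` vector are hypothetical.

```r
# Hypothetical plain data.frame (not a tibble)
df <- data.frame(
  bill_length_mm = c(39.1, 39.5),
  body_mass_g    = c(3750L, 3800L),
  sex            = c("male", "female")
)

# Only one column matches, so [ , ] on a data.frame drops to a vector
class(df[, startsWith(colnames(df), "bill")])
# "numeric"

# drop = FALSE keeps the result as a one-column data.frame
class(df[, startsWith(colnames(df), "bill"), drop = FALSE])
# "data.frame"

# Several prefixes: combine one logical vector per prefix with |
prefixes <- c("bill", "sex")
keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(colnames(df), p)))
df[, keep, drop = FALSE]
```

Passing the whole `prefixes` vector straight to startsWith() would not work the same way, since the prefix is recycled element by element rather than tested as "any of these".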