Join dataframes by different column names with dplyr
In this tutorial, we will learn how to join dataframes by columns with different names in the two dataframes.
dplyr offers a great set of tools to join or merge dataframes. We will use two approaches to merge two dataframe by different names.
Join dataframes by different column names with dplyr
We will start with loading tidyverse.
library(tidyverse)Then we will create two toy dataframes such that the columns we would like to join them have different names.
Our first toy dataframe has two columns.
penguins1
## 1 Gentoo male
## 2 Adelie female
## 3 Chinstrap maleAnd the toy dataframe has two columns. And the first column in the second dataframe refers to the same variable in the first dataframe, but with different name.
penguins2
## 1 Gentoo 5700
## 2 Adelie 4500
## 3 Chinstrap 3900Joining dataframes with different column names with “by =c()”
One of the ways to join or merge two dataframes using columns that have different names is to use by argument with c(). Within c(), we specify the name in the first dataframe equals the name in the second dataframe. Note that the column names are provided as a string, i.e. within double quotes.
penguins1 %>%
left_join(penguins2, by = c("species"="penguin"))
## # A tibble: 3 × 3
## species sex body_mass
##
## 1 Gentoo male 5700
## 2 Adelie female 4500
## 3 Chinstrap male 3900Joining dataframes with different column names using “join_by”
Another way to join or merge two dataframes using columns that have different names is to use join_by() by. Within join_by(), we specify the name in the first dataframe and the name in the second dataframe with equality == symbol. Note that the column names are provided as a variable i.e. without double quotes.
penguins1 %>%
full_join(penguins2, by = join_by(species==penguin))
## # A tibble: 3 × 3
## species sex body_mass
##
## 1 Gentoo male 5700
## 2 Adelie female 4500
## 3 Chinstrap male 3900In the above examples we showed how to join dataframes by different column names using left_join()/full_join(). We could also use other joins in dplyr with different names.