dplyr rows_update(): Modify existing rows
In this post, we will learn how to use dplyr’s rows_update() function with examples. dplyr’s rows_update() function is a useful function to modify or update specific rows in a data frame based on a matching values in key column. It takes in two dataframes x and y, updates existing rows in the target data frame x with values from the y data frame where the keys (columns) match.
library(tidyverse)
packageVersion("dplyr")
[1] '1.1.4'Basic syntax of dplyr’s rows_update() function looks as given below. It takes in two dataframes as input with optional arguments.
rows_update(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE
)Let us create two small dataframes.The first dataframe is our target dataframe in which we want to update some rows.
# Original data frame
df1
1 1 Alice 85
2 2 Bob 90
3 3 Charlie 88
4 4 Liz 92
5 5 Sam 89And the second dataframe contains the new values that needed to be updated with and a key column specifying the variable that we want to update.
# Data frame with updated scores for some student_ids
df2
1 1 100
2 5 98In the example we are considering here we are interested in updates the scores of two student ids. Originally the students 1 & 5 had 85 & 89 as their scores (first dataframe). Now we want to update them to 100 & 98 (second dataframe).
df1 |>
rows_update(df2)
Matching, by = "student_id"
# A tibble: 5 × 3
student_id name score
1 1 Alice 100
2 2 Bob 90
3 3 Charlie 88
4 4 Liz 92
5 5 Sam 98dplyr’s row_update() identifies the column name to merge automatically, but we can also specify using “by” argument as shown below.
df1 |>
rows_update(df2, by="student_id")
# A tibble: 5 × 3
student_id name score
1 1 Alice 100
2 2 Bob 90
3 3 Charlie 88
4 4 Liz 92
5 5 Sam 98]
Note that the second dataframe y must have the same columns of x or a subset. If there is a column that is not present in the first dataframe, rows_update() will throw an error. Here is an example showing that.
# Data frame with updated scores for some student_ids
df2_new
1 1 100
2 5 98df1 |>
rows_update(df2_new, by="student_id")Error in `rows_update()`:
! All columns in `y` must exist in `x`.
ℹ The following columns only exist in `y`: `grade`.
Backtrace:
1. dplyr::rows_update(df1, df2_new, by = "student_id")
2. dplyr:::rows_update.data.frame(df1, df2_new, by = "student_id")
Error in rows_update(df1, df2_new, by = "student_id") :
ℹ The following columns only exist in `y`: `grade`.