dplyr rows_update(): Modify existing rows

dplyr rows_update()

Published

September 20, 2024

In this post, we will learn how to use dplyr’s rows_update() function with examples. dplyr’s rows_update() function is a useful function to modify or update specific rows in a data frame based on a matching values in key column. It takes in two dataframes x and y, updates existing rows in the target data frame x with values from the y data frame where the keys (columns) match.

library(tidyverse)
packageVersion("dplyr")

[1] '1.1.4'

Basic syntax of dplyr’s rows_update() function looks as given below. It takes in two dataframes as input with optional arguments.

rows_update(
  x,
  y,
  by = NULL,
  ...,
  unmatched = c("error", "ignore"),
  copy = FALSE,
  in_place = FALSE
)

Let us create two small dataframes.The first dataframe is our target dataframe in which we want to update some rows.

# Original data frame
df1     
1          1 Alice      85
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        89

And the second dataframe contains the new values that needed to be updated with and a key column specifying the variable that we want to update.

# Data frame with updated scores for some student_ids
df2  
1          1   100
2          5    98

In the example we are considering here we are interested in updates the scores of two student ids. Originally the students 1 & 5 had 85 & 89 as their scores (first dataframe). Now we want to update them to 100 & 98 (second dataframe).

df1 |>
  rows_update(df2)

Matching, by = "student_id"
# A tibble: 5 × 3
  student_id name    score
           
1          1 Alice     100
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        98

dplyr’s row_update() identifies the column name to merge automatically, but we can also specify using “by” argument as shown below.

df1 |>
  rows_update(df2, by="student_id")

# A tibble: 5 × 3
  student_id name    score
           
1          1 Alice     100
2          2 Bob        90
3          3 Charlie    88
4          4 Liz        92
5          5 Sam        98

]

Note that the second dataframe y must have the same columns of x or a subset. If there is a column that is not present in the first dataframe, rows_update() will throw an error. Here is an example showing that.

# Data frame with updated scores for some student_ids
df2_new  
1          1   100
2          5    98

df1 |>
  rows_update(df2_new, by="student_id")

Error in `rows_update()`:
! All columns in `y` must exist in `x`.
ℹ The following columns only exist in `y`: `grade`.
Backtrace:
 1. dplyr::rows_update(df1, df2_new, by = "student_id")
 2. dplyr:::rows_update.data.frame(df1, df2_new, by = "student_id")
Error in rows_update(df1, df2_new, by = "student_id") : 
ℹ The following columns only exist in `y`: `grade`.

--- title: "dplyr rows_update(): Modify existing rows" date: 2024-09-20 categories: ['dplyr rows_update()'] format: html: code-fold: false code-tools: true --- In this post, we will learn how to use dplyr's rows_update() function with examples. dplyr's rows_update() function is a useful function to modify or update specific rows in a data frame based on a matching values in key column. It takes in two dataframes x and y, updates existing rows in the target data frame **x ** with values from the **y** data frame where the keys (columns) match. ```r library(tidyverse) packageVersion("dplyr") [1] '1.1.4' ``` Basic syntax of dplyr's rows_update() function looks as given below. It takes in two dataframes as input with optional arguments. ```r rows_update( x, y, by = NULL, ..., unmatched = c("error", "ignore"), copy = FALSE, in_place = FALSE ) ``` Let us create two small dataframes.The first dataframe is our target dataframe in which we want to update some rows. ```r # Original data frame df1 1 1 Alice 85 2 2 Bob 90 3 3 Charlie 88 4 4 Liz 92 5 5 Sam 89 ``` And the second dataframe contains the new values that needed to be updated with and a key column specifying the variable that we want to update. ```r # Data frame with updated scores for some student_ids df2 1 1 100 2 5 98 ``` In the example we are considering here we are interested in updates the scores of two student ids. Originally the students 1 & 5 had 85 & 89 as their scores (first dataframe). Now we want to update them to 100 & 98 (second dataframe). ```r df1 |> rows_update(df2) Matching, by = "student_id" # A tibble: 5 × 3 student_id name score 1 1 Alice 100 2 2 Bob 90 3 3 Charlie 88 4 4 Liz 92 5 5 Sam 98 ``` dplyr's row_update() identifies the column name to merge automatically, but we can also specify using "by" argument as shown below. ```r df1 |> rows_update(df2, by="student_id") # A tibble: 5 × 3 student_id name score 1 1 Alice 100 2 2 Bob 90 3 3 Charlie 88 4 4 Liz 92 5 5 Sam 98 ``` ] Note that the second dataframe **y** must have the same columns of **x or a subset**. If there is a column that is not present in the first dataframe, rows_update() will throw an error. Here is an example showing that. ```r # Data frame with updated scores for some student_ids df2_new 1 1 100 2 5 98 ``` ```r df1 |> rows_update(df2_new, by="student_id") ``` ```r Error in `rows_update()`: ! All columns in `y` must exist in `x`. ℹ The following columns only exist in `y`: `grade`. Backtrace: 1. dplyr::rows_update(df1, df2_new, by = "student_id") 2. dplyr:::rows_update.data.frame(df1, df2_new, by = "student_id") Error in rows_update(df1, df2_new, by = "student_id") : ℹ The following columns only exist in `y`: `grade`. ```