How to update rows in a dataframe with rows_update() in R

dplyr

dplyr rows_update()

Learn update rows in a dataframe with rows_update() in r with clear examples and explanations.

Published

March 26, 2026

Introduction

The rows_update() function in dplyr provides a powerful way to update existing rows in a data frame using values from another data frame. Similar to SQL’s UPDATE statement, it modifies rows where key columns match between two tables, making it ideal for applying corrections, updates, or changes to specific records without altering the overall structure of your data.

Setting Up

Let’s start by loading the required packages and checking our dplyr version:

library(tidyverse)
packageVersion("dplyr")

Basic Syntax and Parameters

The rows_update() function follows this general syntax:

rows_update(
  x,                    # target data frame to update
  y,                    # data frame with new values
  by = NULL,            # key columns for matching
  unmatched = "error",  # how to handle unmatched keys
  copy = FALSE,
  in_place = FALSE
)

The function requires that key values in the update data frame (y) are unique and, by default, must exist in the target data frame (x).

Basic Example: Updating Student Scores

Let’s create a practical example with student data where we need to update some test scores:

# Original student data
students <- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)

Now let’s create a data frame with updated scores for specific students:

# Updated scores for students 1 and 5
score_updates <- tibble(
  student_id = c(1, 5),
  score = c(100, 98)
)

Let’s examine our original data first:

students

And our update data:

score_updates

Performing the Update

Now we can update the original data frame with the new scores:

updated_students <- students |>
  rows_update(score_updates)

Since both data frames share the student_id column, rows_update() automatically uses it as the key. Notice that only Alice’s and Sam’s scores changed to 100 and 98 respectively, while other rows remained unchanged.

Specifying Key Columns Explicitly

You can explicitly specify which column(s) to use for matching using the by parameter:

students |>
  rows_update(score_updates, by = "student_id")

This produces the same result but makes the key column explicit, which is helpful when working with complex data or when column names might be ambiguous.

Handling Extra Columns

What happens when the update data frame contains additional columns not present in the original data? Let’s see:

# Update data with an extra column
updates_with_grade <- tibble(
  student_id = c(1, 5),
  score = c(100, 98),
  grade = c("A", "A")
)

If we try to update with this data frame containing the extra grade column:

# This will cause an error
students |>
  rows_update(updates_with_grade, by = "student_id")

This will produce an error because rows_update() only updates existing columns and doesn’t add new ones. The original data frame structure is preserved.

Key Points and Best Practices

When using rows_update(), remember these important points:

Key values in the update data frame must be unique
Only existing columns in the target data frame will be updated
The function preserves the original structure and row order
Unmatched keys in the update data frame cause an error by default

Summary

The rows_update() function is an essential tool for maintaining and updating data frames in a controlled manner. It provides a safe way to apply updates to specific rows while preserving data integrity and structure. Use it when you need to modify existing records without adding new rows or columns, making it perfect for data correction workflows and incremental updates.

--- title: "How to update rows in a dataframe with rows_update() in R" description: "Learn update rows in a dataframe with rows_update() in r with clear examples and explanations." date: 2026-03-26 categories: ['dplyr', 'dplyr rows_update()'] format: html: code-fold: false code-tools: true --- ## Introduction The `rows_update()` function in dplyr provides a powerful way to update existing rows in a data frame using values from another data frame. Similar to SQL's UPDATE statement, it modifies rows where key columns match between two tables, making it ideal for applying corrections, updates, or changes to specific records without altering the overall structure of your data. ## Setting Up Let's start by loading the required packages and checking our dplyr version: ```r library(tidyverse) packageVersion("dplyr") ``` ## Basic Syntax and Parameters The `rows_update()` function follows this general syntax: ```r rows_update( x, # target data frame to update y, # data frame with new values by = NULL, # key columns for matching unmatched = "error", # how to handle unmatched keys copy = FALSE, in_place = FALSE ) ``` The function requires that key values in the update data frame (`y`) are unique and, by default, must exist in the target data frame (`x`). ## Basic Example: Updating Student Scores Let's create a practical example with student data where we need to update some test scores: ```r # Original student data students <- tibble( student_id = 1:5, name = c("Alice", "Bob", "Charlie", "Liz", "Sam"), score = c(85, 90, 88, 92, 89) ) ``` Now let's create a data frame with updated scores for specific students: ```r # Updated scores for students 1 and 5 score_updates <- tibble( student_id = c(1, 5), score = c(100, 98) ) ``` Let's examine our original data first: ```r students ``` And our update data: ```r score_updates ``` ## Performing the Update Now we can update the original data frame with the new scores: ```r updated_students <- students |> rows_update(score_updates) ``` Since both data frames share the `student_id` column, `rows_update()` automatically uses it as the key. Notice that only Alice's and Sam's scores changed to 100 and 98 respectively, while other rows remained unchanged. ## Specifying Key Columns Explicitly You can explicitly specify which column(s) to use for matching using the `by` parameter: ```r students |> rows_update(score_updates, by = "student_id") ``` This produces the same result but makes the key column explicit, which is helpful when working with complex data or when column names might be ambiguous. ## Handling Extra Columns What happens when the update data frame contains additional columns not present in the original data? Let's see: ```r # Update data with an extra column updates_with_grade <- tibble( student_id = c(1, 5), score = c(100, 98), grade = c("A", "A") ) ``` If we try to update with this data frame containing the extra `grade` column: ```r # This will cause an error students |> rows_update(updates_with_grade, by = "student_id") ``` This will produce an error because `rows_update()` only updates existing columns and doesn't add new ones. The original data frame structure is preserved. ## Key Points and Best Practices When using `rows_update()`, remember these important points: - Key values in the update data frame must be unique - Only existing columns in the target data frame will be updated - The function preserves the original structure and row order - Unmatched keys in the update data frame cause an error by default ## Summary The `rows_update()` function is an essential tool for maintaining and updating data frames in a controlled manner. It provides a safe way to apply updates to specific rows while preserving data integrity and structure. Use it when you need to modify existing records without adding new rows or columns, making it perfect for data correction workflows and incremental updates.