dplyr rows_update(): Modify existing rows

dplyr rows_update()
Master dplyr rows_update() to modify existing rows. Complete R tutorial with examples using real datasets.
Published

September 20, 2024

Introduction

The rows_update() function in dplyr allows you to modify existing rows in a data frame based on matching key columns. It’s particularly useful when you need to update specific records with new information while preserving the rest of your data intact.

Getting Started

library(tidyverse)
library(palmerpenguins)

Example 1: Basic Usage

The Problem

We need to update specific penguin records in our dataset with corrected measurements. Let’s say we discovered that some body mass measurements were recorded incorrectly and need to be fixed.

Step 1: Create our base dataset

We’ll start with a subset of penguins data to work with.

penguins_data <- penguins |>
  select(species, island, bill_length_mm, body_mass_g) |>
  slice(1:5) |>
  mutate(penguin_id = 1:5)

penguins_data

This creates a manageable dataset with 5 penguins, each assigned a unique ID.

Step 2: Prepare the update data

Now we’ll create a dataset containing the corrections we want to make.

corrections <- tibble(
  penguin_id = c(2, 4),
  body_mass_g = c(3800, 3900),
  bill_length_mm = c(40.5, 41.2)
)

corrections

We’re preparing to update penguins 2 and 4 with new body mass and bill length measurements.

Step 3: Apply the updates

We’ll use rows_update() to modify the existing records.

updated_penguins <- penguins_data |>
  rows_update(corrections, by = "penguin_id")

updated_penguins

The function matched records by penguin_id and updated the corresponding measurements while leaving other columns unchanged.

Example 2: Practical Application

The Problem

Imagine you’re managing a car inventory system using the mtcars dataset. You’ve received updated fuel efficiency ratings for specific car models and need to update your database. You also want to add a status column to track which records were modified.

Step 1: Prepare the inventory data

Let’s set up our car inventory with row names as a proper ID column.

car_inventory <- mtcars |>
  rownames_to_column("model") |>
  select(model, mpg, hp, wt) |>
  slice(1:8)

car_inventory

We now have a clean inventory with model names as identifiers and key specifications.

Step 2: Create the updated specifications

We’ll prepare new data for specific car models that need updates.

new_specs <- tibble(
  model = c("Datsun 710", "Hornet 4 Drive", "Valiant"),
  mpg = c(24.5, 22.1, 19.5),
  hp = c(95, 115, 108)
)

new_specs

These represent updated fuel efficiency and horsepower ratings from the manufacturer.

Step 3: Apply updates and track changes

We’ll update the records and add a column to mark which ones were modified.

car_inventory <- car_inventory |>
  mutate(last_updated = "original") |>
  rows_update(
    new_specs |> mutate(last_updated = "2024-updated"), 
    by = "model"
  )

car_inventory

The updated cars now show “2024-updated” in the last_updated column, while others remain marked as “original”.

Step 4: Verify the changes

Let’s check which specific values were updated.

car_inventory |>
  filter(last_updated == "2024-updated") |>
  select(model, mpg, hp, last_updated)

This shows us exactly which models were updated with their new specifications.

Summary

  • rows_update() modifies existing rows based on matching key columns specified in the by argument
  • Only columns present in the update dataset are modified; other columns remain unchanged
  • The function requires exact matches on key columns and will error if keys in the update data don’t exist in the original data
  • It’s perfect for applying corrections, updates, or modifications to specific records in your dataset
  • Always verify your key columns are unique to avoid unexpected behavior during updates