dplyr rows_update(): Modify existing rows
Introduction
The rows_update() function in dplyr allows you to modify existing rows in a data frame based on matching key columns. It’s particularly useful when you need to update specific records with new information while preserving the rest of your data intact.
Getting Started
library(tidyverse)
library(palmerpenguins)Example 1: Basic Usage
The Problem
We need to update specific penguin records in our dataset with corrected measurements. Let’s say we discovered that some body mass measurements were recorded incorrectly and need to be fixed.
Step 1: Create our base dataset
We’ll start with a subset of penguins data to work with.
penguins_data <- penguins |>
select(species, island, bill_length_mm, body_mass_g) |>
slice(1:5) |>
mutate(penguin_id = 1:5)
penguins_dataThis creates a manageable dataset with 5 penguins, each assigned a unique ID.
Step 2: Prepare the update data
Now we’ll create a dataset containing the corrections we want to make.
corrections <- tibble(
penguin_id = c(2, 4),
body_mass_g = c(3800, 3900),
bill_length_mm = c(40.5, 41.2)
)
correctionsWe’re preparing to update penguins 2 and 4 with new body mass and bill length measurements.
Step 3: Apply the updates
We’ll use rows_update() to modify the existing records.
updated_penguins <- penguins_data |>
rows_update(corrections, by = "penguin_id")
updated_penguinsThe function matched records by penguin_id and updated the corresponding measurements while leaving other columns unchanged.
Example 2: Practical Application
The Problem
Imagine you’re managing a car inventory system using the mtcars dataset. You’ve received updated fuel efficiency ratings for specific car models and need to update your database. You also want to add a status column to track which records were modified.
Step 1: Prepare the inventory data
Let’s set up our car inventory with row names as a proper ID column.
car_inventory <- mtcars |>
rownames_to_column("model") |>
select(model, mpg, hp, wt) |>
slice(1:8)
car_inventoryWe now have a clean inventory with model names as identifiers and key specifications.
Step 2: Create the updated specifications
We’ll prepare new data for specific car models that need updates.
new_specs <- tibble(
model = c("Datsun 710", "Hornet 4 Drive", "Valiant"),
mpg = c(24.5, 22.1, 19.5),
hp = c(95, 115, 108)
)
new_specsThese represent updated fuel efficiency and horsepower ratings from the manufacturer.
Step 3: Apply updates and track changes
We’ll update the records and add a column to mark which ones were modified.
car_inventory <- car_inventory |>
mutate(last_updated = "original") |>
rows_update(
new_specs |> mutate(last_updated = "2024-updated"),
by = "model"
)
car_inventoryThe updated cars now show “2024-updated” in the last_updated column, while others remain marked as “original”.
Step 4: Verify the changes
Let’s check which specific values were updated.
car_inventory |>
filter(last_updated == "2024-updated") |>
select(model, mpg, hp, last_updated)This shows us exactly which models were updated with their new specifications.
Summary
rows_update()modifies existing rows based on matching key columns specified in thebyargument- Only columns present in the update dataset are modified; other columns remain unchanged
- The function requires exact matches on key columns and will error if keys in the update data don’t exist in the original data
- It’s perfect for applying corrections, updates, or modifications to specific records in your dataset
Always verify your key columns are unique to avoid unexpected behavior during updates