How to add Prefix/Suffix to Column Names of a dataframe in R
Introduction
Adding prefixes or suffixes to column names is a common data manipulation task when preparing datasets for analysis or when combining multiple dataframes. This technique helps create descriptive column names, avoid naming conflicts, and organize your data more effectively.
Getting Started
library(tidyverse)
library(palmerpenguins)
Example 1: Basic Usage
The Problem
We need to add prefixes and suffixes to column names in the penguins dataset. This is useful when you want to make column names more descriptive or prepare data for merging.
Step 1: Add Prefix to All Columns
Let’s start by adding a prefix to all column names in our dataset.
# Add "penguin_" prefix to all columns
penguins_prefixed <- penguins |>
  rename_with(~ paste0("penguin_", .))
# View the new column names
names(penguins_prefixed)
The rename_with() function applies the transformation to all columns, creating names like “penguin_species” and “penguin_bill_length_mm”.
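For comparison, the same prefixing can be done without dplyr. The sketch below uses a small made-up data frame (`toy` is an assumption for illustration, not part of the penguins data) and checks that rename_with() and base R name assignment give the same column names.

```r
library(dplyr)

# Hypothetical toy data frame, used only for illustration
toy <- data.frame(species = c("Adelie", "Gentoo"), mass_g = c(3750, 5000))

# dplyr: prefix every column name
toy_dplyr <- toy |>
  rename_with(~ paste0("penguin_", .x))

# Base R: assign the modified names back to a copy
toy_base <- toy
names(toy_base) <- paste0("penguin_", names(toy_base))

names(toy_dplyr)
identical(names(toy_dplyr), names(toy_base))
```

The dplyr version fits inside a pipeline and returns a new object, while the base R version modifies the names of an existing object in place.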
Step 2: Add Suffix to All Columns
Now we’ll add a suffix to all column names using a similar approach.
# Add "_measured" suffix to all columns
penguins_suffixed <- penguins |>
  rename_with(~ paste0(., "_measured"))
# Check the first few column names
head(names(penguins_suffixed), 4)
This creates column names like “species_measured” and “bill_length_mm_measured” by appending the suffix to each original name.
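Prefix and suffix can also be combined in one call, and the suffix can be handed to paste0() through the extra arguments of rename_with() instead of a formula. A minimal sketch, assuming a made-up `toy` data frame and an illustrative "raw_" prefix that do not appear in the original example:

```r
library(dplyr)

# Hypothetical toy data frame, used only for illustration
toy <- data.frame(species = c("Adelie", "Gentoo"), mass_g = c(3750, 5000))

# Prefix and suffix in a single pass
both <- toy |>
  rename_with(~ paste0("raw_", .x, "_measured"))

# Suffix only: "_measured" is forwarded to paste0() as an extra argument
suffixed <- toy |>
  rename_with(paste0, .cols = everything(), "_measured")

names(both)
names(suffixed)
```

Passing the function by name only works for suffixes here, because the column names are always supplied as the first argument to the function.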
Step 3: Target Specific Columns
We can also apply prefixes or suffixes to only selected columns using column selection helpers.
# Add "metric_" prefix only to numeric columns
penguins_selective <- penguins |>
  rename_with(~ paste0("metric_", .), where(is.numeric))
# View all column names to see the change
names(penguins_selective)
Only the numeric columns now have the “metric_” prefix, while factor columns like “species” remain unchanged. Note that “year” is stored as an integer, so it is renamed to “metric_year” as well.
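Selection helpers can be combined to narrow the target further. The sketch below assumes a small made-up data frame (`toy`) that mimics the penguins column types; it excludes `year` from the numeric selection and, separately, selects columns by name pattern.

```r
library(dplyr)

# Hypothetical toy data frame mimicking the penguins column types
toy <- data.frame(
  species = factor(c("Adelie", "Gentoo")),
  bill_length_mm = c(39.1, 46.1),
  body_mass_g = c(3750L, 5000L),
  year = c(2007L, 2008L)
)

# Numeric columns, but leave `year` untouched
no_year <- toy |>
  rename_with(~ paste0("metric_", .x), where(is.numeric) & !year)

# Only columns whose names end in "_mm"
mm_only <- toy |>
  rename_with(~ paste0("metric_", .x), ends_with("_mm"))

names(no_year)
names(mm_only)
```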
Example 2: Practical Application
The Problem
Imagine you’re combining survey data from different years and need to distinguish between measurements. You have penguin data that needs year-specific prefixes for numeric measurements while keeping identifying columns unchanged.
Step 1: Create Sample Multi-Year Data
First, let’s simulate having data from different survey years.
# Create 2020 survey data
penguins_2020 <- penguins |>
  slice_head(n = 100) |>
  select(species, island, bill_length_mm, bill_depth_mm)
head(penguins_2020, 3)
We’ve created a subset representing 2020 survey data with key measurement columns.
Step 2: Add Year-Specific Prefixes to Measurements
Now we’ll add year prefixes to only the measurement columns while preserving the identifying columns.
# Add "y2020_" prefix to measurement columns only
penguins_2020_labeled <- penguins_2020 |>
  rename_with(~ paste0("y2020_", .),
              c(bill_length_mm, bill_depth_mm))
names(penguins_2020_labeled)
The measurement columns now have year-specific prefixes while “species” and “island” remain unchanged for easy joining.
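When there are many measurement columns, it can be easier to name the identifying columns and rename everything else. A minimal sketch, assuming a made-up `survey` data frame and an `id_cols` vector introduced here for illustration:

```r
library(dplyr)

# Hypothetical survey subset, used only for illustration
survey <- data.frame(
  species = c("Adelie", "Adelie"),
  island = c("Torgersen", "Biscoe"),
  bill_length_mm = c(39.1, 37.8),
  bill_depth_mm = c(18.7, 18.3)
)

# Identifier columns stored as a character vector
id_cols <- c("species", "island")

# Prefix every column that is NOT an identifier
labeled <- survey |>
  rename_with(~ paste0("y2020_", .x), !all_of(id_cols))

names(labeled)
```

Using all_of() makes the selection fail loudly if one of the identifier columns is missing, which is usually preferable to silently renaming the wrong columns.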
Step 3: Combine with Another Year’s Data
Let’s create data for another year and combine them to see the practical benefit.
# Create and label 2021 data
penguins_2021 <- penguins |>
  slice_tail(n = 100) |>
  select(species, island, bill_length_mm, bill_depth_mm) |>
  rename_with(~ paste0("y2021_", .),
              c(bill_length_mm, bill_depth_mm))
Now we have clearly labeled measurement columns that won’t conflict when joining datasets.
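Since the same renaming is repeated for each year, it could be wrapped in a small helper. The function below, `label_year()`, is a hypothetical name introduced for this sketch, not part of dplyr; it assumes the identifying columns are called `species` and `island` unless told otherwise.

```r
library(dplyr)

# Hypothetical helper: prefix every non-identifier column with "y<year>_"
label_year <- function(data, survey_year, id_cols = c("species", "island")) {
  data |>
    rename_with(~ paste0("y", survey_year, "_", .x), !all_of(id_cols))
}

# Made-up measurements, used only for illustration
measurements <- data.frame(
  species = c("Gentoo", "Chinstrap"),
  island = c("Biscoe", "Dream"),
  bill_length_mm = c(46.1, 49.6)
)

labeled_2020 <- label_year(measurements, 2020)
labeled_2021 <- label_year(measurements, 2021)

names(labeled_2020)
names(labeled_2021)
```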
Step 4: Demonstrate the Clean Join
Finally, let’s join the datasets to show how the prefixes prevent naming conflicts.
# Join the datasets by species and island
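combined_data <- penguins_2020_labeled |>
  full_join(penguins_2021, by = c("species", "island"))
# Check the structure
glimpse(combined_data)
The resulting dataset clearly distinguishes between measurements from different years, making analysis much cleaner. Keep in mind that “species” and “island” do not uniquely identify a penguin, so each 2020 row is paired with every 2021 row that shares the same species and island. For a one-to-one comparison, join on a unique identifier instead.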
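As an alternative to renaming before the join, the `suffix` argument of the dplyr join functions can disambiguate clashing column names at join time. The sketch below compares both approaches on made-up data with a hypothetical unique `id` column (the penguins data has no such column).

```r
library(dplyr)

# Hypothetical data: one row per penguin per year, keyed by a unique `id`
m2020 <- data.frame(id = c(1, 2, 3), bill_length_mm = c(39.1, 39.5, 40.3))
m2021 <- data.frame(id = c(2, 3, 4), bill_length_mm = c(39.9, 40.8, 36.7))

# Option A: prefix the measurement columns first, then join
joined_prefix <- full_join(
  rename_with(m2020, ~ paste0("y2020_", .x), !id),
  rename_with(m2021, ~ paste0("y2021_", .x), !id),
  by = "id"
)

# Option B: let the join append suffixes to the clashing names
joined_suffix <- full_join(m2020, m2021, by = "id",
                           suffix = c("_2020", "_2021"))

names(joined_prefix)
names(joined_suffix)
```

Renaming first keeps the labels even on columns that would not clash, whereas `suffix` only touches names that appear in both tables.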
Summary
- Use rename_with() with paste0() to add prefixes (paste0("prefix_", .)) or suffixes (paste0(., "_suffix"))
- Apply transformations to all columns or use where() and column selection helpers for targeted renaming
- Column name modifications are essential when preparing data for joins or merges
- Year-specific or source-specific prefixes help maintain data lineage and prevent naming conflicts
- The pipe operator |> makes these operations clean and readable in data processing workflows