How to use merge in R

base-r

merge

Master merge in R programming with clear examples. Complete tutorial covering syntax, use cases, and best practices.

Published

February 22, 2026

Introduction

The merge() function in R combines two data frames based on common columns or row names. It’s essential when you need to join datasets that share key variables, similar to SQL joins in database operations.

Getting Started

library(tidyverse)
data(mtcars)

Example 1: Basic Usage

The Problem

We need to combine two separate datasets about cars - one with performance metrics and another with manufacturer details. This demonstrates the fundamental merge operation.

Step 1: Create sample datasets

Let’s create two related datasets from the mtcars data.

# Performance data
performance <- mtcars |>
  select(mpg, hp, qsec) |>
  rownames_to_column("model")

head(performance, 3)

This creates our first dataset with car models and their performance metrics.

Step 2: Create manufacturer data

Now we’ll create a second dataset with manufacturer information.

# Manufacturer data
manufacturers <- data.frame(
  model = c("Mazda RX4", "Datsun 710", "Hornet 4 Drive", 
            "Valiant", "Merc 240D"),
  manufacturer = c("Mazda", "Nissan", "Dodge", "Plymouth", "Mercedes"),
  country = c("Japan", "Japan", "USA", "USA", "Germany")
)

We now have manufacturer details for some of the cars in our performance dataset.

Step 3: Perform basic merge

Let’s combine these datasets using the model column as our key.

# Inner join - only matching records
merged_data <- merge(performance, manufacturers, 
                     by = "model")

print(merged_data)

The merge created a new dataset containing only cars that appear in both datasets, combining all their information.

Example 2: Practical Application

The Problem

You’re analyzing car sales data where customer information and purchase details are stored separately. You need to combine them to analyze customer purchasing patterns while handling cases where not all records match.

Step 1: Create customer dataset

We’ll simulate customer data with some cars from mtcars.

# Customer purchase data
customers <- data.frame(
  car_model = c("Mazda RX4", "Honda Civic", "Merc 240D", 
                "Fiat 128", "Toyota Corona"),
  customer_id = c(101, 102, 103, 104, 105),
  purchase_date = c("2023-01-15", "2023-02-20", "2023-01-30", 
                   "2023-03-10", "2023-02-05")
)

This represents our sales records with customer information.

Step 2: Prepare car specifications

Let’s extract relevant car specifications for merging.

# Car specifications
car_specs <- mtcars |>
  select(mpg, cyl, hp) |>
  rownames_to_column("car_model") |>
  head(8)

We now have technical specifications for several car models.

Step 3: Perform left join

Use merge to keep all customer records, even if car specs are missing.

# Left join - keep all customers
sales_analysis <- merge(customers, car_specs, 
                       by = "car_model", 
                       all.x = TRUE)

print(sales_analysis)

This preserves all customer records while adding available car specifications, with NA values where specs don’t exist.

Step 4: Handle different join types

Demonstrate various merge types for different analytical needs.

# Right join - all car specs
right_merge <- merge(customers, car_specs, 
                    by = "car_model", all.y = TRUE)

# Full outer join - all records
full_merge <- merge(customers, car_specs, 
                   by = "car_model", all = TRUE)

Different join types help answer different business questions about your data relationships.

Step 5: Merge with different column names

Handle cases where key columns have different names in each dataset.

# Rename for demonstration
customers_renamed <- customers |>
  rename(model_name = car_model)

# Merge with different column names
final_merge <- merge(customers_renamed, car_specs,
                    by.x = "model_name", 
                    by.y = "car_model")

This shows how to merge when your key columns don’t share the same name across datasets.

Summary

merge() combines data frames based on common columns, similar to SQL joins
Use by parameter to specify which columns to join on
all.x = TRUE performs left joins, all.y = TRUE performs right joins
all = TRUE creates full outer joins including all records from both datasets
Use by.x and by.y when key columns have different names in each dataset

--- title: "How to use merge in R" description: "Master merge in R programming with clear examples. Complete tutorial covering syntax, use cases, and best practices." date: 2026-02-22 categories: ['base-r', 'merge'] format: html: code-fold: false code-tools: true --- ## Introduction The `merge()` function in R combines two data frames based on common columns or row names. It's essential when you need to join datasets that share key variables, similar to SQL joins in database operations. ## Getting Started ```r library(tidyverse) data(mtcars) ``` ## Example 1: Basic Usage ### The Problem We need to combine two separate datasets about cars - one with performance metrics and another with manufacturer details. This demonstrates the fundamental merge operation. ### Step 1: Create sample datasets Let's create two related datasets from the mtcars data. ```r # Performance data performance <- mtcars |> select(mpg, hp, qsec) |> rownames_to_column("model") head(performance, 3) ``` This creates our first dataset with car models and their performance metrics. ### Step 2: Create manufacturer data Now we'll create a second dataset with manufacturer information. ```r # Manufacturer data manufacturers <- data.frame( model = c("Mazda RX4", "Datsun 710", "Hornet 4 Drive", "Valiant", "Merc 240D"), manufacturer = c("Mazda", "Nissan", "Dodge", "Plymouth", "Mercedes"), country = c("Japan", "Japan", "USA", "USA", "Germany") ) ``` We now have manufacturer details for some of the cars in our performance dataset. ### Step 3: Perform basic merge Let's combine these datasets using the model column as our key. ```r # Inner join - only matching records merged_data <- merge(performance, manufacturers, by = "model") print(merged_data) ``` The merge created a new dataset containing only cars that appear in both datasets, combining all their information. ## Example 2: Practical Application ### The Problem You're analyzing car sales data where customer information and purchase details are stored separately. You need to combine them to analyze customer purchasing patterns while handling cases where not all records match. ### Step 1: Create customer dataset We'll simulate customer data with some cars from mtcars. ```r # Customer purchase data customers <- data.frame( car_model = c("Mazda RX4", "Honda Civic", "Merc 240D", "Fiat 128", "Toyota Corona"), customer_id = c(101, 102, 103, 104, 105), purchase_date = c("2023-01-15", "2023-02-20", "2023-01-30", "2023-03-10", "2023-02-05") ) ``` This represents our sales records with customer information. ### Step 2: Prepare car specifications Let's extract relevant car specifications for merging. ```r # Car specifications car_specs <- mtcars |> select(mpg, cyl, hp) |> rownames_to_column("car_model") |> head(8) ``` We now have technical specifications for several car models. ### Step 3: Perform left join Use merge to keep all customer records, even if car specs are missing. ```r # Left join - keep all customers sales_analysis <- merge(customers, car_specs, by = "car_model", all.x = TRUE) print(sales_analysis) ``` This preserves all customer records while adding available car specifications, with NA values where specs don't exist. ### Step 4: Handle different join types Demonstrate various merge types for different analytical needs. ```r # Right join - all car specs right_merge <- merge(customers, car_specs, by = "car_model", all.y = TRUE) # Full outer join - all records full_merge <- merge(customers, car_specs, by = "car_model", all = TRUE) ``` Different join types help answer different business questions about your data relationships. ### Step 5: Merge with different column names Handle cases where key columns have different names in each dataset. ```r # Rename for demonstration customers_renamed <- customers |> rename(model_name = car_model) # Merge with different column names final_merge <- merge(customers_renamed, car_specs, by.x = "model_name", by.y = "car_model") ``` This shows how to merge when your key columns don't share the same name across datasets. ## Summary - `merge()` combines data frames based on common columns, similar to SQL joins - Use `by` parameter to specify which columns to join on - `all.x = TRUE` performs left joins, `all.y = TRUE` performs right joins - `all = TRUE` creates full outer joins including all records from both datasets - Use `by.x` and `by.y` when key columns have different names in each dataset --- ## Related Posts - [How to use mapply in R](/base-r/how-to-use-mapply-in-r.html) - [How to use read.csv in R](/base-r/how-to-use-readcsv-in-r.html) - [How to use order in R](/base-r/how-to-use-order-in-r.html) - [How to use select() in R](/dplyr/how-to-use-select-in-r.html) - [How to use mutate() in R](/dplyr/how-to-use-mutate-in-r.html)

Introduction

Getting Started

Example 1: Basic Usage

The Problem

Step 1: Create sample datasets

Step 2: Create manufacturer data

Step 3: Perform basic merge

Example 2: Practical Application

The Problem

Step 1: Create customer dataset

Step 2: Prepare car specifications

Step 3: Perform left join

Step 4: Handle different join types

Step 5: Merge with different column names

Summary

Use by.x and by.y when key columns have different names in each dataset

Related Posts

Use `by.x` and `by.y` when key columns have different names in each dataset