How to use subset in R
Introduction
The subset() function in R provides an intuitive way to filter rows and select columns from data frames based on specific conditions. It’s particularly useful for data exploration and creating focused views of your data without modifying the original dataset.
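Because subset() returns a new object, the data frame you pass in is left unchanged. This minimal sketch uses the built-in mtcars data and an illustrative variable name (fast_cars) to compare the row count before and after filtering:

```r
# subset() is part of base R, so no packages are needed here
data(mtcars)

rows_before <- nrow(mtcars)
fast_cars <- subset(mtcars, hp > 200)   # returns a new data frame

# The original data frame still has all of its rows
rows_after <- nrow(mtcars)
stopifnot(rows_before == rows_after)

nrow(fast_cars)   # only the cars with more than 200 hp
```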
Getting Started
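# subset() itself is part of base R; tidyverse is only needed for
# rownames_to_column() and arrange() in Example 2
library(tidyverse)
data(mtcars)
Example 1: Basic Usage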
The Problem
We need to filter the mtcars dataset to find cars with good fuel efficiency. Let’s extract cars that get more than 20 miles per gallon.
Step 1: Filter rows with simple condition
We’ll use subset() to filter cars based on mpg values.
# Filter cars with mpg > 20
efficient_cars <- subset(mtcars, mpg > 20)
head(efficient_cars, 3)
This creates a new data frame containing only the cars that get more than 20 mpg. head(..., 3) prints the first three of them.
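To show what subset() saves you from typing, this sketch compares it with plain base-R bracket indexing. The variable name efficient_bracket is illustrative. For data without missing values, the two approaches should return the same rows.

```r
data(mtcars)

# subset() evaluates mpg inside the data frame, so no mtcars$ prefix is needed
efficient_cars <- subset(mtcars, mpg > 20)

# Equivalent bracket indexing: the column must be referenced explicitly
efficient_bracket <- mtcars[mtcars$mpg > 20, ]

identical(efficient_cars, efficient_bracket)
nrow(efficient_cars)
```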
Step 2: Add multiple conditions
Now we’ll combine multiple criteria to be more specific about our selection.
# Filter cars with mpg > 20 AND automatic transmission
efficient_auto <- subset(mtcars, mpg > 20 & am == 0)
nrow(efficient_auto)
We now have the cars that are both fuel-efficient and automatic. In mtcars, am == 0 means automatic and am == 1 means manual.
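The choice of operator matters. & keeps rows that satisfy every condition, | keeps rows that satisfy at least one, and ! negates a condition. The sketch below (illustrative names both, either, and efficient_manual) contrasts them on the same two conditions:

```r
data(mtcars)

both   <- subset(mtcars, mpg > 20 & am == 0)  # efficient AND automatic
either <- subset(mtcars, mpg > 20 | am == 0)  # efficient OR automatic (or both)

nrow(both)
nrow(either)

# ! flips a condition: efficient cars that are NOT automatic (i.e. manual)
efficient_manual <- subset(mtcars, mpg > 20 & !(am == 0))
nrow(efficient_manual)
```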
Step 3: Select specific columns
We can also choose which columns to include in our filtered results.
# Filter rows and select specific columns
car_basics <- subset(mtcars, mpg > 20,
                     select = c(mpg, hp, wt))
head(car_basics)
This gives a narrower view containing only fuel efficiency (mpg), horsepower (hp), and weight (wt).
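The select argument accepts more than a list of names. The base-R help page for subset() shows both column ranges (a:b) and negative selection (-col). This sketch uses illustrative variable names to demonstrate each form, plus selecting columns without any row condition:

```r
data(mtcars)

# A contiguous range of columns, from mpg through hp
range_cols <- subset(mtcars, mpg > 20, select = mpg:hp)
names(range_cols)

# Drop columns with a minus sign instead of listing the ones to keep
without_codes <- subset(mtcars, mpg > 20, select = -c(vs, am))
ncol(without_codes)

# With no row condition, every row is kept and only columns are selected
all_rows <- subset(mtcars, select = c(mpg, wt))
nrow(all_rows)
```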
Example 2: Practical Application
The Problem
Imagine you’re a car dealer looking for vehicles to recommend to customers who want powerful yet reasonably efficient cars. You need cars with more than 100 horsepower and more than 15 mpg, and you want to exclude the heaviest vehicles.
Step 1: Define the target criteria
We’ll start by filtering based on horsepower and fuel efficiency requirements.
# Find moderately powerful and efficient cars
target_cars <- subset(mtcars, hp > 100 & mpg > 15)
cat("Found", nrow(target_cars), "cars meeting basic criteria\n")
This initial filter gives us a starting point that the next steps will narrow down.
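mtcars has no missing values, but a real dealer inventory might. subset() treats a condition that evaluates to NA as FALSE and drops that row. Plain bracket indexing instead returns a row filled with NA. The inventory data frame below is hypothetical and exists only to show the difference:

```r
# Hypothetical inventory with one missing horsepower value
inventory <- data.frame(
  model = c("Car A", "Car B", "Car C"),
  hp    = c(120, NA, 90),
  mpg   = c(18, 22, 25)
)

# subset() treats the NA comparison for Car B as FALSE and drops it
with_subset <- subset(inventory, hp > 100 & mpg > 15)
nrow(with_subset)

# Bracket indexing keeps the NA as an extra row of missing values
with_brackets <- inventory[inventory$hp > 100 & inventory$mpg > 15, ]
nrow(with_brackets)
```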
Step 2: Refine with weight restrictions
Now we’ll add weight considerations to avoid recommending overly heavy vehicles.
# Add a weight restriction (wt is in 1000 lbs, so wt < 3.5 means under 3,500 lbs)
ideal_cars <- subset(mtcars,
                     hp > 100 & mpg > 15 & wt < 3.5,
                     select = c(mpg, hp, wt, qsec))
print(ideal_cars)
We now have cars that balance power, efficiency, and reasonable weight, along with their quarter-mile times (qsec).
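Instead of repeating every condition, you can also apply subset() to the result of the previous step. This sketch (illustrative name refined) checks that narrowing target_cars by weight gives the same result as the single combined call:

```r
data(mtcars)

# Result of Step 1
target_cars <- subset(mtcars, hp > 100 & mpg > 15)

# Refine the earlier result with only the new condition
refined <- subset(target_cars, wt < 3.5, select = c(mpg, hp, wt, qsec))

# One call with all three conditions combined
ideal_cars <- subset(mtcars, hp > 100 & mpg > 15 & wt < 3.5,
                     select = c(mpg, hp, wt, qsec))

identical(refined, ideal_cars)
nrow(ideal_cars)
```

Stepwise refinement makes it easy to see how many rows each condition removes. The single call is more compact once the criteria are settled.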
Step 3: Create a summary view
Let’s organize our results to make them more presentable for customers.
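# mtcars stores model names as row names: move them into a column,
# then sort by mpg, highest first (the native |> pipe requires R 4.1+)
final_recommendations <- ideal_cars |>
  rownames_to_column("car_model") |>
  arrange(desc(mpg))
print(final_recommendations)
This produces a customer-friendly list with each model name in its own column, ranked by fuel efficiency.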
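If you prefer to avoid the tidyverse dependency, the same table can be built in base R. This is a sketch of one way to mirror rownames_to_column() and arrange(desc(mpg)). It is not how those functions are implemented internally, and base_recommendations is an illustrative name.

```r
data(mtcars)

ideal_cars <- subset(mtcars, hp > 100 & mpg > 15 & wt < 3.5,
                     select = c(mpg, hp, wt, qsec))

# Put the row names into a regular car_model column
base_recommendations <- data.frame(car_model = rownames(ideal_cars),
                                   ideal_cars, row.names = NULL)

# Sort by mpg in decreasing order, then renumber the rows
base_recommendations <- base_recommendations[order(base_recommendations$mpg,
                                                   decreasing = TRUE), ]
rownames(base_recommendations) <- NULL

head(base_recommendations, 3)
```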
Step 4: Validate the selection
Finally, we’ll verify our selection makes sense by checking some basic statistics.
# Check the range of values in our selection
summary(final_recommendations[, c("mpg", "hp", "wt")])
If the filter worked, the minimum hp is above 100, the minimum mpg is above 15, and the maximum wt is below 3.5. The summary also shows the range of values customers can expect.
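Reading a summary by eye works interactively, but a script can check the same criteria automatically. In this sketch, each stopifnot() check restates one filter condition. The too_heavy subset (an illustrative name) lists the cars that passed the power and efficiency filters but were excluded by weight.

```r
data(mtcars)

ideal_cars <- subset(mtcars, hp > 100 & mpg > 15 & wt < 3.5,
                     select = c(mpg, hp, wt, qsec))

# stopifnot() raises an error naming the first check that is not TRUE
stopifnot(
  all(ideal_cars$hp > 100),
  all(ideal_cars$mpg > 15),
  all(ideal_cars$wt < 3.5)
)

# Cars that met the hp and mpg criteria but were removed by the weight limit
too_heavy <- subset(mtcars, hp > 100 & mpg > 15 & wt >= 3.5)
rownames(too_heavy)
```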
Summary
- subset() provides an intuitive way to filter data frames using logical conditions with simple syntax.
- You can combine conditions using logical operators such as & (and) and | (or) for complex filtering.
- The select parameter lets you choose specific columns while filtering, reducing data complexity.
- Chaining several conditions together lets you build very specific data selections for analysis.
- Always verify your subset results to ensure the filtering logic worked as expected.
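One more reason to verify results is that R's own help page for subset() describes it as a convenience function for interactive use. For programming, that page recommends standard indexing with [, because subset() evaluates its condition inside the data frame first. The sketch below uses a hypothetical helper, cars_lighter_than(), whose argument happens to share a name with the wt column:

```r
data(mtcars)

# Hypothetical helper: its argument shares the name of the wt column
cars_lighter_than <- function(wt) {
  # Inside subset(), both sides of `wt < wt` refer to the column,
  # so every comparison is FALSE and no rows are kept
  via_subset <- subset(mtcars, wt < wt)

  # Bracket indexing names the column explicitly, so `wt` is the argument
  via_brackets <- mtcars[mtcars$wt < wt, ]

  c(subset_rows = nrow(via_subset), bracket_rows = nrow(via_brackets))
}

cars_lighter_than(3.5)
```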