How to use data.frame in R

Learn how to use data.frame in R. Step-by-step tutorial with examples.
Published: February 21, 2026

Introduction

A data.frame is R’s fundamental data structure for storing rectangular data with rows and columns, similar to a spreadsheet or database table. It is the standard object for data analysis because each column can hold a different data type (numeric, character, logical), while all columns must have the same length.
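To make the equal-length rule concrete, here is a minimal sketch. The names df and bad are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the column types do not depend on the R version.

```r
# A data.frame is stored as a list of equal-length column vectors
df <- data.frame(
  id = 1:3,
  label = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

is.list(df)                      # TRUE: each column is one list element
col_types <- sapply(df, class)   # one class per column
col_lengths <- lengths(df)       # every column has the same length

# Columns whose lengths cannot be recycled to match raise an error;
# tryCatch() captures the message instead of stopping the script
bad <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
```

Printing col_types shows one type per column, and bad holds the error message R would otherwise raise.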

Getting Started

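# data.frame itself is part of base R; dplyr is needed only for
# group_by() and summarise() in Example 2, Step 4
library(dplyr)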

Example 1: Basic Usage

The Problem

We need to create and manipulate a simple dataset to understand how data.frames store and organize information. Let’s start by building a data.frame from scratch and exploring its basic properties.

Step 1: Create a basic data.frame

We’ll construct a data.frame using vectors of equal length.

# Create a simple data.frame
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE)
)

This creates a data.frame with 4 rows and 4 columns of different data types.

Step 2: Examine the structure

Understanding your data.frame’s structure is essential for further analysis.

# Explore the data.frame structure
str(students)
head(students)
dim(students)

The str() function shows each column’s data type, head() displays the first six rows by default, and dim() returns the number of rows and columns.
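A few other base R helpers answer the same questions one piece at a time. The sketch below rebuilds the students data.frame so it runs on its own; the variable names n_rows, n_cols, and col_names are only illustrative.

```r
# Rebuild the example data so this block is self-contained
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

n_rows <- nrow(students)      # number of rows
n_cols <- ncol(students)      # number of columns
col_names <- names(students)  # column names as a character vector
summary(students)             # per-column summaries (quartiles, counts)
```

These are handy inside scripts, where you usually want a single value rather than printed output.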

Step 3: Access specific elements

Data.frames offer multiple ways to extract data using indexing and column names.

# Access columns and rows
students$name                      # name column as a vector
students[1, ]                      # first row as a one-row data.frame
students[, "age"]                  # age column as a vector
students[1:2, c("name", "grade")]  # rows 1-2, columns name and grade

These indexing methods allow precise data extraction using row numbers, column names, or combinations.
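One detail worth knowing is that selecting a single column with [ , ] drops the result to a plain vector. The sketch below, using a shortened copy of students, contrasts the forms that return a vector with those that keep a data.frame.

```r
# Shortened copy of the example data so this block runs on its own
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  stringsAsFactors = FALSE
)

age_vec  <- students[, "age"]                # numeric vector
age_vec2 <- students[["age"]]                # same vector as students$age
age_df   <- students[, "age", drop = FALSE]  # one-column data.frame
age_df2  <- students["age"]                  # also a one-column data.frame
```

Use drop = FALSE or single-bracket list-style indexing when later code expects a data.frame rather than a vector.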

Example 2: Practical Application

The Problem

Let’s work with the built-in mtcars dataset to perform realistic data analysis tasks. We need to filter data, create new variables, and summarize information to understand car performance characteristics across different categories.

Step 1: Load and examine real data

We’ll start by exploring the mtcars dataset structure and contents.

# Load and examine mtcars dataset
data(mtcars)
head(mtcars)
str(mtcars)
rownames(mtcars)[1:5]

The mtcars dataset contains 32 car models with 11 performance variables, with car names stored as row names.

Step 2: Filter and subset data

Let’s extract cars that meet specific performance criteria.

# Filter high-performance cars
fast_cars <- mtcars[mtcars$hp > 150 & mtcars$mpg > 15, ]
nrow(fast_cars)

# Select specific columns
efficiency <- mtcars[, c("mpg", "hp", "wt", "qsec")]

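We kept cars with more than 150 horsepower and more than 15 mpg (both comparisons are strict, so a car at exactly 15 mpg is excluded), then created a second subset containing only the mpg, hp, wt, and qsec columns.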
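One caveat with logical bracket filtering is missing values: a comparison involving NA yields NA, and bracket indexing then returns a row filled with NA. mtcars has no missing values, so the sketch below uses a small made-up data.frame (cars_na and its values are hypothetical) to show the effect and two common ways around it.

```r
# Hypothetical data with one missing horsepower value
cars_na <- data.frame(
  hp = c(110, 175, NA, 245),
  mpg = c(21.0, 18.7, 30.4, 14.3)
)

# The NA comparison produces an extra all-NA row
by_bracket <- cars_na[cars_na$hp > 150, ]

# which() keeps only positions where the condition is TRUE
by_which <- cars_na[which(cars_na$hp > 150), ]

# subset() also treats NA in the condition as FALSE
by_subset <- subset(cars_na, hp > 150)
```

Comparing nrow() across the three results shows the extra row that plain bracket filtering lets through.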

Step 3: Create new variables

Adding calculated columns enhances our analysis capabilities.

# Create new variables
mtcars$hp_per_weight <- mtcars$hp / mtcars$wt
mtcars$efficiency_class <- ifelse(mtcars$mpg > 20, "High", "Low")

# View the additions
head(mtcars[, c("hp", "wt", "hp_per_weight", "efficiency_class")])

We calculated a horsepower-to-weight ratio (wt is recorded in units of 1,000 lbs) and labeled cars above 20 mpg as High efficiency and all others as Low.
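Assigning with $ changes your working copy of mtcars. If you would rather leave it untouched, base R’s transform() returns a modified copy instead. This is a minimal sketch of that alternative; the name cars is illustrative, and datasets::mtcars is used so the block starts from the unmodified dataset.

```r
# Build the same two columns on a copy, leaving the source data unchanged
cars <- transform(
  datasets::mtcars,
  hp_per_weight = hp / wt,
  efficiency_class = ifelse(mpg > 20, "High", "Low")
)

head(cars[, c("hp", "wt", "hp_per_weight", "efficiency_class")])
```

Inside transform(), column names can be used directly without the mtcars$ prefix.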

Step 4: Group analysis using modern syntax

The native pipe |> (available in R 4.1 and later), combined with dplyr’s group_by() and summarise(), makes grouped summaries more readable.

# Analyze by cylinder groups using dplyr and the native pipe
mtcars |>
  group_by(cyl) |>
  summarise(
    avg_mpg = mean(mpg),
    avg_hp = mean(hp),
    count = n()
  )

This pipeline groups cars by cylinder count and calculates average performance metrics for each group.
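If dplyr is not available, a similar summary can be produced with base R’s aggregate(). The following is a sketch of one possible equivalent, not the only way to do it; the names cars, by_cyl, and counts are illustrative.

```r
cars <- datasets::mtcars

# Mean mpg and hp for each cylinder count
by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = cars, FUN = mean)
names(by_cyl) <- c("cyl", "avg_mpg", "avg_hp")

# Number of cars per cylinder count, in the same group order
counts <- aggregate(mpg ~ cyl, data = cars, FUN = length)
by_cyl$count <- counts$mpg

by_cyl
```

The result is an ordinary data.frame with one row per cylinder group.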

Step 5: Export and save results

Saving your processed data.frame preserves analysis results for future use.

# Save processed data
write.csv(mtcars, "processed_mtcars.csv", row.names = TRUE)

# Or save as R object
saveRDS(mtcars, "mtcars_analysis.rds")

These functions export your data.frame to a CSV file or save it as an R object that can be loaded again later with readRDS().
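To check that the saved files load back correctly, a round trip can be sketched as follows. Temporary files are used here so nothing is written to your working directory; csv_path, rds_path, from_csv, and from_rds are illustrative names.

```r
# Write to temporary files rather than the working directory
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(datasets::mtcars, csv_path, row.names = TRUE)
saveRDS(datasets::mtcars, rds_path)

# row.names = 1 restores the car names from the first CSV column
from_csv <- read.csv(csv_path, row.names = 1)

# readRDS() restores the object exactly as it was saved
from_rds <- readRDS(rds_path)
```

The RDS file preserves column types and attributes exactly, while the CSV round trip re-infers types from text.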

Summary

  • Data.frames are R’s primary structure for rectangular data, combining different data types in columns
  • Create data.frames with the data.frame() function, or load built-in datasets with data()
  • Access data using $ notation, bracket indexing, or column names for flexible data extraction
  • Filter and subset using logical conditions and bracket notation for targeted analysis
  • Use the native pipe |> (R 4.1 or later) with dplyr functions for readable data manipulation workflows