How to use data.frame in R

base-r

data.frame

Learn how to use data.frame in R. Step-by-step statistical tutorial with examples.

Published

February 21, 2026

Introduction

A data.frame is R’s fundamental data structure for rectangular data: rows are observations and columns are variables, much like a spreadsheet or database table. It is the most commonly used object for data analysis because each column can hold a different data type (numeric, character, logical), while every column must have the same length, which is the number of rows.
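Under the hood, a data.frame is a list of column vectors that all share the same length. The short sketch below uses a throwaway two-column data.frame named df (a hypothetical example, not part of the tutorial data) to show that structure directly:

```r
# A small throwaway data.frame with an integer and a character column
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A data.frame is stored as a list, one element per column
print(typeof(df))   # "list"
print(length(df))   # number of columns

# Every column has the same length, which equals the number of rows
print(lengths(df))
print(nrow(df))
```

This is why length() on a data.frame counts columns rather than rows: it is measuring the underlying list.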

Getting Started

library(tidyverse)

Example 1: Basic Usage

The Problem

We need to create and manipulate a simple dataset to understand how data.frames store and organize information. Let’s start by building a data.frame from scratch and exploring its basic properties.

Step 1: Create a basic data.frame

We’ll construct a data.frame from vectors of equal length, one per column; each argument name becomes a column name.

# Create a simple data.frame
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE)
)

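This creates a data.frame with 4 rows and 4 columns of three different types: name and grade are character, age is numeric, and passed is logical. Since R 4.0.0, data.frame() keeps text as character by default (stringsAsFactors = FALSE) instead of converting it to factors.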
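The equal-length rule has one convenience: a single value is recycled to fill every row, while vectors whose lengths do not fit together are rejected. The sketch below checks the column classes with sapply() and demonstrates both cases; the school column and its "North High" value are hypothetical, added only for illustration.

```r
# Recreate the students data.frame from Step 1
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE)
)

# Class of every column (text stays character in R >= 4.0.0)
col_classes <- sapply(students, class)
print(col_classes)

# A length-1 value (hypothetical school name) is recycled to all 4 rows
with_school <- data.frame(name = students$name, school = "North High")
print(with_school)

# Lengths 3 and 2 cannot be reconciled, so data.frame() raises an error
msg <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(msg)
```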

Step 2: Examine the structure

Understanding your data.frame’s structure is essential for further analysis.

# Explore the data.frame structure
str(students)
head(students)
dim(students)

str() lists each column with its type and first few values, head() prints the first six rows by default, and dim() returns the number of rows and columns.
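When you need just one of those numbers, or the column names, base R has dedicated helpers. A minimal sketch, reusing the students data from Step 1; summary() adds a per-column overview:

```r
# Recreate the students data.frame from Step 1
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE)
)

# Individual dimensions and the column names
n_rows <- nrow(students)
n_cols <- ncol(students)
col_names <- names(students)
print(c(rows = n_rows, cols = n_cols))
print(col_names)

# Per-column overview: quartiles for numeric columns, TRUE/FALSE counts for logical
print(summary(students))
```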

Step 3: Access specific elements

Data.frames offer multiple ways to extract data using indexing and column names.

# Access columns and rows
students$name          # Access name column
students[1, ]          # Access first row
students[, "age"]      # Access age column
students[1:2, c("name", "grade")]  # Multiple rows/columns

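Bracket indexing follows the pattern [rows, columns], where each position accepts numbers, names, or a logical vector, and an empty position means "all". Note that selecting a single column with brackets, as in students[, "age"], returns a plain vector, while students[1, ] returns a one-row data.frame.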
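Two details of bracket indexing often surprise newcomers: a single selected column is simplified to a vector unless you pass drop = FALSE, and a logical vector in the row position filters rows. The sketch below shows both, plus [[ ]] for pulling out one column; passed_students is just an illustrative name.

```r
# Recreate the students data.frame from Step 1
students <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(20, 22, 19, 21),
  grade = c("A", "B", "A", "C"),
  passed = c(TRUE, TRUE, TRUE, FALSE)
)

# One column via [ , ] is simplified to a vector
age_vec <- students[, "age"]

# drop = FALSE keeps the result as a one-column data.frame
age_df <- students[, "age", drop = FALSE]

# [[ ]] extracts a single column as a vector, by name or position
name_vec <- students[["name"]]

# A logical vector in the row position keeps only the TRUE rows
passed_students <- students[students$passed, c("name", "grade")]

print(class(age_vec))
print(class(age_df))
print(passed_students)
```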

Example 2: Practical Application

The Problem

Let’s work with the built-in mtcars dataset to perform realistic data analysis tasks. We need to filter data, create new variables, and summarize information to understand car performance characteristics across different categories.

Step 1: Load and examine real data

We’ll start by exploring the mtcars dataset structure and contents.

# Load and examine mtcars dataset
data(mtcars)
head(mtcars)
str(mtcars)
rownames(mtcars)[1:5]

The mtcars dataset contains 32 car models (rows) and 11 numeric variables (columns) such as mpg, hp, and wt. The model names are stored as row names rather than in a regular column.
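Row names are easy to lose: tibbles, the data.frame variant used by the tidyverse, generally drop them. A common base R pattern is to copy the names into an ordinary column first. In this sketch, car_data and model are hypothetical names; row.names = NULL tells data.frame() to use plain row numbers instead of carrying the car names over as row names.

```r
# Copy mtcars, moving the car names from row names into a "model" column
car_data <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

print(dim(mtcars))    # 32 rows, 11 columns
print(dim(car_data))  # same rows, one extra column
print(head(car_data[, c("model", "mpg", "hp")], 3))
```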

Step 2: Filter and subset data

Let’s extract cars that meet specific performance criteria.

# Filter high-performance cars
fast_cars <- mtcars[mtcars$hp > 150 & mtcars$mpg > 15, ]
nrow(fast_cars)

# Select specific columns
efficiency <- mtcars[, c("mpg", "hp", "wt", "qsec")]

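We kept the cars with more than 150 horsepower and more than 15 mpg (both comparisons are strict, so a car with exactly 15 mpg is excluded), then selected four columns: mpg, hp, wt (weight in 1000 lb), and qsec (quarter-mile time in seconds).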
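Base R’s subset() expresses the same filter without repeating mtcars$, and it differs from brackets in one important way: rows where the condition is NA are dropped rather than returned as rows full of NA. The sketch compares both on mtcars and on a small hypothetical scores table that contains a missing value.

```r
# The same filter written with brackets and with subset()
fast_cars <- mtcars[mtcars$hp > 150 & mtcars$mpg > 15, ]
fast_cars_subset <- subset(mtcars, hp > 150 & mpg > 15)
print(rownames(fast_cars))

# Hypothetical data with a missing score
scores <- data.frame(id = 1:3, score = c(90, NA, 75))

# Brackets: the NA comparison produces an extra all-NA row
bracket_rows <- scores[scores$score > 80, ]

# subset(): rows where the condition is NA are treated as FALSE
subset_rows <- subset(scores, score > 80)

print(bracket_rows)
print(subset_rows)
```

Wrapping the condition in which(), as in scores[which(scores$score > 80), ], is another way to drop the NA rows with brackets.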

Step 3: Create new variables

A new column is added by assigning to a column name that does not exist yet; the calculation is applied element-wise to every row.

# Create new variables
mtcars$hp_per_weight <- mtcars$hp / mtcars$wt
mtcars$efficiency_class <- ifelse(mtcars$mpg > 20, "High", "Low")

# View the additions
head(mtcars[, c("hp", "wt", "hp_per_weight", "efficiency_class")])

We calculated horsepower per 1000 lb of weight (wt is recorded in units of 1000 lb) and labeled each car "High" efficiency if it gets more than 20 mpg and "Low" otherwise.
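The assignments above overwrite mtcars in your workspace (rm(mtcars) removes that modified copy so the built-in version is visible again). If you prefer to leave the original untouched, base R’s transform() returns a new data.frame with the added columns. A minimal sketch; car_features is a hypothetical name:

```r
# Build a modified copy instead of overwriting mtcars
car_features <- transform(
  mtcars,
  hp_per_weight = hp / wt,                          # hp per 1000 lb of weight
  efficiency_class = ifelse(mpg > 20, "High", "Low")
)

print(ncol(mtcars))         # the original keeps its 11 columns
print(ncol(car_features))   # the copy has two more
print(table(car_features$efficiency_class))
```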

Step 4: Group analysis using modern syntax

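The native pipe |> (available since R 4.1.0) passes each result on to the next function, so a multi-step manipulation reads top to bottom. group_by() and summarise() come from dplyr, which library(tidyverse) loads.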

# Analyze by cylinder groups using modern R
mtcars |>
  group_by(cyl) |>
  summarise(
    avg_mpg = mean(mpg),
    avg_hp = mean(hp),
    count = n()
  )

This pipeline groups the cars by cylinder count and returns a tibble with one row per group (4, 6, and 8 cylinders) containing the mean mpg, the mean horsepower, and the number of cars in each group.
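For comparison, here is a sketch of the same summary without dplyr, using base R’s aggregate(). It returns a plain data.frame, and the summary columns keep their original names (mpg, hp) rather than avg_mpg and avg_hp; by_cyl is an illustrative name.

```r
# Mean mpg and hp for each cylinder group
by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

# Number of cars per group: length() counts the rows in each group
by_cyl$count <- aggregate(mpg ~ cyl, data = mtcars, FUN = length)$mpg

print(by_cyl)
```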

Step 5: Export and save results

Saving your processed data.frame preserves analysis results for future use.

# Save processed data
write.csv(mtcars, "processed_mtcars.csv", row.names = TRUE)

# Or save as R object
saveRDS(mtcars, "mtcars_analysis.rds")

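write.csv() produces a plain-text file that spreadsheets and other tools can open, and row.names = TRUE keeps the car names as the first column. saveRDS() stores the exact R object, including column types and row names; load it back with readRDS().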
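A round trip shows the practical difference between the two formats. This sketch writes to temporary files (so it leaves nothing behind) and reads both back: the RDS copy is identical to the original, while read.csv() re-guesses column types, so whole-number columns such as cyl come back as integer instead of double. row.names = 1 turns the first CSV column back into row names.

```r
# Temporary file paths so the example does not clutter the working directory
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(mtcars, csv_path, row.names = TRUE)
saveRDS(mtcars, rds_path)

# Read both files back; row.names = 1 restores the car names
from_csv <- read.csv(csv_path, row.names = 1)
from_rds <- readRDS(rds_path)

# RDS restores the object exactly; CSV re-infers types on import
print(identical(from_rds, mtcars))
print(class(mtcars$cyl))    # stored as double ("numeric")
print(class(from_csv$cyl))  # whole numbers are read back as integer

# Clean up the temporary files
unlink(c(csv_path, rds_path))
```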

Summary

Data.frames are R’s primary structure for rectangular data, combining different data types in columns
Create data.frames with the data.frame() function or load existing datasets with data()
Access data using $ notation, bracket indexing, or column names for flexible data extraction
Filter and subset using logical conditions and bracket notation for targeted analysis
Use the native pipe operator |> with dplyr functions for readable data manipulation workflows
