list.files() in R: list files in a directory
Introduction
The list.files() function in R allows you to programmatically retrieve file names from directories on your computer. This is essential for automating data workflows, batch processing multiple files, or simply exploring what files exist in a specific location.
Getting Started
library(tidyverse)

Example 1: Basic Usage
The Problem
You need to see what files exist in your current working directory and understand the basic syntax of list.files().
Step 1: List all files in current directory
Check what’s in your current working directory.
# Get all files in current directory
current_files <- list.files()
print(current_files)

This returns a character vector containing all file names in your working directory.
Step 2: List files in a specific directory
Target a specific folder path instead of the current directory.
# List files in a specific directory
# Replace with an actual path on your system
folder_files <- list.files(path = "data/")
head(folder_files)

The path argument lets you specify exactly which directory to examine.
Step 3: Include full file paths
Get complete file paths instead of just file names.
# Get full paths instead of just names
full_paths <- list.files(path = "data/", full.names = TRUE)
head(full_paths, 3)

Setting full.names = TRUE prepends the directory to each file name, giving paths you can hand directly to reading functions.
Example 2: Practical Application
The Problem
You have a folder containing multiple CSV files with penguin data from different years, and you want to read all of them into R automatically. This is a common scenario when dealing with data stored across multiple files.
Step 1: Filter for specific file types
Find only CSV files in your directory using pattern matching.
# Find only CSV files
csv_files <- list.files(
  path = "data/",
  pattern = "\\.csv$",
  full.names = TRUE
)
print(csv_files)

The pattern argument takes a regular expression: \\.csv$ means "file names ending in .csv".
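The trailing $ anchor does real work here. A quick sketch with grepl(), which uses the same regular-expression matching as the pattern argument, shows what changes when you drop the anchor or ignore case (the file names are hypothetical):

```r
# Hypothetical file names to probe the pattern against
files <- c("penguins.csv", "notes.csv.bak", "summary.CSV")

grepl("\\.csv$", files)  # TRUE FALSE FALSE: only names that end in .csv
grepl("\\.csv", files)   # TRUE TRUE FALSE: unanchored, also hits the .bak file
grepl("\\.csv$", files, ignore.case = TRUE)  # TRUE FALSE TRUE: catches .CSV too
```

list.files() itself also accepts ignore.case = TRUE, which helps when a directory mixes .csv and .CSV extensions.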
Step 2: Include subdirectories in search
Search recursively through all subdirectories for CSV files.
# Search recursively through subdirectories
all_csvs <- list.files(
  path = "data/",
  pattern = "\\.csv$",
  recursive = TRUE,
  full.names = TRUE
)
print(all_csvs)

The recursive = TRUE option searches through all nested folders within the specified directory.
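To see the difference recursive makes without touching your real data, you can sketch it on a throwaway directory under tempdir():

```r
# Build a tiny tree: demo_data/top.csv and demo_data/2023/nested.csv
base <- file.path(tempdir(), "demo_data")
dir.create(file.path(base, "2023"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(base, "top.csv"),
            file.path(base, "2023", "nested.csv"))

list.files(base, pattern = "\\.csv$")                    # "top.csv" only
list.files(base, pattern = "\\.csv$", recursive = TRUE)  # also "2023/nested.csv"
```

Note that recursive results come back relative to path, with subdirectory prefixes like "2023/".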
Step 3: Read multiple files automatically
Use the file list to read all CSV files into a single data frame.
# Read all CSV files and combine them
combined_data <- all_csvs |>
  set_names(basename(all_csvs)) |>
  map_dfr(read_csv, .id = "file_name")
glimpse(combined_data)

Naming the vector first matters: with an unnamed input, .id records only positions ("1", "2", ...), so set_names(basename(all_csvs)) makes the new column hold each source file's name. The result is a master dataset with all files combined and a file_name column tracking which file each row came from.
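If you'd rather not depend on purrr, the same combine step can be sketched in base R. This version writes two small throwaway CSVs to a temporary folder purely so the sketch runs end to end:

```r
# Two hypothetical input files in a temporary directory
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dir, "b.csv"), row.names = FALSE)

paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
# Read each file, tag it with its source name, then stack the pieces
pieces <- lapply(paths, function(p) cbind(read.csv(p), file_name = basename(p)))
combined <- do.call(rbind, pieces)
nrow(combined)  # 4
```

do.call(rbind, ...) requires every file to share the same columns; map_dfr() is more forgiving because it fills missing columns with NA.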
Step 4: Filter files by date or name pattern
Find files with specific naming conventions or date patterns.
# Find files with "penguin" in the name from 2023
recent_penguin_files <- list.files(
  path = "data/",
  pattern = "penguin.*2023.*\\.csv$",
  full.names = TRUE
)
print(recent_penguin_files)

This pattern matches files containing "penguin", followed somewhere later by "2023", and ending in ".csv".
Step 5: Get additional file information
Retrieve file details like size and modification date.
# Get file info along with names
file_details <- list.files(
  path = "data/",
  pattern = "\\.csv$",
  full.names = TRUE
) |>
  map_dfr(~ data.frame(
    file = .x,
    size = file.size(.x),
    modified = file.mtime(.x)
  ))
head(file_details)

This creates a data frame with file names, sizes in bytes, and last modification times.
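Base R can collect the same details in one call: file.info() returns a data frame with size, mtime, and more, one row per path. A minimal sketch against a throwaway file:

```r
# Create a temporary file so the sketch is self-contained
f <- tempfile(fileext = ".csv")
writeLines(c("x", "1"), f)

info <- file.info(f)        # one row per path; row names are the paths
info[, c("size", "mtime")]  # size in bytes, last-modified timestamp
```

Passing the output of list.files(full.names = TRUE) straight into file.info() gives you the whole directory's details without a mapping step.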
Summary

- list.files() retrieves file names from directories programmatically
- Use path to specify the directory location and pattern to filter files with regular expressions
- Set full.names = TRUE to get complete file paths for reading files
- Enable recursive = TRUE to search through subdirectories automatically
- Combine with map_dfr() to efficiently read and merge multiple files into one dataset