How to Run Local LLMs in R
llm
ollama
Learn to run local language models in R using Ollama. Free, private, offline LLM access with no API costs. Use Llama 3, Mistral, and other open models.
Introduction
Running LLMs locally gives you:

- No API costs - completely free after setup
- Privacy - data never leaves your machine
- Offline access - works without internet
- No rate limits - unlimited requests
Ollama makes running local LLMs easy: it handles model downloads and memory management and exposes a simple API. We’ll use the ellmer package to connect R to Ollama.
Prefer cloud APIs? See OpenAI or Claude for more powerful models.
Getting Started
Install Ollama
Download and install from ollama.com:
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from ollama.com
Start Ollama
# Start the Ollama server
ollama serve
Download a model
# Download Llama 3.2 (default tag, 3B parameters)
ollama pull llama3.2
# Download smaller model (faster)
ollama pull llama3.2:1b
# Download Mistral
ollama pull mistral
Using Ollama in R
With ellmer (recommended)
install.packages("ellmer")
library(ellmer)
# Connect to local Ollama
chat <- chat_ollama(model = "llama3.2")
chat$chat("What is R programming?")
With ollamar package
install.packages("ollamar")
library(ollamar)
# Generate text
generate("llama3.2", "Explain data frames in R")
# Chat format
chat(
model = "llama3.2",
messages = list(
list(role = "user", content = "What is ggplot2?")
)
)
Available Models
Recommended for R coding
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| llama3.2:1b | 1.3GB | 4GB | Fast responses, simple tasks |
| llama3.2 | 4.7GB | 8GB | Balanced, good for coding |
| codellama | 3.8GB | 8GB | Code generation |
| mistral | 4.1GB | 8GB | General tasks |
| mixtral | 26GB | 48GB | Best quality, needs lots of RAM |
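If you are unsure which of these your machine can handle, you can encode the RAM column above and filter it against what you have free. A minimal sketch in base R (the figures are the table's rough estimates, and `models_that_fit` is just an illustrative helper, not part of any package):

```r
# Rough RAM requirements in GB, taken from the table above
ram_needed <- c(
  "llama3.2:1b" = 4,
  "llama3.2"    = 8,
  "codellama"   = 8,
  "mistral"     = 8,
  "mixtral"     = 48
)

# Which models fit in the RAM you have available?
models_that_fit <- function(available_gb) {
  names(ram_needed)[ram_needed <= available_gb]
}

models_that_fit(16)  # everything except mixtral
```

Actual memory use also varies with context length and quantization, so treat these numbers as a starting point.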
Download models
# Code-focused
ollama pull codellama
# General purpose
ollama pull llama3.2
ollama pull mistral
# Smaller/faster
ollama pull llama3.2:1b
ollama pull phi3
List installed models
library(ollamar)
list_models()
Basic Usage with ellmer
Simple chat
library(ellmer)
chat <- chat_ollama(model = "llama3.2")
chat$chat("How do I read a CSV file in R?")
With system prompt
chat <- chat_ollama(
model = "llama3.2",
system_prompt = "You are an R programming expert. Provide concise answers with code examples."
)
chat$chat("How do I calculate the mean by group?")
Multi-turn conversation
chat <- chat_ollama(model = "llama3.2")
chat$chat("I have a dataset with customer purchase data.")
chat$chat("How would I find the top 10 customers by total spend?")
chat$chat("Now show me how to visualize this.")
Practical Examples
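The first example below asks a model to generate a 0-1 scaling function. To judge what it returns, it helps to have a hand-written reference to compare against; one way to write it in base R (a sketch — `scale01` is our own name, not a package function):

```r
# Scale every numeric column of a data frame to the 0-1 range
scale01 <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) {
    rng <- range(x, na.rm = TRUE)
    # (a constant column would produce NaN here; handle as needed)
    (x - rng[1]) / (rng[2] - rng[1])
  })
  df
}

head(scale01(mtcars)$mpg)  # values now fall between 0 and 1
```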
Generate R code
chat <- chat_ollama(
model = "codellama",
system_prompt = "Return only R code. No explanations."
)
code <- chat$chat("
Write a function that:
1. Takes a data frame
2. Finds numeric columns
3. Scales them to 0-1 range
4. Returns the modified data frame
")
cat(code)
Analyze text data
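Local models don't always answer with exactly one clean word, so it is worth normalizing replies before tabulating them. A small base-R helper (a sketch — `normalize_label` is our own name) that could wrap the classifier below:

```r
# Map a free-text sentiment reply onto one of three labels, or NA
normalize_label <- function(reply) {
  label <- tolower(trimws(gsub("[[:punct:]]", "", reply)))
  if (label %in% c("positive", "negative", "neutral")) label else NA_character_
}

normalize_label("Positive.")     # "positive"
normalize_label("Hard to say!")  # NA
```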
classify_sentiment <- function(text) {
chat <- chat_ollama(
model = "llama3.2",
system_prompt = "Classify sentiment as positive, negative, or neutral. Reply with one word only."
)
chat$chat(text)
}
reviews <- c(
"This product is amazing!",
"Terrible quality, very disappointed",
"It works as expected"
)
sapply(reviews, classify_sentiment)
Explain code
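If the code you want explained already exists in your session as a function, you don't have to retype it as a string; base R's `deparse()` can recover the source text for you (`describe_me` here is just a toy function for illustration):

```r
# Turn an existing function into text that can be pasted into a prompt
describe_me <- function(x) mean(x, na.rm = TRUE)
code_text <- paste(deparse(describe_me), collapse = "\n")
cat(code_text)
```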
chat <- chat_ollama(model = "llama3.2")
code <- "
mtcars |>
filter(mpg > 20) |>
group_by(cyl) |>
summarise(mean_hp = mean(hp))
"
chat$chat(paste("Explain this R code:", code))
Using ollamar Directly
Generate text
library(ollamar)
result <- generate(
model = "llama3.2",
prompt = "Write R code to create a bar chart"
)
result$response
Chat with history
messages <- list(
list(role = "user", content = "What is dplyr?")
)
response <- chat(model = "llama3.2", messages = messages)
# Continue conversation
messages <- c(messages, list(
list(role = "assistant", content = response$message$content),
list(role = "user", content = "Show me an example of filter()")
))
response2 <- chat(model = "llama3.2", messages = messages)
Embeddings for semantic search
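An embedding is just a numeric vector, so "semantic search" reduces to comparing vectors, most commonly with cosine similarity. A base-R helper (a sketch — `cosine_sim` is our own name) you could apply to the embeddings generated below:

```r
# Cosine similarity between two numeric vectors:
# 1 = same direction, 0 = orthogonal (unrelated)
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine_sim(c(1, 2, 3), c(1, 2, 3))  # 1
cosine_sim(c(1, 0), c(0, 1))        # 0
```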
# Generate embeddings
embedding <- embeddings(
model = "llama3.2",
input = "How to handle missing values in R"
)
# Use for similarity search, clustering, etc.
embedding$embedding
Performance Tips
Choose the right model size
# Fast but less capable (good for simple tasks)
chat <- chat_ollama(model = "llama3.2:1b")
# Balanced (good for most tasks)
chat <- chat_ollama(model = "llama3.2")
# Best quality (slower, needs more RAM)
chat <- chat_ollama(model = "mixtral")
Limit context length
# Shorter context = faster responses
chat <- chat_ollama(
model = "llama3.2",
api_args = list(num_ctx = 2048) # Default is 4096
)
GPU acceleration
# Ollama automatically uses GPU if available
# Check GPU usage:
ollama ps
Batch Processing
library(purrr)
texts <- c("Text 1", "Text 2", "Text 3")
process_local <- function(texts, prompt_template) {
map_chr(texts, \(text) {
chat <- chat_ollama(model = "llama3.2")
chat$chat(sprintf(prompt_template, text))
})
}
summaries <- process_local(
texts,
"Summarize in one sentence: %s"
)
Comparing Local vs Cloud
| Feature | Local (Ollama) | Cloud (OpenAI/Claude) |
|---|---|---|
| Cost | Free | Pay per token |
| Privacy | Complete | Data sent to servers |
| Speed | Depends on hardware | Generally fast |
| Quality | Good (varies by model) | Best available |
| Offline | Yes | No |
| Rate limits | None | Yes |
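If you want cloud quality with a local-first default, one pattern is to try the local backend and fall back on error. A generic sketch in base R (`with_fallback` is our own helper; note that depending on your ellmer version, an unreachable Ollama server may only surface as an error on the first `$chat()` call, so you may need to probe with a test message first):

```r
# Try a primary backend constructor; fall back to a secondary one on error
with_fallback <- function(primary, fallback) {
  tryCatch(primary(), error = function(e) {
    message("Primary backend failed (", conditionMessage(e), "), using fallback")
    fallback()
  })
}

# Usage idea (assumes ellmer is loaded and an API key is set for the fallback):
# chat <- with_fallback(
#   \() chat_ollama(model = "llama3.2"),
#   \() chat_openai(model = "gpt-4o-mini")
# )
```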
When to use local
- Privacy-sensitive data
- High volume, low budget
- Offline requirements
- Learning/experimentation
When to use cloud
- Need best quality
- Don’t have GPU
- Production applications
- Complex reasoning tasks
Troubleshooting
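Before debugging R code, it can help to check whether the server is listening at all. Ollama serves HTTP on localhost:11434 by default; a base-R probe (a sketch — `ollama_up` is our own name, and the port is Ollama's documented default):

```r
# Returns TRUE if something answers HTTP on the given address
ollama_up <- function(base_url = "http://localhost:11434") {
  tryCatch({
    con <- url(base_url, open = "rb")
    close(con)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
}

ollama_up()  # FALSE until `ollama serve` is running
```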
Ollama not running
# Check if Ollama is running
tryCatch({
chat <- chat_ollama(model = "llama3.2")
chat$chat("test")
}, error = function(e) {
message("Start Ollama with: ollama serve")
})
Model not found
# List available models
ollama list
# Pull missing model
ollama pull llama3.2
Out of memory
# Use smaller model
ollama pull llama3.2:1b
# Or reduce context
# Set num_ctx to a lower value in R
Slow responses
# Use smaller model
chat <- chat_ollama(model = "llama3.2:1b")
# Reduce max tokens
chat <- chat_ollama(
model = "llama3.2",
api_args = list(num_predict = 100)
)
Common Mistakes
1. Forgetting to start Ollama server
# Must run this first
ollama serve
2. Using model that’s not downloaded
# Check what's installed
ollama list
# Download if needed
ollama pull llama3.2
3. Expecting cloud-level quality
# Local models are good but not as capable as GPT-4 or Claude
# Adjust expectations and prompts accordingly
4. Not providing good prompts
# Be specific with local models
# They need clearer instructions than cloud models
# Too vague
chat$chat("help with data")
# Better
chat$chat("Write R code to calculate the mean of the 'price' column in a data frame called 'sales'")
Summary
| Task | Code |
|---|---|
| Start Ollama | ollama serve (terminal) |
| Download model | ollama pull llama3.2 (terminal) |
| Chat with ellmer | chat_ollama(model = "llama3.2") |
| Generate text | ollamar::generate(model, prompt) |
| List models | ollamar::list_models() |
- Install Ollama from ollama.com
- Download models with ollama pull
- Use chat_ollama() from ellmer for easy integration
- Smaller models (1b, 3b) are faster but less capable
- Local LLMs are free, private, and work offline