How to Run Local LLMs in R
llm
ollama
Learn to run local language models in R using Ollama. Free, private, offline LLM access with no API costs. Use Llama 3, Mistral, and other open models.
Introduction
Running LLMs locally gives you:

- No API costs - completely free after setup
- Privacy - data never leaves your machine
- Offline access - works without internet
- No rate limits - unlimited requests
Ollama makes running local LLMs easy: it handles model downloads and memory management and exposes a simple API. We’ll use the ellmer package to connect R to Ollama.
Prefer cloud APIs? See OpenAI or Claude for more powerful models.
Getting Started
Install Ollama
Download and install from ollama.com:
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows
# Download from ollama.com
Start Ollama
# Start the Ollama server
ollama serve
Download a model
# Download Llama 3.2 (default tag, 3B parameters)
ollama pull llama3.2
# Download smaller model (faster)
ollama pull llama3.2:1b
# Download Mistral
ollama pull mistral
Using Ollama in R
With ellmer (recommended)
install.packages("ellmer")
library(ellmer)
# Connect to local Ollama
chat <- chat_ollama(model = "llama3.2")
chat$chat("What is R programming?")
With ollamar package
install.packages("ollamar")
library(ollamar)
# Generate text
generate("llama3.2", "Explain data frames in R")
# Chat format
chat(
model = "llama3.2",
messages = list(
list(role = "user", content = "What is ggplot2?")
)
)
Available Models
Recommended for R coding
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| llama3.2:1b | 1.3GB | 4GB | Fast responses, simple tasks |
| llama3.2 | 4.7GB | 8GB | Balanced, good for coding |
| codellama | 3.8GB | 8GB | Code generation |
| mistral | 4.1GB | 8GB | General tasks |
| mixtral | 26GB | 48GB | Best quality, needs lots of RAM |
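If you are unsure which of these your machine can handle, you can encode the RAM column above and filter it against what you have free. A minimal sketch in base R (the figures are the table's rough estimates, and `models_that_fit` is just an illustrative helper, not part of any package):

```r
# Rough RAM requirements in GB, taken from the table above
ram_needed <- c(
  "llama3.2:1b" = 4,
  "llama3.2"    = 8,
  "codellama"   = 8,
  "mistral"     = 8,
  "mixtral"     = 48
)

# Which models fit in the RAM you have available?
models_that_fit <- function(available_gb) {
  names(ram_needed)[ram_needed <= available_gb]
}

models_that_fit(16)  # everything except mixtral
```

Actual memory use also varies with context length and quantization, so treat these numbers as a starting point.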
Download models
# Code-focused
ollama pull codellama
# General purpose
ollama pull llama3.2
ollama pull mistral
# Smaller/faster
ollama pull llama3.2:1b
ollama pull phi3
List installed models
library(ollamar)
list_models()
Basic Usage with ellmer
Simple chat
library(ellmer)
chat <- chat_ollama(model = "llama3.2")
chat$chat("How do I read a CSV file in R?")
With system prompt
chat <- chat_ollama(
model = "llama3.2",
system_prompt = "You are an R programming expert. Provide concise answers with code examples."
)
chat$chat("How do I calculate the mean by group?")
Multi-turn conversation
chat <- chat_ollama(model = "llama3.2")
chat$chat("I have a dataset with customer purchase data.")
chat$chat("How would I find the top 10 customers by total spend?")
chat$chat("Now show me how to visualize this.")
Practical Examples
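The first example below asks a model to generate a 0-1 scaling function. To judge what it returns, it helps to have a hand-written reference to compare against; one way to write it in base R (a sketch — `scale01` is our own name, not a package function):

```r
# Scale every numeric column of a data frame to the 0-1 range
scale01 <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) {
    rng <- range(x, na.rm = TRUE)
    # (a constant column would produce NaN here; handle as needed)
    (x - rng[1]) / (rng[2] - rng[1])
  })
  df
}

head(scale01(mtcars)$mpg)  # values now fall between 0 and 1
```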
Generate R code
chat <- chat_ollama(
model = "codellama",
system_prompt = "Return only R code. No explanations."
)
code <- chat$chat("
Write a function that:
1. Takes a data frame
2. Finds numeric columns
3. Scales them to 0-1 range
4. Returns the modified data frame
")
cat(code)
Analyze text data
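Local models don't always answer with exactly one clean word, so it is worth normalizing replies before tabulating them. A small base-R helper (a sketch — `normalize_label` is our own name) that could wrap the classifier below:

```r
# Map a free-text sentiment reply onto one of three labels, or NA
normalize_label <- function(reply) {
  label <- tolower(trimws(gsub("[[:punct:]]", "", reply)))
  if (label %in% c("positive", "negative", "neutral")) label else NA_character_
}

normalize_label("Positive.")     # "positive"
normalize_label("Hard to say!")  # NA
```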
classify_sentiment <- function(text) {
chat <- chat_ollama(
model = "llama3.2",
system_prompt = "Classify sentiment as positive, negative, or neutral. Reply with one word only."
)
chat$chat(text)
}
reviews <- c(
"This product is amazing!",
"Terrible quality, very disappointed",
"It works as expected"
)
sapply(reviews, classify_sentiment)
Explain code
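If the code you want explained already exists in your session as a function, you don't have to retype it as a string; base R's `deparse()` can recover the source text for you (`describe_me` here is just a toy function for illustration):

```r
# Turn an existing function into text that can be pasted into a prompt
describe_me <- function(x) mean(x, na.rm = TRUE)
code_text <- paste(deparse(describe_me), collapse = "\n")
cat(code_text)
```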
chat <- chat_ollama(model = "llama3.2")
code <- "
mtcars |>
filter(mpg > 20) |>
group_by(cyl) |>
summarise(mean_hp = mean(hp))
"
chat$chat(paste("Explain this R code:", code))
Using ollamar Directly
Generate text
library(ollamar)
result <- generate(
model = "llama3.2",
prompt = "Write R code to create a bar chart"
)
result$response
Chat with history
messages <- list(
list(role = "user", content = "What is dplyr?")
)
response <- chat(model = "llama3.2", messages = messages)
# Continue conversation
messages <- c(messages, list(
list(role = "assistant", content = response$message$content),
list(role = "user", content = "Show me an example of filter()")
))
response2 <- chat(model = "llama3.2", messages = messages)
Embeddings for semantic search
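An embedding is just a numeric vector, so "semantic search" reduces to comparing vectors, most commonly with cosine similarity. A base-R helper (a sketch — `cosine_sim` is our own name) you could apply to the embeddings generated below:

```r
# Cosine similarity between two numeric vectors:
# 1 = same direction, 0 = orthogonal (unrelated)
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine_sim(c(1, 2, 3), c(1, 2, 3))  # 1
cosine_sim(c(1, 0), c(0, 1))        # 0
```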
# Generate embeddings
embedding <- embeddings(
model = "llama3.2",
input = "How to handle missing values in R"
)
# Use for similarity search, clustering, etc.
embedding$embedding
Performance Tips
Choose the right model size
# Fast but less capable (good for simple tasks)
chat <- chat_ollama(model = "llama3.2:1b")
# Balanced (good for most tasks)
chat <- chat_ollama(model = "llama3.2")
# Best quality (slower, needs more RAM)
chat <- chat_ollama(model = "mixtral")
Limit context length
# Shorter context = faster responses
chat <- chat_ollama(
model = "llama3.2",
api_args = list(num_ctx = 2048) # Default is 4096
)
GPU acceleration
# Ollama automatically uses GPU if available
# Check GPU usage:
ollama ps
Batch Processing
library(purrr)
texts <- c("Text 1", "Text 2", "Text 3")
process_local <- function(texts, prompt_template) {
map_chr(texts, \(text) {
chat <- chat_ollama(model = "llama3.2")
chat$chat(sprintf(prompt_template, text))
})
}
summaries <- process_local(
texts,
"Summarize in one sentence: %s"
)
Comparing Local vs Cloud
| Feature | Local (Ollama) | Cloud (OpenAI/Claude) |
|---|---|---|
| Cost | Free | Pay per token |
| Privacy | Complete | Data sent to servers |
| Speed | Depends on hardware | Generally fast |
| Quality | Good (varies by model) | Best available |
| Offline | Yes | No |
| Rate limits | None | Yes |
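If you want cloud quality with a local-first default, one pattern is to try the local backend and fall back on error. A generic sketch in base R (`with_fallback` is our own helper; note that depending on your ellmer version, an unreachable Ollama server may only surface as an error on the first `$chat()` call, so you may need to probe with a test message first):

```r
# Try a primary backend constructor; fall back to a secondary one on error
with_fallback <- function(primary, fallback) {
  tryCatch(primary(), error = function(e) {
    message("Primary backend failed (", conditionMessage(e), "), using fallback")
    fallback()
  })
}

# Usage idea (assumes ellmer is loaded and an API key is set for the fallback):
# chat <- with_fallback(
#   \() chat_ollama(model = "llama3.2"),
#   \() chat_openai(model = "gpt-4o-mini")
# )
```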
When to use local
- Privacy-sensitive data
- High volume, low budget
- Offline requirements
- Learning/experimentation
When to use cloud
- Need best quality
- Don’t have GPU
- Production applications
- Complex reasoning tasks
Troubleshooting
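Before debugging R code, it can help to check whether the server is listening at all. Ollama serves HTTP on localhost:11434 by default; a base-R probe (a sketch — `ollama_up` is our own name, and the port is Ollama's documented default):

```r
# Returns TRUE if something answers HTTP on the given address
ollama_up <- function(base_url = "http://localhost:11434") {
  tryCatch({
    con <- url(base_url, open = "rb")
    close(con)
    TRUE
  }, error = function(e) FALSE, warning = function(w) FALSE)
}

ollama_up()  # FALSE until `ollama serve` is running
```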
Ollama not running
# Check if Ollama is running
tryCatch({
chat <- chat_ollama(model = "llama3.2")
chat$chat("test")
}, error = function(e) {
message("Start Ollama with: ollama serve")
})
Model not found
# List available models
ollama list
# Pull missing model
ollama pull llama3.2
Out of memory
# Use smaller model
ollama pull llama3.2:1b
# Or reduce context
# Set num_ctx to a lower value in R
Slow responses
# Use smaller model
chat <- chat_ollama(model = "llama3.2:1b")
# Reduce max tokens
chat <- chat_ollama(
model = "llama3.2",
api_args = list(num_predict = 100)
)
Common Mistakes
1. Forgetting to start Ollama server
# Must run this first
ollama serve
2. Using model that’s not downloaded
# Check what's installed
ollama list
# Download if needed
ollama pull llama3.2
3. Expecting cloud-level quality
# Local models are good but not as capable as GPT-4 or Claude
# Adjust expectations and prompts accordingly
4. Not providing good prompts
# Be specific with local models
# They need clearer instructions than cloud models
# Too vague
chat$chat("help with data")
# Better
chat$chat("Write R code to calculate the mean of the 'price' column in a data frame called 'sales'")
Summary
| Task | Code |
|---|---|
| Start Ollama | ollama serve (terminal) |
| Download model | ollama pull llama3.2 (terminal) |
| Chat with ellmer | chat_ollama(model = "llama3.2") |
| Generate text | ollamar::generate(model, prompt) |
| List models | ollamar::list_models() |
- Install Ollama from ollama.com
- Download models with ollama pull
- Use chat_ollama() from ellmer for easy integration
- Smaller models (1b, 3b) are faster but less capable
- Local LLMs are free, private, and work offline