How to Run Local LLMs in R

Tags: llm, ollama

Learn to run local language models in R using Ollama. Free, private, offline LLM access with no API costs. Use Llama 3, Mistral, and other open models.

Published April 4, 2026

Introduction

Running LLMs locally gives you:

  • No API costs - completely free after setup
  • Privacy - data never leaves your machine
  • Offline access - works without internet
  • No rate limits - unlimited requests

Ollama makes local LLMs easy by handling model downloads, memory management, and providing a simple API. We’ll use the ellmer package to connect R to Ollama.

Prefer cloud APIs? See OpenAI or Claude for more powerful models.

Getting Started

Install Ollama

Download and install from ollama.com:

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from ollama.com

Start Ollama

# Start the Ollama server
ollama serve

Download a model

# Download Llama 3.2 (3B parameters, ~2GB)
ollama pull llama3.2

# Download smaller model (faster)
ollama pull llama3.2:1b

# Download Mistral
ollama pull mistral

Using Ollama in R

With ollamar package

install.packages("ollamar")
library(ollamar)

# Generate text (output = "text" returns the reply as plain text
# instead of a raw httr2 response object)
generate("llama3.2", "Explain data frames in R", output = "text")

# Chat format
chat(
  model = "llama3.2",
  messages = list(
    list(role = "user", content = "What is ggplot2?")
  ),
  output = "text"  # return the reply as plain text
)

Available Models

Download models

# Code-focused
ollama pull codellama

# General purpose
ollama pull llama3.2
ollama pull mistral

# Smaller/faster
ollama pull llama3.2:1b
ollama pull phi3

List installed models

library(ollamar)
list_models()

Basic Usage with ellmer

Simple chat

library(ellmer)

chat <- chat_ollama(model = "llama3.2")
chat$chat("How do I read a CSV file in R?")

With system prompt

chat <- chat_ollama(
  model = "llama3.2",
  system_prompt = "You are an R programming expert. Provide concise answers with code examples."
)

chat$chat("How do I calculate the mean by group?")

Multi-turn conversation

chat <- chat_ollama(model = "llama3.2")

chat$chat("I have a dataset with customer purchase data.")
chat$chat("How would I find the top 10 customers by total spend?")
chat$chat("Now show me how to visualize this.")

Practical Examples

Generate R code

chat <- chat_ollama(
  model = "codellama",
  system_prompt = "Return only R code. No explanations."
)

code <- chat$chat("
Write a function that:
1. Takes a data frame
2. Finds numeric columns
3. Scales them to 0-1 range
4. Returns the modified data frame
")

cat(code)
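For reference, a hand-written version of the function this prompt asks for might look like the sketch below. The model's output will differ; the name scale_numeric_01 is just illustrative.

```r
# Scale every numeric column of a data frame to the 0-1 range
scale_numeric_01 <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    rng <- range(x, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant column
    (x - rng[1]) / (rng[2] - rng[1])
  })
  df
}

scaled <- scale_numeric_01(mtcars)
range(scaled$mpg)  # 0 1
```

Comparing the model's answer against a known-good version like this is a quick way to check code generated by a smaller local model.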

Analyze text data

classify_sentiment <- function(text) {
  chat <- chat_ollama(
    model = "llama3.2",
    system_prompt = "Classify sentiment as positive, negative, or neutral. Reply with one word only."
  )
  chat$chat(text)
}

reviews <- c(
  "This product is amazing!",
  "Terrible quality, very disappointed",
  "It works as expected"
)

sapply(reviews, classify_sentiment)
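Local models don't always follow the "one word only" instruction exactly; replies like "Positive." or "It is neutral" are common. A small post-processing helper (hypothetical, not part of ellmer) can map replies back onto the label set from the system prompt:

```r
# Normalize a model's sentiment reply to one of three known labels;
# returns NA when the reply matches zero or multiple labels
normalize_label <- function(reply,
                            labels = c("positive", "negative", "neutral")) {
  reply <- tolower(trimws(reply))
  hit <- labels[vapply(labels, function(l) grepl(l, reply, fixed = TRUE),
                       logical(1))]
  if (length(hit) == 1) hit else NA_character_
}

normalize_label("Positive.")      # "positive"
normalize_label("It is neutral")  # "neutral"
normalize_label("unsure")         # NA
```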

Explain code

chat <- chat_ollama(model = "llama3.2")

code <- "
mtcars |>
  filter(mpg > 20) |>
  group_by(cyl) |>
  summarise(mean_hp = mean(hp))
"

chat$chat(paste("Explain this R code:", code))

Using ollamar Directly

Generate text

library(ollamar)

result <- generate(
  model = "llama3.2",
  prompt = "Write R code to create a bar chart",
  output = "jsonlist"  # parse the JSON reply into a list
)

result$response

Chat with history

messages <- list(
  list(role = "user", content = "What is dplyr?")
)

response <- chat(model = "llama3.2", messages = messages, output = "jsonlist")

# Continue conversation
messages <- c(messages, list(
  list(role = "assistant", content = response$message$content),
  list(role = "user", content = "Show me an example of filter()")
))

response2 <- chat(model = "llama3.2", messages = messages, output = "jsonlist")
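The append pattern above can be wrapped in a small helper so each turn is one call. This is plain list manipulation, not an ollamar function; the name add_turn is hypothetical:

```r
# Append one role/content turn to an Ollama-style message list
add_turn <- function(messages, role, content) {
  c(messages, list(list(role = role, content = content)))
}

msgs <- list()
msgs <- add_turn(msgs, "user", "What is dplyr?")
msgs <- add_turn(msgs, "assistant", "A grammar of data manipulation.")
msgs <- add_turn(msgs, "user", "Show me an example of filter()")

length(msgs)    # 3
msgs[[3]]$role  # "user"
```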

Performance Tips

Choose the right model size

# Fast but less capable (good for simple tasks)
chat <- chat_ollama(model = "llama3.2:1b")

# Balanced (good for most tasks)
chat <- chat_ollama(model = "llama3.2")

# Best quality (slower, needs more RAM)
chat <- chat_ollama(model = "mixtral")

Limit context length

# Shorter context = faster responses
chat <- chat_ollama(
  model = "llama3.2",
  api_args = list(num_ctx = 2048)  # smaller context window
)

GPU acceleration

# Ollama automatically uses GPU if available
# Check GPU usage:
ollama ps

Batch Processing

library(purrr)

texts <- c("Text 1", "Text 2", "Text 3")

process_local <- function(texts, prompt_template) {
  map_chr(texts, \(text) {
    chat <- chat_ollama(model = "llama3.2")
    chat$chat(sprintf(prompt_template, text))
  })
}

summaries <- process_local(
  texts,
  "Summarize in one sentence: %s"
)

Comparing Local vs Cloud

Feature       Local (Ollama)           Cloud (OpenAI/Claude)
-----------   ----------------------   ---------------------
Cost          Free                     Pay per token
Privacy       Complete                 Data sent to servers
Speed         Depends on hardware      Generally fast
Quality       Good (varies by model)   Best available
Offline       Yes                      No
Rate limits   None                     Yes

When to use local

  • Privacy-sensitive data
  • High volume, low budget
  • Offline requirements
  • Learning/experimentation

When to use cloud

  • Need best quality
  • Don’t have GPU
  • Production applications
  • Complex reasoning tasks

Troubleshooting

Ollama not running

# Check if Ollama is running
tryCatch({
  chat <- chat_ollama(model = "llama3.2")
  chat$chat("test")
}, error = function(e) {
  message("Start Ollama with: ollama serve")
})

Model not found

# List available models
ollama list

# Pull missing model
ollama pull llama3.2

Out of memory

# Use smaller model
ollama pull llama3.2:1b

# Or reduce the context window
# (set num_ctx to a lower value via api_args in R)

Slow responses

# Use smaller model
chat <- chat_ollama(model = "llama3.2:1b")

# Reduce max tokens
chat <- chat_ollama(
  model = "llama3.2",
  api_args = list(num_predict = 100)
)

Common Mistakes

1. Forgetting to start Ollama server

# Must run this first
ollama serve

2. Using model that’s not downloaded

# Check what's installed
ollama list

# Download if needed
ollama pull llama3.2

3. Expecting cloud-level quality

# Local models are good but not as capable as GPT-4 or Claude
# Adjust expectations and prompts accordingly

4. Not providing good prompts

# Be specific with local models
# They need clearer instructions than cloud models

# Too vague
chat$chat("help with data")

# Better
chat$chat("Write R code to calculate the mean of the 'price' column in a data frame called 'sales'")
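If you send the same kind of request repeatedly, a template helper keeps every prompt specific and consistent. The function below is a hypothetical convenience, not part of ellmer:

```r
# Build a specific prompt from a reusable template (hypothetical helper)
build_prompt <- function(task, df_name, column) {
  sprintf("Write R code to %s of the '%s' column in a data frame called '%s'",
          task, column, df_name)
}

build_prompt("calculate the mean", "sales", "price")
# "Write R code to calculate the mean of the 'price' column in a data frame called 'sales'"
```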

Summary

Task               Code
----------------   --------------------------------
Start Ollama       ollama serve (terminal)
Download model     ollama pull llama3.2 (terminal)
Chat with ellmer   chat_ollama(model = "llama3.2")
Generate text      ollamar::generate(model, prompt)
List models        ollamar::list_models()

  • Install Ollama from ollama.com
  • Download models with ollama pull
  • Use chat_ollama() from ellmer for easy integration
  • Smaller models (1b, 3b) are faster but less capable
  • Local LLMs are free, private, and work offline
