<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Rstats 101</title>
<link>https://rstats101.com/</link>
<atom:link href="https://rstats101.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Learn R programming and statistics with practical tutorials</description>
<generator>quarto-1.8.24</generator>
<lastBuildDate>Fri, 10 Apr 2026 18:30:00 GMT</lastBuildDate>
<item>
  <title>How to Use Embeddings and Semantic Search in R</title>
  <link>https://rstats101.com/llm/how-to-use-embeddings-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Computers don’t really understand words. They understand numbers. Embeddings are a way to convert text into numbers so a computer can work with meaning, not just exact words.</p>
<p>Imagine this. You write the sentence <em>“The cat is sleeping on the couch.”</em> A model turns it into a list of numbers.</p>
<p>How long is that list? It depends on the model. Every model has its own fixed vector size: a sentence fed into <code>text-embedding-3-small</code> comes back as 1,536 numbers by default, and every input, short or long, produces <em>exactly</em> the same number of values. That fixed size is what lets you compare any two embeddings with simple math.</p>
<p>Here are a few popular models and their vector sizes:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 19%">
<col style="width: 27%">
<col style="width: 33%">
<col style="width: 19%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Provider</th>
<th>Dimensions</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>text-embedding-3-small</code></td>
<td>OpenAI</td>
<td>1,536</td>
<td>Cheap default, high quality</td>
</tr>
<tr class="even">
<td><code>text-embedding-3-large</code></td>
<td>OpenAI</td>
<td>3,072</td>
<td>Best OpenAI accuracy</td>
</tr>
<tr class="odd">
<td><code>text-embedding-004</code></td>
<td>Google Gemini</td>
<td>768</td>
<td>Free tier available</td>
</tr>
<tr class="even">
<td><code>voyage-3</code></td>
<td>Voyage AI</td>
<td>1,024</td>
<td>Strong for retrieval</td>
</tr>
<tr class="odd">
<td><code>embed-english-v3.0</code></td>
<td>Cohere</td>
<td>1,024</td>
<td>Multilingual option available</td>
</tr>
<tr class="even">
<td><code>nomic-embed-text</code></td>
<td>Ollama (local)</td>
<td>768</td>
<td>Free, runs offline</td>
</tr>
<tr class="odd">
<td><code>mxbai-embed-large</code></td>
<td>Ollama (local)</td>
<td>1,024</td>
<td>Higher quality local model</td>
</tr>
<tr class="even">
<td><code>all-MiniLM-L6-v2</code></td>
<td>Sentence Transformers</td>
<td>384</td>
<td>Tiny, fast, surprisingly good</td>
</tr>
</tbody>
</table>
<p>Now you write <em>“A kitten is resting on the sofa.”</em> Even though the words are different, the meaning is very similar, so the lists of numbers will also be very similar.</p>
<p>But if you compare it with <em>“Stock markets fell sharply today,”</em> the numbers will be very different, because the meaning is unrelated.</p>
<p>This simple idea is powerful. Instead of matching exact keywords, you can match meaning.</p>
<p>For example:</p>
<ul>
<li>If you search for <em>“comfortable place to sit”</em>, you can find results like <em>“sofa”</em> or <em>“armchair”</em></li>
<li>You can detect that <em>“cheap flights”</em> and <em>“low-cost airfare”</em> mean the same thing</li>
<li>You can group documents about <em>“diabetes”</em>, <em>“insulin”</em>, and <em>“blood sugar”</em> together automatically</li>
</ul>
<p>In the past, doing this required complicated rules or custom models. With embeddings, you just compare how “close” two pieces of text are.</p>
<p>In this tutorial, you’ll learn how to:</p>
<ul>
<li>Turn text into embeddings using R and the OpenAI API</li>
<li>Measure how similar two pieces of text are</li>
<li>Build a simple semantic search system for your own documents</li>
</ul>
<p><strong>What you can build with embeddings:</strong></p>
<ul>
<li>Search a knowledge base by meaning, not keywords</li>
<li>Find duplicate or near-duplicate questions</li>
<li>Cluster documents by topic</li>
<li>Recommend related articles</li>
<li>Power the retrieval step in RAG systems</li>
</ul>
</section>
<section id="how-embeddings-work" class="level2">
<h2 class="anchored" data-anchor-id="how-embeddings-work">How Embeddings Work</h2>
<p>A text embedding is a vector of floating-point numbers (often 768, 1,024, or 1,536 dimensions). The model is trained so that semantically similar texts land near each other in this vector space.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Approach</th>
<th>Matches On</th>
<th>Handles Synonyms?</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Keyword search</td>
<td>Exact words</td>
<td>No</td>
</tr>
<tr class="even">
<td>Embedding search</td>
<td>Meaning</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<p>For example, “How do I reset my password?” and “I forgot my login credentials” share almost no words but produce very similar embeddings.</p>
<section id="the-famous-kingqueen-example" class="level3">
<h3 class="anchored" data-anchor-id="the-famous-kingqueen-example">The Famous King–Queen Example</h3>
<p>The classic intuition for embeddings comes from the original word2vec paper. When you embed individual words, related words cluster together in predictable ways:</p>
<ul>
<li><strong>king</strong> is close to <strong>queen</strong></li>
<li><strong>man</strong> is close to <strong>woman</strong></li>
<li><strong>Paris</strong> is close to <strong>France</strong></li>
<li><strong>dog</strong> is close to <strong>puppy</strong></li>
</ul>
<p>But the surprising part is that the <em>directions</em> between words also carry meaning. If you take the vector for <code>king</code>, subtract the vector for <code>man</code>, and add the vector for <code>woman</code>, you land very close to the vector for <code>queen</code>:</p>
<pre><code>king − man + woman ≈ queen</code></pre>
<p>The same trick works for capitals:</p>
<pre><code>Paris − France + Italy ≈ Rome</code></pre>
<p>The model never saw an explicit rule about gender or geography. It learned these patterns from raw text alone, and the geometry of the vector space ended up encoding them. This is what people mean when they say embeddings “capture meaning” — relationships between concepts become directions in the space.</p>
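<p>To make the arithmetic concrete, here is a minimal sketch using hand-made two-dimensional vectors rather than real embeddings. The two axes (a “royalty” axis and a “gender” axis) and all four vectors are assumptions chosen so the analogy works out exactly; real models learn hundreds of dimensions, and the match is only approximate.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hand-made toy vectors, NOT real embeddings.
# Dimension 1 = "royalty", dimension 2 = "gender" (+1 male, -1 female).
man   &lt;- c(0,  1)
woman &lt;- c(0, -1)
king  &lt;- c(1,  1)
queen &lt;- c(1, -1)

# Cosine of the angle between two vectors
cosine_sim &lt;- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Start at "king" and move along the man -&gt; woman direction
analogy &lt;- king - man + woman

cosine_sim(analogy, queen)  # 1 (up to floating-point rounding): same direction as queen
cosine_sim(analogy, king)   # 0: perpendicular to king in this toy space</code></pre></div>
<p>With real embeddings the result would land near <code>queen</code> rather than exactly on it, which is why the formula uses ≈.</p>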
<table class="caption-top table">
<thead>
<tr class="header">
<th>Pair A</th>
<th>Pair B</th>
<th>Shared Direction</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>man → woman</td>
<td>king → queen</td>
<td>gender</td>
</tr>
<tr class="even">
<td>France → Paris</td>
<td>Italy → Rome</td>
<td>country → capital</td>
</tr>
<tr class="odd">
<td>walk → walked</td>
<td>run → ran</td>
<td>present → past tense</td>
</tr>
<tr class="even">
<td>big → bigger</td>
<td>small → smaller</td>
<td>comparative form</td>
</tr>
</tbody>
</table>
<p>Modern sentence-level embeddings (like the ones from OpenAI) are far richer than word2vec, but they’re built on the same core idea: meaning becomes geometry, and geometry is something a computer can work with using simple math like cosine similarity.</p>
</section>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<p>You’ll need an API key for an embeddings provider. OpenAI is the most common choice; we’ll use it here.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(httr2)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OPENAI_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"your-key-here"</span>)</span></code></pre></div></div>
<p>See <a href="../llm/how-to-use-openai-api-in-r">How to Use the OpenAI API in R</a> for setup details.</p>
</section>
<section id="get-a-single-embedding" class="level2">
<h2 class="anchored" data-anchor-id="get-a-single-embedding">Get a Single Embedding</h2>
<p>The OpenAI embeddings endpoint takes text and returns a numeric vector. Here’s a small helper:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">embed_text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text-embedding-3-small"</span>) {</span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.openai.com/v1/embeddings"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_auth_bearer_token</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_body_json</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">input =</span> text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> model)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>()</span>
<span id="cb4-7">}</span></code></pre></div></div>
<p>This wraps a single POST request. The <code>text-embedding-3-small</code> model is cheap and fast — a good default.</p>
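<p>Note that <code>embed_text()</code> returns the whole parsed JSON response, not the vector itself. The sketch below shows one way to pull the vector out, and it runs without a network call. The helper name <code>get_embedding()</code> and the mock response are hypothetical; the mock copies only the <code>data[[1]]$embedding</code> path and leaves out everything else a real response contains.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: pull the first embedding out of a parsed response
get_embedding &lt;- function(resp) {
  unlist(resp$data[[1]]$embedding)
}

# Mock response that mimics only the fields read above;
# JSON arrays are parsed into R lists, hence list() for the numbers
mock_resp &lt;- list(
  data = list(
    list(embedding = list(0.1, -0.2, 0.3))
  )
)

vec_demo &lt;- get_embedding(mock_resp)
vec_demo              # a plain numeric vector: 0.1 -0.2 0.3
is.numeric(vec_demo)  # TRUE</code></pre></div>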
<section id="call-it-on-one-string" class="level3">
<h3 class="anchored" data-anchor-id="call-it-on-one-string">Call it on one string</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R is a language for statistical computing"</span>)</span>
<span id="cb5-2"></span>
<span id="cb5-3">vec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb5-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(vec)</span>
<span id="cb5-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1] 1536</span></span></code></pre></div></div>
<p>The returned vector has 1536 numbers. Let’s peek at the first ten to see what they actually look like:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(vec, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb6-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1] -0.01823  0.04517 -0.00192  0.03104 -0.02276</span></span>
<span id="cb6-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [6]  0.01438  0.00067 -0.02651  0.05209 -0.01102</span></span></code></pre></div></div>
<p>Each number is a small floating-point value, typically between roughly -0.1 and 0.1. Individually these numbers mean nothing to a human — you can’t point at the third coordinate and say “that’s the ‘statistics’ dimension.” The meaning lives in the <em>pattern of all 1536 values taken together</em>, and you surface that meaning by comparing vectors to each other.</p>
<p>A quick summary of the distribution confirms this:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(vec)</span>
<span id="cb7-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.</span></span>
<span id="cb7-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># -0.10382  -0.01891   0.00024  -0.00006   0.01923   0.11047</span></span></code></pre></div></div>
<p>The values are centered near zero and spread symmetrically — a typical normalized embedding. You don’t read them directly; you compare them to other vectors.</p>
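<p>“Normalized” here means the vector has length 1 when measured with the Euclidean (L2) norm. Below is a minimal sketch of that check, using a made-up two-element vector in place of a real embedding; <code>l2_norm()</code> is a hypothetical helper name.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Euclidean (L2) length of a vector
l2_norm &lt;- function(x) sqrt(sum(x^2))

# Made-up vector standing in for an embedding
raw &lt;- c(3, 4)
l2_norm(raw)        # 5

# Dividing by the norm rescales the vector to unit length
unit &lt;- raw / l2_norm(raw)
l2_norm(unit)       # 1

# For unit-length vectors, the plain dot product equals cosine similarity
other &lt;- c(4, 3) / l2_norm(c(4, 3))
sum(unit * other)   # 0.96</code></pre></div>
<p>If <code>vec</code> from the API call is unit length, <code>l2_norm(vec)</code> should come out very close to 1.</p>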
</section>
</section>
<section id="cosine-similarity-comparing-two-embeddings" class="level2">
<h2 class="anchored" data-anchor-id="cosine-similarity-comparing-two-embeddings">Cosine Similarity: Comparing Two Embeddings</h2>
<p>The standard way to compare embeddings is <strong>cosine similarity</strong> — the cosine of the angle between two vectors. Values range from -1 (opposite) to 1 (identical direction).</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">cosine_sim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(a, b) {</span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(a <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> b) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(a<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(b<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))</span>
<span id="cb8-3">}</span></code></pre></div></div>
<p>Cosine similarity ignores vector length and only cares about direction, which is what you want when comparing meaning.</p>
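<p>A small sketch with made-up vectors shows what “ignores length” means in practice. It repeats the <code>cosine_sim()</code> definition so the block runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cosine_sim &lt;- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up vector, not a real embedding
v &lt;- c(1, 2, 3)

cosine_sim(v, 10 * v)          # 1: same direction, ten times longer
cosine_sim(v, -v)              # -1: exactly opposite direction
cosine_sim(c(1, 0), c(0, 1))   # 0: perpendicular vectors</code></pre></div>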
<section id="compare-two-sentences" class="level3">
<h3 class="anchored" data-anchor-id="compare-two-sentences">Compare two sentences</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">a <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I reset my password?"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb9-2">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I forgot my login"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb9-3">c <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's the weather today?"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb9-4"></span>
<span id="cb9-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(a, b)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~0.78 (similar meaning)</span></span>
<span id="cb9-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(a, c)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~0.12 (unrelated)</span></span></code></pre></div></div>
<p>The first pair scores high even though the two sentences share almost no words beyond “I” and “my”. That’s the embedding magic.</p>
</section>
</section>
<section id="embed-many-documents-at-once" class="level2">
<h2 class="anchored" data-anchor-id="embed-many-documents-at-once">Embed Many Documents at Once</h2>
<p>The OpenAI API accepts a list of inputs in a single call. Batching is much more efficient than one call per document: you pay the network round trip once and use far fewer of your rate-limited requests.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">embed_many <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(texts, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text-embedding-3-small"</span>) {</span>
<span id="cb10-2">  result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.openai.com/v1/embeddings"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_auth_bearer_token</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_body_json</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">input =</span> texts, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> model)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>()</span>
<span id="cb10-7"></span>
<span id="cb10-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data, \(d) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding))</span>
<span id="cb10-9">}</span></code></pre></div></div>
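<p>For a large corpus you will probably not want to send everything in one request, since providers limit how much a single call can carry. One possible approach is to split the texts into fixed-size batches first. In this sketch, <code>chunk_texts()</code> and the batch size are assumptions for illustration, not values taken from the OpenAI documentation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: split a character vector into batches
# of at most `batch_size` elements, preserving order
chunk_texts &lt;- function(texts, batch_size = 100) {
  split(texts, ceiling(seq_along(texts) / batch_size))
}

texts_demo &lt;- paste("document", 1:7)
batches &lt;- chunk_texts(texts_demo, batch_size = 3)

length(batches)    # 3 batches
lengths(batches)   # batch sizes: 3, 3, 1

# With a working embed_many(), each batch could then be embedded in turn
# and the per-batch lists flattened back into one list of vectors:
# all_embeddings &lt;- unlist(lapply(batches, embed_many), recursive = FALSE)</code></pre></div>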
<section id="build-an-embedding-store" class="level3">
<h3 class="anchored" data-anchor-id="build-an-embedding-store">Build an embedding store</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">docs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb11-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R is a language for statistical computing"</span>,</span>
<span id="cb11-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Python is popular for data science"</span>,</span>
<span id="cb11-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2 makes beautiful charts"</span>,</span>
<span id="cb11-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pizza is a popular Italian dish"</span>,</span>
<span id="cb11-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use dplyr to filter and group data"</span></span>
<span id="cb11-7">)</span>
<span id="cb11-8"></span>
<span id="cb11-9">embeddings <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_many</span>(docs)</span>
<span id="cb11-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(embeddings)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 5</span></span></code></pre></div></div>
<p>You now have one vector per document. Store them in a tibble alongside the original text.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">store <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb12-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> docs,</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> embeddings</span>
<span id="cb12-4">)</span>
<span id="cb12-5"></span>
<span id="cb12-6">store</span>
<span id="cb12-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># # A tibble: 5 × 2</span></span>
<span id="cb12-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   text                                      embedding</span></span>
<span id="cb12-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   &lt;chr&gt;                                     &lt;list&gt;</span></span>
<span id="cb12-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1 R is a language for statistical computing &lt;dbl [1,536]&gt;</span></span>
<span id="cb12-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2 Python is popular for data science        &lt;dbl [1,536]&gt;</span></span>
<span id="cb12-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3 ggplot2 makes beautiful charts            &lt;dbl [1,536]&gt;</span></span>
<span id="cb12-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 4 Pizza is a popular Italian dish           &lt;dbl [1,536]&gt;</span></span>
<span id="cb12-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 5 Use dplyr to filter and group data        &lt;dbl [1,536]&gt;</span></span></code></pre></div></div>
<p>Notice the <code>embedding</code> column is a <strong>list-column</strong> — each row holds an entire 1,536-element numeric vector. This is one of the most useful patterns in the tidyverse: keep complex objects inside a tibble so you can treat them as ordinary rows. If list-columns are new to you, see <a href="../tidyr/how-to-use-nest-in-r">How to Use nest() in R</a> for the broader pattern.</p>
<p>You can peek at the first few values of any row by indexing into the list-column with <code>[[</code> and passing the result to <code>head()</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">store<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb13-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1] -0.01823  0.04517 -0.00192  0.03104 -0.02276</span></span></code></pre></div></div>
<p>Keeping the text and embedding side-by-side makes search results easy to display.</p>
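<p>If you later need matrix operations, the list of vectors can be stacked into an ordinary numeric matrix. Here is a minimal sketch with toy four-element vectors standing in for <code>store$embedding</code>; the names <code>toy_embeddings</code> and <code>emb_matrix</code> are made up for this example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for store$embedding: three 4-dimensional vectors
toy_embeddings &lt;- list(
  c(0.1, 0.2, 0.3, 0.4),
  c(0.5, 0.6, 0.7, 0.8),
  c(0.9, 1.0, 1.1, 1.2)
)

# Stack the vectors row-wise: one row per document, one column per dimension
emb_matrix &lt;- do.call(rbind, toy_embeddings)

dim(emb_matrix)    # 3 4
emb_matrix[2, ]    # 0.5 0.6 0.7 0.8</code></pre></div>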
</section>
</section>
<section id="visualizing-semantic-similarity" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-semantic-similarity">Visualizing Semantic Similarity</h2>
<p>Numbers are abstract — plots make the geometry of embeddings concrete. Two visualizations are especially useful: a <strong>similarity heatmap</strong> showing how every document relates to every other, and a <strong>2D projection</strong> placing the documents on a plane so you can literally see the clusters.</p>
<section id="build-a-similarity-matrix" class="level3">
<h3 class="anchored" data-anchor-id="build-a-similarity-matrix">Build a Similarity Matrix</h3>
<p>First, compute cosine similarity between every pair of documents in the store. The result is a square matrix where row <code>i</code>, column <code>j</code> is the similarity between document <code>i</code> and document <code>j</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">similarity_matrix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(embeddings) {</span>
<span id="cb14-2">  n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(embeddings)</span>
<span id="cb14-3">  m <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">matrix</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, n, n)</span>
<span id="cb14-4">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_len</span>(n)) {</span>
<span id="cb14-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (j <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_len</span>(n)) {</span>
<span id="cb14-6">      m[i, j] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(embeddings[[i]], embeddings[[j]])</span>
<span id="cb14-7">    }</span>
<span id="cb14-8">  }</span>
<span id="cb14-9">  m</span>
<span id="cb14-10">}</span>
<span id="cb14-11"></span>
<span id="cb14-12">sim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">similarity_matrix</span>(embeddings)</span>
<span id="cb14-13"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(sim, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb14-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#      [,1] [,2] [,3] [,4] [,5]</span></span>
<span id="cb14-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1,] 1.00 0.55 0.45 0.10 0.50</span></span>
<span id="cb14-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [2,] 0.55 1.00 0.30 0.08 0.40</span></span>
<span id="cb14-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [3,] 0.45 0.30 1.00 0.05 0.45</span></span>
<span id="cb14-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [4,] 0.10 0.08 0.05 1.00 0.08</span></span>
<span id="cb14-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [5,] 0.50 0.40 0.45 0.08 1.00</span></span></code></pre></div></div>
<p>The diagonal is always 1, because every document is identical to itself, and the matrix is symmetric, because the similarity of <code>i</code> to <code>j</code> equals the similarity of <code>j</code> to <code>i</code>. Off-diagonal cells reveal which documents the model thinks are related. Row 4 (the pizza sentence) stands out immediately: every off-diagonal value is 0.10 or lower, which marks it as unrelated to the other four. You can already see the story in the raw numbers; the heatmap just makes it visual.</p>
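<p>The double loop is easy to read, but the same matrix can also be computed with matrix algebra. The following is a minimal sketch rather than part of the original workflow: it assumes <code>cosine_sim()</code> is the usual dot product divided by the product of the two vector norms, and <code>similarity_matrix_fast()</code> is a hypothetical name. Scaling every embedding to unit length turns each cosine similarity into a plain dot product, so a single <code>tcrossprod()</code> call fills the whole matrix.</p>

```r
# Hypothetical vectorized alternative to the double loop (assumed name).
similarity_matrix_fast <- function(embeddings) {
  m <- do.call(rbind, embeddings)   # one row per document
  m <- m / sqrt(rowSums(m^2))       # scale each row to unit length
  tcrossprod(m)                     # m %*% t(m): all pairwise dot products
}

# Toy vectors (not real embeddings) to check the behaviour
toy <- list(c(1, 0), c(0, 1), c(1, 1))
toy_sim <- similarity_matrix_fast(toy)
round(toy_sim, 2)
```

<p>For a handful of documents the loop is perfectly fine; the matrix version mainly pays off once the store holds thousands of embeddings.</p>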
<p>For easier reading, label the rows and columns with short names and turn the matrix into a tibble:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">short <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Python"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pizza"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>)</span>
<span id="cb15-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rownames</span>(sim) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> short</span>
<span id="cb15-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(sim) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> short</span>
<span id="cb15-4"></span>
<span id="cb15-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>(sim, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rownames =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"doc"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric), \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)))</span>
<span id="cb15-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># # A tibble: 5 × 6</span></span>
<span id="cb15-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   doc        R Python ggplot2 Pizza dplyr</span></span>
<span id="cb15-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#   &lt;chr&gt;  &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span>
<span id="cb15-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 1 R       1     0.55    0.45  0.1    0.5</span></span>
<span id="cb15-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 2 Python  0.55  1       0.3   0.08   0.4</span></span>
<span id="cb15-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3 ggplot2 0.45  0.3     1     0.05   0.45</span></span>
<span id="cb15-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 4 Pizza   0.1   0.08    0.05  1      0.08</span></span>
<span id="cb15-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 5 dplyr   0.5   0.4     0.45  0.08   1</span></span></code></pre></div></div>
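# 5 dplyr">
<p>Once the matrix carries names, it is easy to ask which other document each one is closest to. The sketch below is an illustration only: it rebuilds <code>sim</code> from the illustrative values printed above so that it runs on its own, and <code>nearest_neighbor()</code> is a hypothetical helper, not a function from any package.</p>

```r
# Similarity values copied from the illustrative printout above
short <- c("R", "Python", "ggplot2", "Pizza", "dplyr")
sim <- matrix(
  c(1.00, 0.55, 0.45, 0.10, 0.50,
    0.55, 1.00, 0.30, 0.08, 0.40,
    0.45, 0.30, 1.00, 0.05, 0.45,
    0.10, 0.08, 0.05, 1.00, 0.08,
    0.50, 0.40, 0.45, 0.08, 1.00),
  nrow = 5, byrow = TRUE, dimnames = list(short, short)
)

# Hypothetical helper: the most similar *other* document for each row
nearest_neighbor <- function(sim) {
  diag(sim) <- NA                            # ignore self-similarity
  idx <- unname(apply(sim, 1, which.max))    # column of each row maximum (NAs skipped)
  data.frame(
    doc        = rownames(sim),
    nearest    = colnames(sim)[idx],
    similarity = sim[cbind(seq_len(nrow(sim)), idx)]
  )
}

nn <- nearest_neighbor(sim)
nn
```

<p>Note that <code>which.max()</code> returns the first maximum, so a tie (such as ggplot2 scoring 0.45 against both R and dplyr) resolves to whichever column comes first.</p>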
</section>
<section id="plot-a-heatmap-with-ggplot" class="level3">
<h3 class="anchored" data-anchor-id="plot-a-heatmap-with-ggplot">Plot a Heatmap with ggplot</h3>
<p>To plot it with ggplot, convert the matrix to a long-format data frame with one row per pair. This uses <a href="../tidyr/how-to-use-pivotlonger-in-r">pivot_longer()</a> to reshape the matrix:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">sim_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>(sim) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_names</span>(docs) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">doc_a =</span> docs) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>doc_a, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"doc_b"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"similarity"</span>)</span></code></pre></div></div>
<p>Now draw the heatmap with <a href="../ggplot2/how-to-use-geomtile-in-r">geom_tile()</a>, which is the standard ggplot2 layer for matrix-style visualizations:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(sim_long, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(doc_a, doc_b, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> similarity)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_tile</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(similarity, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_gradient2</span>(</span>
<span id="cb17-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">low =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mid =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lightblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>,</span>
<span id="cb17-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">midpoint =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb17-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb17-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Semantic Similarity Between Documents"</span>,</span>
<span id="cb17-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cosine"</span></span>
<span id="cb17-11">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">angle =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">35</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/llm/text-embeddings-similarity-heatmap-in-r-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Heatmap of cosine similarity between text embeddings in R using ggplot2, showing data-science sentences clustering together and the pizza sentence as an outlier</figcaption>
</figure>
</div>
<p><strong>What you’ll see:</strong> The two R-related sentences (“R is a language…” and “Use dplyr…”) form a bright cell, “ggplot2 makes beautiful charts” lights up against both R sentences, “Python is popular for data science” sits closer to the R sentences than to the pizza sentence, and “Pizza is a popular Italian dish” stays a pale outlier connected to nothing. The heatmap <em>shows at a glance</em> that the model has grouped the data-science sentences together, without ever being told what data science is.</p>
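<p>The visual impression can be backed up with two numbers: the average similarity among the data-science sentences and the average similarity between the pizza sentence and the rest. This is a small sketch using the illustrative values from the printed matrix earlier in this section, so the exact figures will differ with real embeddings; the variable names are chosen here for illustration.</p>

```r
# Illustrative similarity values from the printed matrix earlier in this section
short <- c("R", "Python", "ggplot2", "Pizza", "dplyr")
sim <- matrix(
  c(1.00, 0.55, 0.45, 0.10, 0.50,
    0.55, 1.00, 0.30, 0.08, 0.40,
    0.45, 0.30, 1.00, 0.05, 0.45,
    0.10, 0.08, 0.05, 1.00, 0.08,
    0.50, 0.40, 0.45, 0.08, 1.00),
  nrow = 5, byrow = TRUE, dimnames = list(short, short)
)

data_docs <- c("R", "Python", "ggplot2", "dplyr")

# Average over the 6 distinct data-science pairs (upper triangle only)
within <- sim[data_docs, data_docs]
within_mean <- mean(within[upper.tri(within)])

# Average similarity between the pizza sentence and each data-science sentence
pizza_mean <- mean(sim["Pizza", data_docs])

round(c(within = within_mean, pizza = pizza_mean), 2)
# within is 0.44, pizza is 0.08
```

<p>A gap of that size between the two averages is what the heatmap renders as a block of blue tiles next to a pale row and column.</p>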
</section>
<section id="plot-a-2d-projection" class="level3">
<h3 class="anchored" data-anchor-id="plot-a-2d-projection">Plot a 2D Projection</h3>
<p>A heatmap shows pairwise relationships, but a <a href="../ggplot2/how-to-use-geompoint-in-r">scatter plot</a> shows the underlying geometry. Embeddings live in hundreds or thousands of dimensions (1,536 for <code>text-embedding-3-small</code>), so we use <strong>PCA</strong> (principal component analysis) to squash them into two dimensions for plotting.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">emb_matrix <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(rbind, embeddings)</span>
<span id="cb18-2"></span>
<span id="cb18-3">pca <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prcomp</span>(emb_matrix, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale. =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb18-4"></span>
<span id="cb18-5">points <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb18-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> docs,</span>
<span id="cb18-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> pca<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb18-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> pca<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb18-9">)</span></code></pre></div></div>
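<p><code>prcomp()</code> computes the principal components: orthogonal directions in the embedding space, ordered by how much variance each one explains. The columns of <code>pca$x</code> hold each document’s coordinates along those directions, so the first two columns give the 2D view that preserves the most variance. That is usually enough to reveal clusters, although some structure is always lost in the projection.</p>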
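<p>How far a 2D picture can be trusted depends on how much of the total variance the first two components capture. The sketch below shows one way to check this from the <code>sdev</code> element that <code>prcomp()</code> returns. It uses random stand-in data instead of real embeddings, so the printed proportions carry no meaning beyond illustrating the calculation.</p>

```r
set.seed(42)
# Stand-in data: 5 "documents" in 8 dimensions (not real embeddings)
emb_matrix <- matrix(rnorm(5 * 8), nrow = 5)

pca <- prcomp(emb_matrix, center = TRUE, scale. = FALSE)

# Each component's variance is its standard deviation squared
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Cumulative share of variance kept by PC1 and by PC1 + PC2
round(cumsum(var_explained)[1:2], 2)
```

<p>If the second value is low, points that look close in the scatter plot may still be far apart in the full embedding space, so treat the plot as a rough map rather than a measurement.</p>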
</section>
<section id="plot-the-projection" class="level3">
<h3 class="anchored" data-anchor-id="plot-the-projection">Plot the projection</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(points, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(x, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> text)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nudge_y =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb19-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Documents Projected to 2D with PCA"</span>,</span>
<span id="cb19-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PC1"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PC2"</span></span>
<span id="cb19-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb19-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand_limits</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>))</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/llm/text-embeddings-pca-2d-projection-in-r-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>PCA 2D projection of text embeddings in R with ggplot2, showing four data-science documents clustered together and the pizza document isolated on the right</figcaption>
</figure>
</div>
<p><strong>What you’ll see:</strong> the four data-science sentences cluster together on one side of the plot, while the pizza sentence sits alone on the other side. The model has effectively drawn a line between “things about data” and “things about food” — based purely on the geometry of the embeddings.</p>
</section>
<section id="visualize-query-distance" class="level3">
<h3 class="anchored" data-anchor-id="visualize-query-distance">Visualize Query Distance</h3>
<p>You can also use the same plot to show how close a search query is to each document. Embed the query, add a <code>score</code> column with <a href="../dplyr/how-to-use-mutate-in-r">mutate()</a> and <a href="../purrr/how-to-use-map-in-r">map_dbl()</a>, then color the points by that score:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">query <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"how do I plot data in R?"</span></span>
<span id="cb20-2">q_vec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(query)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb20-3"></span>
<span id="cb20-4">points <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(embeddings, \(e) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(q_vec, e))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(x, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> score)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nudge_y =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_gradient</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">low =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey80"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"darkred"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Similarity to query: '"</span>, query, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"'"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb20-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/llm/semantic-search-query-distance-plot-in-r-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Semantic search query distance plot in R with ggplot2, showing documents sized and colored by cosine similarity to a user query about plotting data</figcaption>
</figure>
</div>
<p>The points closest to the query (in meaning) will be the largest and darkest. This is what semantic search actually <em>looks</em> like under the hood.</p>
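<p>The plot above colors the documents by their score but does not draw the query itself. One possible extension, sketched here under stated assumptions, is to project the query vector into the same PCA space with the <code>predict()</code> method for <code>prcomp</code> objects, which applies the stored centering and rotation to new rows. The data below are random stand-ins for <code>emb_matrix</code> and <code>q_vec</code>, and <code>query_point</code> is a name introduced only for this example.</p>

```r
set.seed(1)
# Stand-in data: 5 "documents" and 1 "query" in 8 dimensions (not real embeddings)
emb_matrix <- matrix(rnorm(5 * 8), nrow = 5)
q_vec <- rnorm(8)

# Fit PCA on the documents only, as in the projection step
pca <- prcomp(emb_matrix, center = TRUE, scale. = FALSE)

# predict() expects a matrix or data frame with one row per new observation
q_proj <- predict(pca, newdata = matrix(q_vec, nrow = 1))

# Coordinates of the query on the same PC1/PC2 axes as the documents
query_point <- data.frame(
  text = "query",
  x = unname(q_proj[1, "PC1"]),
  y = unname(q_proj[1, "PC2"])
)
query_point
```

<p>The resulting row can be drawn as an extra layer, for example with <code>geom_point(data = query_point, aes(x, y), inherit.aes = FALSE)</code>. Because the components were fitted on the documents alone, the query’s position is only an approximate guide to its true distance from each document; the cosine score remains the number to rank by.</p>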
</section>
</section>
<section id="anti-correlation-when-sentences-disagree" class="level2">
<h2 class="anchored" data-anchor-id="anti-correlation-when-sentences-disagree">Anti-Correlation: When Sentences Disagree</h2>
<p>So far we’ve focused on documents that <em>should</em> land close together. Equally important is the low-similarity end of the scale: what happens when sentences are unrelated, or even opposite in sentiment. Semantic search is only trustworthy if irrelevant results are visibly far away, not just slightly less close.</p>
<section id="a-mixed-set-of-sentences" class="level3">
<h3 class="anchored" data-anchor-id="a-mixed-set-of-sentences">A Mixed Set of Sentences</h3>
<p>Build a small set that deliberately mixes related topics and totally unrelated ones:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">mixed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb21-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I love this product, it works perfectly"</span>,</span>
<span id="cb21-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This is the best thing I have ever bought"</span>,</span>
<span id="cb21-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I hate this product, it broke immediately"</span>,</span>
<span id="cb21-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Worst purchase I have ever made"</span>,</span>
<span id="cb21-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Quantum chromodynamics describes the strong force"</span>,</span>
<span id="cb21-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The Higgs boson was discovered in 2012"</span></span>
<span id="cb21-8">)</span>
<span id="cb21-9"></span>
<span id="cb21-10">mixed_emb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_many</span>(mixed)</span>
<span id="cb21-11">mixed_sim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">similarity_matrix</span>(mixed_emb)</span></code></pre></div></div>
<p>Three groups: positive reviews (1–2), negative reviews (3–4), and physics statements (5–6). We expect physics to be far from everything else, and reviews to cluster — but with an interesting twist around antonyms.</p>
</section>
<section id="plot-the-mixed-heatmap" class="level3">
<h3 class="anchored" data-anchor-id="plot-the-mixed-heatmap">Plot the Mixed Heatmap</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">labels <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"love it"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"best ever"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"hate it"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"worst ever"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"QCD"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Higgs"</span>)</span>
<span id="cb22-2"></span>
<span id="cb22-3">mixed_long <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as_tibble</span>(mixed_sim) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_names</span>(labels) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">doc_a =</span> labels) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>doc_a, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"doc_b"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"similarity"</span>)</span>
<span id="cb22-7"></span>
<span id="cb22-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(mixed_long, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(doc_a, doc_b, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> similarity)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_tile</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(similarity, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_fill_gradient2</span>(</span>
<span id="cb22-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">low =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"firebrick"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mid =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">high =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>,</span>
<span id="cb22-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">midpoint =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb22-14">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb22-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Correlation and Anti-Correlation"</span>,</span>
<span id="cb22-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reviews cluster together; physics is far from both"</span>,</span>
<span id="cb22-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cosine"</span></span>
<span id="cb22-19">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-21">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">axis.text.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">angle =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">35</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/llm/text-embeddings-anti-correlation-heatmap-in-r-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Heatmap showing correlation and anti-correlation between text embeddings in R, with product reviews forming one block and physics statements forming a separate block with pale across-block cells</figcaption>
</figure>
</div>
<p><strong>What you’ll see:</strong> Two bright blocks form along the diagonal: a 4×4 block for the four reviews (positive and negative together) and a 2×2 block for the physics pair. Between them, the cells turn pale: reviews vs physics scores around 0.05–0.15, the lowest values in the matrix. That low score is what “anti-correlation” looks like in practice for embeddings: not negative numbers, but a clear visual gap.</p>
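<p>To put a number on that visual gap, one option is to average the similarities inside each group and compare them with the average across groups. The sketch below uses a hand-made matrix, <code>toy_sim</code>, whose values are invented for illustration; they are not model output. With real data you would pass <code>mixed_sim</code> instead.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for mixed_sim: illustrative values only, not real model output
group &lt;- c("review", "review", "review", "review", "physics", "physics")
toy_sim &lt;- matrix(0.10, nrow = 6, ncol = 6)   # across-block baseline
toy_sim[1:4, 1:4] &lt;- 0.70                     # review block
toy_sim[5:6, 5:6] &lt;- 0.60                     # physics block
diag(toy_sim) &lt;- 1                            # every document matches itself

same_group &lt;- outer(group, group, "==")       # TRUE where both docs share a group
off_diag   &lt;- row(toy_sim) != col(toy_sim)    # drop the trivial diagonal

within_mean  &lt;- mean(toy_sim[same_group &amp; off_diag])
between_mean &lt;- mean(toy_sim[!same_group])

c(within = within_mean, between = between_mean)</code></pre></div>
<p>A large difference between the two means is the numeric counterpart of the block pattern in the heatmap.</p>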
</section>
<section id="the-surprising-antonym-result" class="level3">
<h3 class="anchored" data-anchor-id="the-surprising-antonym-result">The Surprising Antonym Result</h3>
<p>Run this query and watch what happens:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(mixed_emb[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], mixed_emb[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]])</span>
<span id="cb23-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~0.72  ("love it" vs "hate it")</span></span>
<span id="cb23-3"></span>
<span id="cb23-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(mixed_emb[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], mixed_emb[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]])</span>
<span id="cb23-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># ~0.08  ("love it" vs "QCD")</span></span></code></pre></div></div>
<p>“I love this product” and “I hate this product” score <em>higher</em> than “I love this product” and a physics sentence. That’s surprising at first — but it makes sense: both reviews share grammar, vocabulary, the word “product”, and the entire context of a customer opinion. The embedding captures <em>topic</em> much more strongly than <em>sentiment</em>.</p>
<p>This is a critical lesson:</p>
<blockquote class="blockquote">
<p><strong>Embedding similarity measures topical relatedness, not agreement.</strong> Two sentences that contradict each other will often score high if they’re about the same thing.</p>
</blockquote>
<p>If you need to detect agreement vs disagreement (e.g., for fact-checking or stance detection), cosine similarity between embeddings won’t do it on its own. One approach is an LLM with a classification prompt; see <a href="../llm/how-to-classify-text-with-llms-in-r">How to Classify Text with LLMs in R</a> for that.</p>
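<p>A toy calculation can make the topic-over-sentiment effect concrete. The three-dimensional vectors below are invented for illustration, with made-up axes for “review topic”, “physics topic”, and “sentiment”. Real embeddings have no labeled axes, and <code>toy_cosine()</code> is a hypothetical helper defined here so the snippet runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper; named toy_cosine() so it does not clash with cosine_sim()
toy_cosine &lt;- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Invented axes: (product-review topic, physics topic, sentiment)
love &lt;- c(0.9, 0.0,  0.3)
hate &lt;- c(0.9, 0.0, -0.3)
qcd  &lt;- c(0.1, 0.9,  0.0)

love_vs_hate &lt;- toy_cosine(love, hate)
love_vs_qcd  &lt;- toy_cosine(love, qcd)

c(love_vs_hate = love_vs_hate, love_vs_qcd = love_vs_qcd)</code></pre></div>
<p>Because the shared topic component is much larger than the sentiment component, flipping the sign of the sentiment value barely moves the cosine, while changing topic drops it sharply.</p>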
</section>
<section id="visualize-the-three-clusters" class="level3">
<h3 class="anchored" data-anchor-id="visualize-the-three-clusters">Visualize the Three Clusters</h3>
<p>The PCA scatter plot makes the topic vs sentiment distinction obvious:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">mixed_pca <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prcomp</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(rbind, mixed_emb), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb24-2"></span>
<span id="cb24-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb24-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> labels,</span>
<span id="cb24-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"review"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"review"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"review"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"review"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"physics"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"physics"</span>),</span>
<span id="cb24-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> mixed_pca<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb24-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> mixed_pca<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb24-8">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb24-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(x, y, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> group, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> text)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nudge_y =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"black"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">review =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">physics =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"firebrick"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb24-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Topic Beats Sentiment in Embedding Space"</span>,</span>
<span id="cb24-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Positive and negative reviews cluster together; physics sits far away"</span>,</span>
<span id="cb24-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PC1"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PC2"</span></span>
<span id="cb24-17">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/llm/text-embeddings-topic-vs-sentiment-pca-in-r-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>PCA scatter plot of text embeddings in R with ggplot2 showing topic dominates sentiment, with positive and negative product reviews clustered together and physics statements far away</figcaption>
</figure>
</div>
<p><strong>What you’ll see:</strong> Four reviews form one tight blue cluster (positive <em>and</em> negative mixed together) and two physics sentences form a separate red cluster on the other side of the plot. The reviews aren’t separated by sentiment; they’re unified by topic. That single picture tells you exactly what embeddings are good at and what they’re not.</p>
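<p>A two-dimensional PCA plot is only a fair summary when the first two components carry most of the variance, so it can be worth checking that share before trusting the picture. The sketch below runs the check on a simulated matrix, <code>toy_emb</code>, which is an assumption standing in for <code>do.call(rbind, mixed_emb)</code>: six rows, eight dimensions, two noisy groups.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)

# Toy stand-in for the embedding matrix: 6 documents x 8 dimensions
review_center  &lt;- c(rep(1, 4), rep(0, 4))
physics_center &lt;- c(rep(0, 4), rep(1, 4))
toy_emb &lt;- rbind(
  t(replicate(4, review_center  + rnorm(8, sd = 0.1))),  # four "reviews"
  t(replicate(2, physics_center + rnorm(8, sd = 0.1)))   # two "physics" rows
)

toy_pca &lt;- prcomp(toy_emb, center = TRUE)

# Share of total variance captured by each principal component
var_explained &lt;- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
round(var_explained[1:2], 3)</code></pre></div>
<p>With the real embeddings, the same two lines applied to <code>mixed_pca</code> show how much of the structure the scatter plot actually displays.</p>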
</section>
</section>
<section id="build-a-semantic-search-function" class="level2">
<h2 class="anchored" data-anchor-id="build-a-semantic-search-function">Build a Semantic Search Function</h2>
<p>Searching means: embed the query, compare it to every stored embedding, return the top matches.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">semantic_search <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(query, store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) {</span>
<span id="cb25-2">  q_vec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_text</span>(query)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>data[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>embedding)</span>
<span id="cb25-3"></span>
<span id="cb25-4">  store <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb25-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(embedding, \(e) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cosine_sim</span>(q_vec, e))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb25-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(score)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb25-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(top_n)</span>
<span id="cb25-8">}</span></code></pre></div></div>
<p>The pipeline uses <a href="../dplyr/how-to-use-mutate-in-r">mutate()</a> to add a <code>score</code> column, <a href="../dplyr/how-to-use-arrange-in-r">arrange()</a> to sort by descending similarity, and <code>head()</code> to keep the top matches. Inside <code>mutate()</code>, <code>map_dbl()</code> calls <code>cosine_sim()</code> once per stored embedding and returns one numeric score per row. If these dplyr verbs are new to you, start with <a href="../dplyr/how-to-use-mutate-in-r">How to Use mutate() in R</a>.</p>
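<p>For larger stores, the same ranking can be computed in one matrix multiplication rather than one call per row. The sketch below is a base-R variant, not the function above: it assumes the embeddings are stacked as rows of a matrix, and every name and value in it (<code>toy_texts</code>, <code>toy_mat</code>, <code>toy_query</code>) is made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Toy store: three documents with made-up 3-d embeddings (illustrative only)
toy_texts &lt;- c("doc A", "doc B", "doc C")
toy_mat &lt;- rbind(
  c(1.0, 0.0, 0.0),
  c(0.7, 0.7, 0.0),
  c(0.0, 0.0, 1.0)
)
toy_query &lt;- c(0.9, 0.1, 0.0)

# Scale each row and the query to unit length; cosine is then a dot product
unit_rows  &lt;- toy_mat / sqrt(rowSums(toy_mat^2))
unit_query &lt;- toy_query / sqrt(sum(toy_query^2))

scores &lt;- as.vector(unit_rows %*% unit_query)   # one score per document
ranked &lt;- order(scores, decreasing = TRUE)

data.frame(text = toy_texts[ranked], score = scores[ranked])</code></pre></div>
<p>Normalizing the stored rows once and reusing them means each new query costs a single matrix-vector product.</p>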
<section id="run-a-query" class="level3">
<h3 class="anchored" data-anchor-id="run-a-query">Run a query</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">semantic_search</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"how to make plots in R"</span>, store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>text</th>
<th>score</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ggplot2 makes beautiful charts</td>
<td>0.71</td>
</tr>
<tr class="even">
<td>R is a language for statistical computing</td>
<td>0.55</td>
</tr>
</tbody>
</table>
<p>Notice that the top match doesn’t contain the word “plots” — it matches on meaning.</p>
</section>
</section>
<section id="practical-example-search-faq-entries" class="level2">
<h2 class="anchored" data-anchor-id="practical-example-search-faq-entries">Practical Example: Search FAQ Entries</h2>
<p>A common real-world use is searching a help center or FAQ. Build the store once, query it many times.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">faqs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb27-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"To reset your password, click 'Forgot password' on the login page"</span>,</span>
<span id="cb27-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You can cancel your subscription in Account Settings"</span>,</span>
<span id="cb27-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"We accept Visa, Mastercard, and PayPal"</span>,</span>
<span id="cb27-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Refunds are processed within 5-7 business days"</span>,</span>
<span id="cb27-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Two-factor authentication can be enabled in Security Settings"</span></span>
<span id="cb27-7">)</span>
<span id="cb27-8"></span>
<span id="cb27-9">faq_store <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb27-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> faqs,</span>
<span id="cb27-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_many</span>(faqs)</span>
<span id="cb27-12">)</span></code></pre></div></div>
<section id="query-the-faq-store" class="level3">
<h3 class="anchored" data-anchor-id="query-the-faq-store">Query the FAQ store</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">semantic_search</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I can't log in"</span>, faq_store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb28-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "To reset your password, click 'Forgot password'..."</span></span>
<span id="cb28-3"></span>
<span id="cb28-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">semantic_search</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"how do I get my money back"</span>, faq_store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb28-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Refunds are processed within 5-7 business days"</span></span></code></pre></div></div>
<p>An exact keyword search would miss both of these, because neither query shares a content word with its best-matching FAQ. The embedding search handles paraphrases naturally.</p>
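<p>For comparison, here is a rough whole-word keyword baseline run on the same FAQ entries. It is a sketch under stated assumptions: <code>keyword_search()</code>, <code>tokenize()</code>, and the short <code>stop_words</code> list are hypothetical helpers written for this example, not part of any package.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">faqs &lt;- c(
  "To reset your password, click 'Forgot password' on the login page",
  "You can cancel your subscription in Account Settings",
  "We accept Visa, Mastercard, and PayPal",
  "Refunds are processed within 5-7 business days",
  "Two-factor authentication can be enabled in Security Settings"
)

# Tiny hand-picked stop-word list (an assumption, far from complete)
stop_words &lt;- c("i", "in", "to", "do", "how", "my", "the", "a", "on",
                "can", "be", "your", "you", "are", "and", "we")

# Lowercase, split on anything that is not a letter or apostrophe, drop stop words
tokenize &lt;- function(x) {
  words &lt;- strsplit(tolower(x), "[^a-z']+")[[1]]
  setdiff(words[nzchar(words)], stop_words)
}

# Return every document that shares at least one whole word with the query
keyword_search &lt;- function(query, docs) {
  q_words &lt;- tokenize(query)
  overlap &lt;- vapply(docs, \(d) length(intersect(q_words, tokenize(d))), integer(1))
  docs[overlap &gt; 0]
}

keyword_search("I can't log in", faqs)
keyword_search("how do I get my money back", faqs)
keyword_search("reset password", faqs)</code></pre></div>
<p>The first two calls return nothing because the queries and the relevant FAQs use different words; only the third, which repeats the FAQ’s own vocabulary, finds a match.</p>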
</section>
</section>
<section id="from-search-to-rag-answer-questions-over-your-docs" class="level2">
<h2 class="anchored" data-anchor-id="from-search-to-rag-answer-questions-over-your-docs">From Search to RAG: Answer Questions Over Your Docs</h2>
<p>Semantic search returns relevant passages, but it doesn’t <em>answer</em> questions. That final step — feeding the retrieved passages into an LLM along with the user’s question — is called <strong>retrieval-augmented generation</strong>, or RAG. It’s the most common way to build an AI assistant grounded in your own documents.</p>
<p>The recipe is three steps:</p>
<ol type="1">
<li><strong>Retrieve</strong> — use semantic search to find the top-k passages related to the question</li>
<li><strong>Augment</strong> — build a prompt that includes those passages as context</li>
<li><strong>Generate</strong> — ask an LLM to answer the question using only that context</li>
</ol>
<p>You already have step 1. Steps 2 and 3 are just string assembly and one chat call.</p>
<section id="the-rag-function" class="level3">
<h3 class="anchored" data-anchor-id="the-rag-function">The RAG Function</h3>
<p>Here’s a complete RAG function built on the <code>semantic_search()</code> and <code>faq_store</code> from earlier:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb29-2"></span>
<span id="cb29-3">answer_with_rag <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(question, store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_k =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) {</span>
<span id="cb29-4">  hits <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">semantic_search</span>(question, store, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> top_k)</span>
<span id="cb29-5"></span>
<span id="cb29-6">  context <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb29-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Passage"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_len</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(hits)), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">":"</span>, hits<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>text,</span>
<span id="cb29-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb29-9">  )</span>
<span id="cb29-10"></span>
<span id="cb29-11">  prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb29-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Answer the question using ONLY the passages below."</span>,</span>
<span id="cb29-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"If the answer isn't there, say 'I don't know.'"</span>,</span>
<span id="cb29-14">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Context:</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, context,</span>
<span id="cb29-15">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Question:"</span>, question</span>
<span id="cb29-16">  )</span>
<span id="cb29-17"></span>
<span id="cb29-18">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb29-19">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(prompt)</span>
<span id="cb29-20">}</span></code></pre></div></div>
<p>The function retrieves the <code>top_k</code> best passages, packages them as labeled context, and instructs the LLM to answer only from that context. The “say I don’t know” clause is critical: without it, the model tends to fall back on its own training data when retrieval comes up empty.</p>
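<p>To see what the model actually receives, here is a minimal sketch of the context-building step on its own. The <code>hits</code> data frame is made-up stand-in data, and <code>build_context()</code> is a hypothetical helper name rather than part of the function above. It uses <code>paste0()</code> so the labels come out as “Passage 1: …” without extra spaces.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for the rows returned by the search step; the text is made up.
hits &lt;- data.frame(
  text = c(
    "To reset your password, click 'Forgot password' on the login page.",
    "Contact support if your account is locked."
  )
)

# Hypothetical helper: number each passage and join them with blank lines.
build_context &lt;- function(hits) {
  paste0("Passage ", seq_len(nrow(hits)), ": ", hits$text, collapse = "\n\n")
}

context &lt;- build_context(hits)
cat(context)</code></pre></div>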
</section>
<section id="run-a-rag-query" class="level3">
<h3 class="anchored" data-anchor-id="run-a-rag-query">Run a RAG Query</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">answer_with_rag</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I can't log in to my account, what should I do?"</span>, faq_store)</span>
<span id="cb30-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "To reset your password, click 'Forgot password' on the login page."</span></span>
<span id="cb30-3"></span>
<span id="cb30-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">answer_with_rag</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"what's the capital of France?"</span>, faq_store)</span>
<span id="cb30-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "I don't know."</span></span></code></pre></div></div>
<p>Notice the second response — the question is unrelated to any FAQ entry, so the model refuses rather than hallucinating. That refusal is the whole point of RAG: ground the answer in <em>your</em> data and fail loudly when the data doesn’t cover the question.</p>
</section>
<section id="why-retrieval-beats-a-huge-prompt" class="level3">
<h3 class="anchored" data-anchor-id="why-retrieval-beats-a-huge-prompt">Why Retrieval Beats a Huge Prompt</h3>
<p>You might ask: why not just paste <em>all</em> the FAQs into every prompt and skip retrieval entirely? Two reasons: cost and scale.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Approach</th>
<th>Cost per query</th>
<th>Scales to</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Dump everything</td>
<td>Grows with doc count</td>
<td>~50–100 short docs</td>
</tr>
<tr class="even">
<td>RAG with retrieval</td>
<td>Fixed (top-k only)</td>
<td>Millions of docs</td>
</tr>
</tbody>
</table>
<p>Retrieval keeps the prompt small, which makes responses faster, cheaper, and more accurate. LLMs also get distracted by irrelevant context, so handing them only the top matches usually <em>improves</em> answer quality.</p>
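<p>A rough back-of-the-envelope calculation makes the cost column concrete. The numbers below (about 60 tokens per FAQ entry, three passages per query) are illustrative assumptions, not measurements.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">tokens_per_doc &lt;- 60     # assumed average length of one FAQ entry
top_k          &lt;- 3      # passages sent per query when retrieving
n_docs         &lt;- c(100, 10000, 1000000)

prompt_tokens &lt;- data.frame(
  n_docs          = n_docs,
  dump_everything = n_docs * tokens_per_doc,  # grows linearly with the corpus
  rag             = top_k * tokens_per_doc    # constant, whatever the corpus size
)
prompt_tokens</code></pre></div>
<p>Under these assumptions the retrieval prompt stays at 180 context tokens, while the dump-everything prompt reaches 600,000 tokens at 10,000 documents.</p>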
</section>
<section id="going-further-with-rag" class="level3">
<h3 class="anchored" data-anchor-id="going-further-with-rag">Going Further with RAG</h3>
<p>A few patterns worth knowing once the basic version works:</p>
<ul>
<li><strong>Chunk long documents</strong> before embedding. Split a 10-page PDF into ~300-word chunks so retrieval can find the specific paragraph that answers a question.</li>
<li><strong>Include metadata</strong> (source URL, section title) in the context so the LLM can cite where the answer came from.</li>
<li><strong>Re-rank</strong> retrieved results with a cross-encoder or a second LLM pass for higher precision at the cost of latency.</li>
<li><strong>Use a vector database</strong> (pgvector, Qdrant, LanceDB) once in-memory search over the store becomes a bottleneck, which usually means well beyond a few thousand documents.</li>
</ul>
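<p>Chunking is the pattern you’ll likely need first. Here is a minimal sketch in base R that splits on whitespace into fixed-size word chunks. <code>chunk_text()</code> is a hypothetical helper; splitting on paragraphs or adding overlap between chunks are common refinements it does not attempt.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Split a long text into chunks of at most `chunk_size` words.
chunk_text &lt;- function(text, chunk_size = 300) {
  words &lt;- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) == 0 || identical(words, "")) return(character(0))
  # Assign each word a chunk index: 1, 1, ..., 2, 2, ...
  group &lt;- ceiling(seq_along(words) / chunk_size)
  unname(vapply(split(words, group), paste, character(1), collapse = " "))
}

long_doc &lt;- paste(rep("word", 650), collapse = " ")
chunks   &lt;- chunk_text(long_doc)

length(chunks)
# 3
lengths(strsplit(chunks, " "))
# 300 300  50</code></pre></div>
<p>Each chunk then becomes its own row in the store, embedded and searched like any other document.</p>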
<p>This is enough to build a real “chat with your docs” app in R. For a deeper dive into chat APIs, see <a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a> and <a href="../llm/how-to-use-claude-api-in-r">How to Use the Claude API in R</a>.</p>
</section>
</section>
<section id="caching-embeddings" class="level2">
<h2 class="anchored" data-anchor-id="caching-embeddings">Caching Embeddings</h2>
<p>Embeddings are deterministic for a given text and model, so you should cache them. Rebuilding embeddings on every script run wastes money and time.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">cache_path <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"embeddings_cache.rds"</span></span>
<span id="cb31-2"></span>
<span id="cb31-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(cache_path)) {</span>
<span id="cb31-4">  store <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_rds</span>(cache_path)</span>
<span id="cb31-5">} <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb31-6">  store <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb31-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> docs,</span>
<span id="cb31-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed_many</span>(docs)</span>
<span id="cb31-9">  )</span>
<span id="cb31-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">write_rds</span>(store, cache_path)</span>
<span id="cb31-11">}</span></code></pre></div></div>
<p>For larger projects, store embeddings in a database column or a dedicated vector store.</p>
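<p>The all-or-nothing cache above re-embeds nothing once the file exists, even if you add documents later. A small variation is to key the cache by the text itself and embed only what is missing. The sketch below uses a named list and base R’s <code>saveRDS()</code>/<code>readRDS()</code>; <code>update_cache()</code> and <code>fake_embed_many()</code> are hypothetical names, and the fake embedder exists only so the example runs without an API key.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Stand-in embedder so the sketch runs offline; swap in the real embed_many().
fake_embed_many &lt;- function(texts) lapply(texts, function(txt) c(nchar(txt), 1))

# Embed only the texts that are not already cached, keyed by the text itself.
update_cache &lt;- function(cache, docs, embed_fn) {
  new_docs &lt;- setdiff(docs, names(cache))
  if (length(new_docs) &gt; 0) {
    cache[new_docs] &lt;- embed_fn(new_docs)
  }
  cache
}

cache_file &lt;- tempfile(fileext = ".rds")  # use a real path in your project

# First run: one document.
cache &lt;- update_cache(list(), "How do I reset my password?", fake_embed_many)
saveRDS(cache, cache_file)

# Later run: one document is already cached, one is new.
docs  &lt;- c("How do I reset my password?", "How do I cancel my plan?")
cache &lt;- update_cache(readRDS(cache_file), docs, fake_embed_many)
saveRDS(cache, cache_file)

length(cache)
# 2</code></pre></div>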
</section>
<section id="choosing-a-model" class="level2">
<h2 class="anchored" data-anchor-id="choosing-a-model">Choosing a Model</h2>
<p>OpenAI offers several embedding models with different sizes and costs.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Dimensions</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>text-embedding-3-small</code></td>
<td>1536</td>
<td>Default — cheap and good</td>
</tr>
<tr class="even">
<td><code>text-embedding-3-large</code></td>
<td>3072</td>
<td>Highest quality</td>
</tr>
<tr class="odd">
<td><code>text-embedding-ada-002</code></td>
<td>1536</td>
<td>Legacy, avoid for new work</td>
</tr>
</tbody>
</table>
<p>Start with <code>text-embedding-3-small</code>. Move to <code>large</code> only if you measure a meaningful quality gain — it costs about 6x more and produces vectors twice as large.</p>
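<p>“Twice as large” matters mostly for memory and storage. As a rough sketch, assuming a corpus of 100,000 documents and R’s default 8-byte doubles, the raw vectors alone take up:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">bytes_per_value &lt;- 8       # R stores numeric vectors as 8-byte doubles
n_docs          &lt;- 100000  # assumed corpus size

dims    &lt;- c(small = 1536, large = 3072)
size_gb &lt;- dims * bytes_per_value * n_docs / 1e9

round(size_gb, 2)
# small large
#  1.23  2.46</code></pre></div>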
</section>
<section id="use-local-embeddings-with-ollama" class="level2">
<h2 class="anchored" data-anchor-id="use-local-embeddings-with-ollama">Use Local Embeddings with Ollama</h2>
<p>If you don’t want to send data to a third-party API, run embeddings locally with Ollama.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">embed_local <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nomic-embed-text"</span>) {</span>
<span id="cb32-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://localhost:11434/api/embeddings"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_body_json</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> model, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prompt =</span> text)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb32-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>()</span>
<span id="cb32-6">}</span></code></pre></div></div>
<p>The interface is similar — only the URL and request shape change. See <a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a> for Ollama setup.</p>
<p><strong>Note:</strong> Local embedding models are often smaller (<code>nomic-embed-text</code> produces 768 dimensions) and may be less accurate than cloud models, but they’re free and private.</p>
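<p>As written, <code>embed_local()</code> returns the whole parsed JSON response rather than a numeric vector. Assuming the response carries the vector in an <code>embedding</code> field (which is how Ollama’s <code>/api/embeddings</code> endpoint is documented at the time of writing), a small helper can unpack it. <code>as_embedding()</code> is a hypothetical name, and the mock response stands in for a real call so the sketch runs without Ollama.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Convert a parsed response (a list) into a plain numeric vector.
as_embedding &lt;- function(resp) {
  if (is.null(resp$embedding)) {
    stop("Response has no `embedding` field; check the model name and endpoint.")
  }
  as.numeric(unlist(resp$embedding))
}

# Mock of what resp_body_json() might return (assumed shape).
mock_resp &lt;- list(embedding = list(0.12, -0.03, 0.48))

as_embedding(mock_resp)
# 0.12 -0.03  0.48</code></pre></div>
<p>With a running Ollama server, the equivalent call would be <code>as_embedding(embed_local("some text"))</code>.</p>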
</section>
<section id="performance-tips" class="level2">
<h2 class="anchored" data-anchor-id="performance-tips">Performance Tips</h2>
<p><strong>Batch your requests.</strong> One call with 100 documents is dramatically faster and cheaper than 100 separate calls.</p>
<p><strong>Cache aggressively.</strong> The same input always produces the same vector, so persist the vectors to disk or a database.</p>
<p><strong>Normalize once if you do many comparisons.</strong> If you rescale each vector to unit length (divide it by its L2 norm) ahead of time, cosine similarity reduces to a plain dot product, which is faster.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">normalize <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(v) v <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(v<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb33-2"></span>
<span id="cb33-3">store_norm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> store <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb33-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">embedding =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(embedding, normalize))</span>
<span id="cb33-5"></span>
<span id="cb33-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Now cosine_sim simplifies to sum(a * b)</span></span></code></pre></div></div>
<p>For thousands of documents this is fine in pure R. For millions, you’ll want a vector database (pgvector, Qdrant, Pinecone).</p>
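<p>A quick check with toy vectors shows the equivalence, and how normalized embeddings let you score a query against every document with a single matrix product. The <code>cosine_sim()</code> defined here is the textbook formula, written out so the sketch is self-contained; the vectors are made up.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">normalize  &lt;- function(v) v / sqrt(sum(v^2))
cosine_sim &lt;- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

a &lt;- c(1, 2, 3)
b &lt;- c(2, 0, 1)

# Same value either way, up to floating-point error.
all.equal(cosine_sim(a, b), sum(normalize(a) * normalize(b)))
# TRUE

# With one normalized embedding per row, a matrix product scores all documents.
docs_mat &lt;- rbind(normalize(a), normalize(b))
query    &lt;- normalize(c(1, 2, 2))
scores   &lt;- as.vector(docs_mat %*% query)

order(scores, decreasing = TRUE)  # document indices, best match first
# 1 2</code></pre></div>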
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Comparing vectors from different models</strong></p>
<p>Embeddings are only meaningful within the same model. A vector from <code>text-embedding-3-small</code> cannot be compared to one from <code>text-embedding-3-large</code> — they live in different spaces.</p>
<p><strong>2. Forgetting to cache</strong></p>
<p>Re-embedding the same documents on every run is the most common waste of money in embedding projects. Cache by text content or hash.</p>
<p><strong>3. Treating cosine score as a probability</strong></p>
<p>A score of 0.7 doesn’t mean “70% confident.” It just means “more similar than 0.5 and less similar than 0.9.” Use it for ranking, not absolute thresholds — and if you do threshold, calibrate on your own data.</p>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Step</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Embed a single text</td>
<td><code>embed_text(text)</code></td>
</tr>
<tr class="even">
<td>Embed many texts</td>
<td><code>embed_many(texts)</code></td>
</tr>
<tr class="odd">
<td>Compare two embeddings</td>
<td><code>cosine_sim(a, b)</code></td>
</tr>
<tr class="even">
<td>Search a store</td>
<td><code>semantic_search(query, store)</code></td>
</tr>
</tbody>
</table>
<p><strong>Key points:</strong></p>
<ul>
<li>Embeddings turn text into vectors that capture meaning</li>
<li>Cosine similarity is the standard comparison metric</li>
<li>Batch your requests and cache the results</li>
<li>Use <code>text-embedding-3-small</code> as a default starting model</li>
<li>Embeddings are the foundation for semantic search and RAG</li>
</ul>
</section>
<section id="frequently-asked-questions" class="level2">
<h2 class="anchored" data-anchor-id="frequently-asked-questions">Frequently Asked Questions</h2>
<section id="how-much-do-embeddings-cost-in-r" class="level3">
<h3 class="anchored" data-anchor-id="how-much-do-embeddings-cost-in-r">How much do embeddings cost in R?</h3>
<p>OpenAI’s <code>text-embedding-3-small</code> costs about $0.02 per 1 million tokens (roughly 750,000 words). For most projects this is negligible: embedding an entire 10,000-entry FAQ typically costs around a cent or two, depending on how long the entries are. The <code>text-embedding-3-large</code> model is ~6x more expensive but produces higher-quality vectors. For free alternatives, use <a href="../llm/how-to-run-local-llms-in-r">Ollama</a> with <code>nomic-embed-text</code>.</p>
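<p>You can estimate your own bill with a few lines of arithmetic. The sketch below uses the $0.02 per million tokens figure quoted above; the average entry lengths are assumptions, so plug in numbers from your own data.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">price_per_million &lt;- 0.02            # USD per 1M tokens, figure quoted above
n_entries         &lt;- 10000
tokens_per_entry  &lt;- c(25, 50, 100)  # assumed average entry lengths

cost_usd &lt;- n_entries * tokens_per_entry / 1e6 * price_per_million
data.frame(tokens_per_entry, cost_usd)
#   tokens_per_entry cost_usd
# 1               25    0.005
# 2               50    0.010
# 3              100    0.020</code></pre></div>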
</section>
<section id="do-i-need-a-vector-database-to-use-embeddings-in-r" class="level3">
<h3 class="anchored" data-anchor-id="do-i-need-a-vector-database-to-use-embeddings-in-r">Do I need a vector database to use embeddings in R?</h3>
<p>No, not for small and medium projects. A tibble with an <code>embedding</code> list-column is completely fine, and cosine similarity in pure R is fast enough for thousands to tens of thousands of documents. Reach for a vector database (pgvector, Qdrant, LanceDB, DuckDB’s <code>vss</code> extension) only when in-memory comparison becomes a measurable bottleneck, which typically happens somewhere in the hundreds of thousands of documents.</p>
</section>
<section id="can-i-use-open-source-embedding-models-in-r" class="level3">
<h3 class="anchored" data-anchor-id="can-i-use-open-source-embedding-models-in-r">Can I use open-source embedding models in R?</h3>
<p>Yes. The easiest path is <a href="../llm/how-to-run-local-llms-in-r">Ollama</a>, which serves models like <code>nomic-embed-text</code> or <code>mxbai-embed-large</code> through a local HTTP API. You can also use Hugging Face models via <code>reticulate</code> and the <code>sentence-transformers</code> Python library. Local models are free, private, and work offline — at the cost of some accuracy and setup time.</p>
</section>
<section id="how-do-i-update-embeddings-when-my-documents-change" class="level3">
<h3 class="anchored" data-anchor-id="how-do-i-update-embeddings-when-my-documents-change">How do I update embeddings when my documents change?</h3>
<p>Embeddings are deterministic for a given text + model, so you only need to re-embed documents whose <em>content</em> has changed. The simplest approach is to hash each document and store the hash alongside its embedding; if the hash changes, re-embed that row. This lets you sync a large corpus with a tiny incremental cost.</p>
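<p>Here is a minimal sketch of that hash comparison. It assumes each document has a stable <code>id</code> and that the rlang package is installed (it comes with the tidyverse); <code>rlang::hash()</code> is just one convenient hashing function, and <code>hash_text()</code> is a hypothetical helper. The documents are made up.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hash each text separately (requires the rlang package).
hash_text &lt;- function(x) vapply(x, rlang::hash, character(1), USE.NAMES = FALSE)

# Stored state: document id plus the hash of the text that was embedded.
stored &lt;- data.frame(
  id   = c("faq-1", "faq-2"),
  hash = hash_text(c("How do I reset my password?", "How do I cancel my plan?"))
)

# Current documents: faq-2 was edited, faq-3 is new.
current &lt;- data.frame(
  id   = c("faq-1", "faq-2", "faq-3"),
  text = c("How do I reset my password?",
           "How do I cancel or pause my plan?",
           "Do you offer refunds?")
)
current$hash &lt;- hash_text(current$text)

# A row needs (re-)embedding if it has no stored hash or the hash has changed.
old_hash        &lt;- stored$hash[match(current$id, stored$id)]
needs_embedding &lt;- is.na(old_hash) | old_hash != current$hash

current$id[needs_embedding]
# "faq-2" "faq-3"</code></pre></div>
<p>Only the flagged rows go to the embedding API; everything else keeps its cached vector.</p>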
</section>
<section id="how-many-dimensions-should-my-embeddings-have" class="level3">
<h3 class="anchored" data-anchor-id="how-many-dimensions-should-my-embeddings-have">How many dimensions should my embeddings have?</h3>
<p>Default to whatever the model produces — <code>text-embedding-3-small</code> gives 1536 dimensions and that’s fine for almost all use cases. OpenAI lets you truncate with a <code>dimensions</code> parameter (e.g., 512) to save storage, and the newer models are trained so that truncation preserves most of the quality. Only tune this if storage or memory is genuinely a bottleneck.</p>
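<p>If you already have full-size vectors stored, you can also shorten them yourself by keeping the leading values and re-normalizing, which is the approach OpenAI’s embeddings guide describes for the <code>text-embedding-3</code> models. The sketch below uses a random vector as a stand-in for a real embedding, and <code>truncate_embedding()</code> is a hypothetical helper. Don’t assume this works for models that weren’t trained with truncation in mind.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">normalize &lt;- function(v) v / sqrt(sum(v^2))

# Keep the first `k` values, then rescale to unit length.
truncate_embedding &lt;- function(v, k) normalize(v[seq_len(k)])

set.seed(1)
full  &lt;- normalize(rnorm(1536))  # random stand-in for a real embedding
short &lt;- truncate_embedding(full, 512)

length(short)
# 512
all.equal(sqrt(sum(short^2)), 1)
# TRUE</code></pre></div>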
</section>
<section id="are-embeddings-enough-for-sentiment-analysis" class="level3">
<h3 class="anchored" data-anchor-id="are-embeddings-enough-for-sentiment-analysis">Are embeddings enough for sentiment analysis?</h3>
<p>No, and this is a common pitfall. Embedding similarity captures <em>topic</em>, not <em>agreement</em>. Two sentences that say opposite things about the same subject often get a high similarity score. For sentiment or stance detection, use an LLM with a classification prompt instead. See <a href="../llm/how-to-analyze-sentiment-with-llms-in-r">How to Analyze Sentiment with LLMs in R</a> and <a href="../llm/how-to-classify-text-with-llms-in-r">How to Classify Text with LLMs in R</a>.</p>
</section>
<section id="whats-the-difference-between-embeddings-and-fine-tuning" class="level3">
<h3 class="anchored" data-anchor-id="whats-the-difference-between-embeddings-and-fine-tuning">What’s the difference between embeddings and fine-tuning?</h3>
<p>Embeddings work with a frozen pre-trained model — you don’t change the model, you just use its output. Fine-tuning actually updates the model weights on your own data. For the vast majority of “search my docs” and “answer questions from my data” problems, embeddings + RAG are cheaper, faster, and easier to maintain than fine-tuning, while giving you the ability to update your knowledge base without retraining anything.</p>
</section>
<section id="can-i-build-a-chatbot-with-embeddings-in-r" class="level3">
<h3 class="anchored" data-anchor-id="can-i-build-a-chatbot-with-embeddings-in-r">Can I build a chatbot with embeddings in R?</h3>
<p>Yes — that’s exactly what the RAG section above shows. Combine semantic search with a chat call to <a href="../llm/how-to-use-claude-api-in-r">Claude</a> or <a href="../llm/how-to-use-openai-api-in-r">OpenAI</a>, wrap it in a <a href="https://shiny.posit.co/">Shiny</a> app, and you have a “chat with your docs” application running entirely in R.</p>
</section>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<p><strong>LLM tutorials:</strong></p>
<ul>
<li><a href="../llm/how-to-use-openai-api-in-r">How to Use the OpenAI API in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
<li><a href="../llm/how-to-classify-text-with-llms-in-r">How to Classify Text with LLMs in R</a></li>
<li><a href="../llm/how-to-analyze-sentiment-with-llms-in-r">How to Analyze Sentiment with LLMs in R</a></li>
</ul>
<p><strong>Tidyverse building blocks used in this tutorial:</strong></p>
<ul>
<li><a href="../dplyr/how-to-use-mutate-in-r">How to Use mutate() in R</a></li>
<li><a href="../dplyr/how-to-use-arrange-in-r">How to Use arrange() in R</a></li>
<li><a href="../purrr/how-to-use-map-in-r">How to Use map() in R</a></li>
<li><a href="../tidyr/how-to-use-nest-in-r">How to Use nest() in R</a></li>
<li><a href="../tidyr/how-to-use-pivotlonger-in-r">How to Use pivot_longer() in R</a></li>
</ul>
<p><strong>ggplot2 layers used in the visualizations:</strong></p>
<ul>
<li><a href="../ggplot2/how-to-use-geomtile-in-r">How to Use geom_tile() in R</a></li>
<li><a href="../ggplot2/how-to-use-geompoint-in-r">How to Use geom_point() in R</a></li>
<li><a href="../ggplot2/how-to-use-thememinimal-in-r">How to Use theme_minimal() in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://platform.openai.com/docs/guides/embeddings">OpenAI Embeddings Guide</a></li>
<li><a href="https://httr2.r-lib.org/">httr2 Documentation</a></li>
</ul>



</section>

 ]]></description>
  <category>llm</category>
  <category>embeddings</category>
  <category>semantic search</category>
  <guid>https://rstats101.com/llm/how-to-use-embeddings-in-r.html</guid>
  <pubDate>Fri, 10 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/text-embeddings-similarity-heatmap-in-r-ggplot.png" medium="image" type="image/png" height="112" width="144"/>
</item>
<item>
  <title>How to Analyze Sentiment with LLMs in R</title>
  <link>https://rstats101.com/llm/how-to-analyze-sentiment-with-llms-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Sentiment analysis determines the emotional tone of text: is it positive, negative, or neutral? LLMs make this easy without training custom models.</p>
<p><strong>Common use cases:</strong></p>
<ul>
<li>Analyze customer reviews</li>
<li>Monitor social media mentions</li>
<li>Process survey feedback</li>
<li>Track brand sentiment over time</li>
</ul>
</section>
<section id="why-use-llms-for-sentiment-analysis" class="level2">
<h2 class="anchored" data-anchor-id="why-use-llms-for-sentiment-analysis">Why Use LLMs for Sentiment Analysis?</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Approach</th>
<th>Setup</th>
<th>Nuance</th>
<th>Context Understanding</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Rule-based</td>
<td>Word lists</td>
<td>Limited</td>
<td>Poor</td>
</tr>
<tr class="even">
<td>Traditional ML</td>
<td>Training data</td>
<td>Moderate</td>
<td>Moderate</td>
</tr>
<tr class="odd">
<td>LLMs</td>
<td>Just a prompt</td>
<td>Excellent</td>
<td>Excellent</td>
</tr>
</tbody>
</table>
<p>LLMs can pick up on sarcasm, context, and nuance that simpler methods miss.</p>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<p>Load the packages:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<p>This tutorial uses ellmer which works with <a href="../llm/how-to-use-claude-api-in-r">Claude</a>, <a href="../llm/how-to-use-openai-api-in-r">OpenAI</a>, <a href="../llm/how-to-use-gemini-api-in-r">Gemini</a>, or <a href="../llm/how-to-run-local-llms-in-r">local models</a>.</p>
</section>
<section id="basic-sentiment-analysis" class="level2">
<h2 class="anchored" data-anchor-id="basic-sentiment-analysis">Basic Sentiment Analysis</h2>
<section id="simple-three-way-classification" class="level3">
<h3 class="anchored" data-anchor-id="simple-three-way-classification">Simple three-way classification</h3>
<p>Start with positive, negative, or neutral:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb2-2"></span>
<span id="cb2-3">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The food was delicious but the service was incredibly slow."</span></span>
<span id="cb2-4"></span>
<span id="cb2-5">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb2-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Analyze the sentiment of this text."</span>,</span>
<span id="cb2-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reply with only: positive, negative, or neutral."</span>,</span>
<span id="cb2-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Text:"</span>, review</span>
<span id="cb2-9">))</span>
<span id="cb2-10"></span>
<span id="cb2-11">response</span>
<span id="cb2-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "neutral"</span></span></code></pre></div></div>
<p>The review has both positive (delicious food) and negative (slow service) elements, so the model labels the overall sentiment as neutral.</p>
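<p>A free-text reply isn’t guaranteed to be exactly one of the three words, even with “Reply with only” in the prompt. A small validation step helps before you store the label. The sketch below is one way to do it in base R; <code>parse_sentiment()</code> is a hypothetical helper, and the example replies are made up.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">allowed &lt;- c("positive", "negative", "neutral")

# Clean up a free-text reply; return NA if it is not one of the allowed labels.
parse_sentiment &lt;- function(reply) {
  label &lt;- tolower(trimws(reply))
  label &lt;- gsub("[[:punct:]]+$", "", label)  # drop trailing punctuation
  if (label %in% allowed) label else NA_character_
}

parse_sentiment(" Neutral.\n")
# "neutral"
parse_sentiment("The sentiment is mostly positive")
# NA</code></pre></div>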
</section>
<section id="why-llms-handle-nuance-better" class="level3">
<h3 class="anchored" data-anchor-id="why-llms-handle-nuance-better">Why LLMs handle nuance better</h3>
<p>Traditional sentiment tools often miss context:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sarcasm - sounds positive, means negative</span></span>
<span id="cb3-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Oh great, another meeting that could have been an email."</span></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rule-based: "great" = positive ❌</span></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># LLM: negative ✓</span></span>
<span id="cb3-5"></span>
<span id="cb3-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Negation - negative word, positive meaning</span></span>
<span id="cb3-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Not bad at all, actually quite impressed."</span></span>
<span id="cb3-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rule-based: "bad" = negative ❌</span></span>
<span id="cb3-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># LLM: positive ✓</span></span>
<span id="cb3-10"></span>
<span id="cb3-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mixed - needs context weighing</span></span>
<span id="cb3-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The product broke after a week, but customer service was amazing."</span></span>
<span id="cb3-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># LLM understands both aspects</span></span></code></pre></div></div>
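<p>To make the rule-based failure concrete, here is a deliberately naive word-list scorer. The word lists and <code>rule_based_sentiment()</code> are made up for illustration and don’t correspond to any particular sentiment package.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">positive_words &lt;- c("great", "love", "amazing", "delicious")
negative_words &lt;- c("bad", "slow", "broke", "terrible")

# Naive scorer: count positive words minus negative words, ignoring context.
rule_based_sentiment &lt;- function(text) {
  words &lt;- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]
  score &lt;- sum(words %in% positive_words) - sum(words %in% negative_words)
  if (score &gt; 0) "positive" else if (score &lt; 0) "negative" else "neutral"
}

rule_based_sentiment("Oh great, another meeting that could have been an email.")
# "positive"
rule_based_sentiment("Not bad at all, actually quite impressed.")
# "negative"</code></pre></div>
<p>Both labels are wrong, for exactly the reasons in the comments above: the scorer sees “great” and “bad” but not the sarcasm or the negation around them.</p>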
</section>
</section>
<section id="structured-sentiment-output" class="level2">
<h2 class="anchored" data-anchor-id="structured-sentiment-output">Structured Sentiment Output</h2>
<section id="using-extract_data-for-reliable-results" class="level3">
<h3 class="anchored" data-anchor-id="using-extract_data-for-reliable-results">Using extract_data for reliable results</h3>
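<p>For consistent, parseable output, use structured extraction. The examples here call <code>chat$extract_data()</code>; recent ellmer releases rename this method to <code>chat$chat_structured()</code>, so use that name if your installed version warns about deprecation.</p>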
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb4-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Overall sentiment of the text"</span></span>
<span id="cb4-4">)</span></code></pre></div></div>
<p>Declaring the type with <code>type_enum()</code> constrains the response to one of the three allowed values.</p>
</section>
<section id="extract-sentiment" class="level3">
<h3 class="anchored" data-anchor-id="extract-sentiment">Extract sentiment</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb5-2"></span>
<span id="cb5-3">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Absolutely love this product! Best purchase I've made all year."</span></span>
<span id="cb5-4"></span>
<span id="cb5-5">sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(review, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type)</span>
<span id="cb5-6"></span>
<span id="cb5-7">sentiment</span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive"</span></span></code></pre></div></div>
</section>
</section>
<section id="sentiment-with-confidence-scores" class="level2">
<h2 class="anchored" data-anchor-id="sentiment-with-confidence-scores">Sentiment with Confidence Scores</h2>
<section id="when-you-need-more-than-just-a-label" class="level3">
<h3 class="anchored" data-anchor-id="when-you-need-more-than-just-a-label">When you need more than just a label</h3>
<p>Ask for a confidence score alongside the label to identify borderline cases:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">sentiment_detailed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb6-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Overall sentiment"</span></span>
<span id="cb6-5">  ),</span>
<span id="cb6-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">confidence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Confidence score from 0 to 1"</span>),</span>
<span id="cb6-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">explanation =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Brief explanation"</span>)</span>
<span id="cb6-8">)</span></code></pre></div></div>
</section>
<section id="extract-detailed-sentiment" class="level3">
<h3 class="anchored" data-anchor-id="extract-detailed-sentiment">Extract detailed sentiment</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb7-2"></span>
<span id="cb7-3">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay I guess. Does what it says but nothing special."</span></span>
<span id="cb7-4"></span>
<span id="cb7-5">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(review, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_detailed)</span>
<span id="cb7-6"></span>
<span id="cb7-7">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sentiment</span>
<span id="cb7-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "neutral"</span></span>
<span id="cb7-9"></span>
<span id="cb7-10">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>confidence</span>
<span id="cb7-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 0.85</span></span>
<span id="cb7-12"></span>
<span id="cb7-13">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>explanation</span>
<span id="cb7-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Lukewarm endorsement with no strong positive or negative indicators"</span></span></code></pre></div></div>
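<p>A low confidence score (below 0.7, for example) suggests the text is ambiguous and may need human review. Keep in mind that the score is the model's own estimate, not a calibrated probability.</p>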
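<p>One way to act on that threshold is to flag low-confidence rows for manual review. The data frame below is made up for illustration, and the 0.7 cutoff is a rule of thumb you would tune for your own data.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Illustrative results, as if collected from several extraction calls
results &lt;- data.frame(
  review = c("Love it!", "It's okay I guess.", "Hmm, not sure."),
  sentiment = c("positive", "neutral", "neutral"),
  confidence = c(0.95, 0.85, 0.55)
)

threshold &lt;- 0.7  # assumed cutoff
results$needs_review &lt;- results$confidence &lt; threshold

# Rows a human should look at
results[results$needs_review, c("review", "confidence")]</code></pre></div>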
</section>
</section>
<section id="fine-grained-sentiment" class="level2">
<h2 class="anchored" data-anchor-id="fine-grained-sentiment">Fine-grained Sentiment</h2>
<section id="five-point-scale" class="level3">
<h3 class="anchored" data-anchor-id="five-point-scale">Five-point scale</h3>
<p>For more granularity, use a five-point scale:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">sentiment_5point <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"very_negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"very_positive"</span>),</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment on a 5-point scale"</span></span>
<span id="cb8-4">)</span></code></pre></div></div>
</section>
<section id="classify-on-the-scale" class="level3">
<h3 class="anchored" data-anchor-id="classify-on-the-scale">Classify on the scale</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb9-2"></span>
<span id="cb9-3">reviews <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb9-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible product. Complete waste of money. Avoid!"</span>,</span>
<span id="cb9-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Not great. Had some issues but usable."</span>,</span>
<span id="cb9-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's fine. Does the job."</span>,</span>
<span id="cb9-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Really happy with this purchase!"</span>,</span>
<span id="cb9-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AMAZING! Exceeded all my expectations!!!"</span></span>
<span id="cb9-9">)</span>
<span id="cb9-10"></span>
<span id="cb9-11">classify_5point <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb9-12">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb9-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb9-14">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_5point)</span>
<span id="cb9-15">}</span>
<span id="cb9-16"></span>
<span id="cb9-17">sentiments <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(reviews, classify_5point)</span>
<span id="cb9-18"></span>
<span id="cb9-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">review =</span> reviews, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> sentiments)</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>review</th>
<th>sentiment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Terrible product…</td>
<td>very_negative</td>
</tr>
<tr class="even">
<td>Not great…</td>
<td>negative</td>
</tr>
<tr class="odd">
<td>It’s fine…</td>
<td>neutral</td>
</tr>
<tr class="even">
<td>Really happy…</td>
<td>positive</td>
</tr>
<tr class="odd">
<td>AMAZING!…</td>
<td>very_positive</td>
</tr>
</tbody>
</table>
</section>
<section id="numeric-scores" class="level3">
<h3 class="anchored" data-anchor-id="numeric-scores">Numeric scores</h3>
<p>Convert the labels to numbers for analysis:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">sentiment_to_score <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(sentiment) {</span>
<span id="cb10-2">  scores <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb10-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">very_negative =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb10-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">negative =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb10-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">neutral =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb10-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">positive =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb10-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">very_positive =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb10-8">  )</span>
<span id="cb10-9">  scores[sentiment]</span>
<span id="cb10-10">}</span>
<span id="cb10-11"></span>
<span id="cb10-12">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb10-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">review =</span> reviews,</span>
<span id="cb10-14">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> sentiments,</span>
<span id="cb10-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(sentiments, sentiment_to_score)</span>
<span id="cb10-16">)</span>
<span id="cb10-17"></span>
<span id="cb10-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>score)</span>
<span id="cb10-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 3.0 (average sentiment)</span></span></code></pre></div></div>
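# 3.0 (average sentiment)">
<p>An alternative sketch, using only base R: store the labels as an ordered factor so the 1 to 5 scores fall out of the level order. The <code>example_labels</code> vector is hard-coded for illustration and stands in for the model's output.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">levels_5point &lt;- c("very_negative", "negative", "neutral", "positive", "very_positive")

# Hard-coded labels standing in for the model's output
example_labels &lt;- c("very_negative", "negative", "neutral", "positive", "very_positive")

sentiment_fct &lt;- factor(example_labels, levels = levels_5point, ordered = TRUE)
example_scores &lt;- as.integer(sentiment_fct)  # 1 = very_negative, ..., 5 = very_positive

mean(example_scores)
# 3</code></pre></div>
<p>An ordered factor also supports comparisons such as <code>sentiment_fct &gt;= "positive"</code>, which can be convenient for filtering.</p>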
</section>
</section>
<section id="aspect-based-sentiment" class="level2">
<h2 class="anchored" data-anchor-id="aspect-based-sentiment">Aspect-Based Sentiment</h2>
<section id="analyze-sentiment-by-aspect" class="level3">
<h3 class="anchored" data-anchor-id="analyze-sentiment-by-aspect">Analyze sentiment by aspect</h3>
<p>Products often have multiple aspects with different sentiments:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">aspect_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">overall =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb11-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb11-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Overall sentiment"</span></span>
<span id="cb11-5">  ),</span>
<span id="cb11-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aspects =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb11-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb11-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aspect =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The aspect being discussed"</span>),</span>
<span id="cb11-9">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb11-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb11-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment for this aspect"</span></span>
<span id="cb11-12">      )</span>
<span id="cb11-13">    ),</span>
<span id="cb11-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Individual aspects mentioned"</span></span>
<span id="cb11-15">  )</span>
<span id="cb11-16">)</span></code></pre></div></div>
</section>
<section id="extract-aspect-sentiments" class="level3">
<h3 class="anchored" data-anchor-id="extract-aspect-sentiments">Extract aspect sentiments</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb12-2"></span>
<span id="cb12-3">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The laptop is incredibly fast and the screen is beautiful.</span></span>
<span id="cb12-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">However, the battery life is disappointing and the keyboard feels cheap."</span></span>
<span id="cb12-5"></span>
<span id="cb12-6">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(review, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> aspect_sentiment)</span>
<span id="cb12-7"></span>
<span id="cb12-8">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>overall</span>
<span id="cb12-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "neutral"</span></span>
<span id="cb12-10"></span>
<span id="cb12-11">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>aspects</span>
<span id="cb12-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[1]] aspect: "speed", sentiment: "positive"</span></span>
<span id="cb12-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[2]] aspect: "screen", sentiment: "positive"</span></span>
<span id="cb12-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[3]] aspect: "battery", sentiment: "negative"</span></span>
<span id="cb12-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[4]] aspect: "keyboard", sentiment: "negative"</span></span></code></pre></div></div>
</section>
<section id="convert-to-data-frame" class="level3">
<h3 class="anchored" data-anchor-id="convert-to-data-frame">Convert to data frame</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">aspects_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb13-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">aspect =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>aspects, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"aspect"</span>),</span>
<span id="cb13-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>aspects, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sentiment"</span>)</span>
<span id="cb13-4">)</span>
<span id="cb13-5"></span>
<span id="cb13-6">aspects_df</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>aspect</th>
<th>sentiment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>speed</td>
<td>positive</td>
</tr>
<tr class="even">
<td>screen</td>
<td>positive</td>
</tr>
<tr class="odd">
<td>battery</td>
<td>negative</td>
</tr>
<tr class="even">
<td>keyboard</td>
<td>negative</td>
</tr>
</tbody>
</table>
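<p>With the aspects in a data frame, a quick tally shows how mixed the review is. This sketch rebuilds the same four rows by hand (as <code>aspects_example</code>, a name chosen for illustration) so it runs without an API call.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">aspects_example &lt;- data.frame(
  aspect = c("speed", "screen", "battery", "keyboard"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Count aspects per sentiment label
table(aspects_example$sentiment)

# Aspects the reviewer complained about
aspects_example$aspect[aspects_example$sentiment == "negative"]
# "battery"  "keyboard"</code></pre></div>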
</section>
</section>
<section id="batch-sentiment-analysis" class="level2">
<h2 class="anchored" data-anchor-id="batch-sentiment-analysis">Batch Sentiment Analysis</h2>
<section id="process-multiple-reviews" class="level3">
<h3 class="anchored" data-anchor-id="process-multiple-reviews">Process multiple reviews</h3>
<p>Create a reusable function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">analyze_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb14-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb14-3"></span>
<span id="cb14-4">  sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb14-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb14-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment"</span></span>
<span id="cb14-7">  )</span>
<span id="cb14-8"></span>
<span id="cb14-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rate limiting</span></span>
<span id="cb14-10">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type)</span>
<span id="cb14-11">}</span></code></pre></div></div>
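<p>In a batch run, one failed request can abort the whole job. Below is a minimal sketch of a wrapper that returns <code>NA</code> instead, assuming any classifier function that takes a single string. The names <code>safe_classify()</code> and <code>fake_classifier()</code> are hypothetical; the stub exists only so the example runs offline.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical wrapper: returns NA_character_ when the classifier errors
safe_classify &lt;- function(text, classifier) {
  tryCatch(
    classifier(text),
    error = function(e) {
      message("Failed on: ", substr(text, 1, 30), " (", conditionMessage(e), ")")
      NA_character_
    }
  )
}

# Stub standing in for a real classifier; errors on empty input
fake_classifier &lt;- function(text) {
  if (!nzchar(text)) stop("empty review")
  "neutral"
}

batch_result &lt;- vapply(
  c("Does the job.", ""), safe_classify, character(1),
  classifier = fake_classifier, USE.NAMES = FALSE
)
batch_result
# "neutral" NA</code></pre></div>
<p>Rows that come back as <code>NA</code> can be retried later without rerunning the whole batch.</p>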
</section>
<section id="apply-to-a-data-frame" class="level3">
<h3 class="anchored" data-anchor-id="apply-to-a-data-frame">Apply to a data frame</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">reviews_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb15-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb15-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">review =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb15-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Love it! Works perfectly."</span>,</span>
<span id="cb15-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Broken on arrival. Very disappointed."</span>,</span>
<span id="cb15-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Does what it says. Nothing more."</span>,</span>
<span id="cb15-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Best purchase ever! Highly recommend!"</span>,</span>
<span id="cb15-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Meh. Expected better for the price."</span></span>
<span id="cb15-9">  )</span>
<span id="cb15-10">)</span>
<span id="cb15-11"></span>
<span id="cb15-12">reviews_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> reviews_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(review, analyze_sentiment))</span>
<span id="cb15-14"></span>
<span id="cb15-15">reviews_df</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>id</th>
<th>review</th>
<th>sentiment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>Love it!…</td>
<td>positive</td>
</tr>
<tr class="even">
<td>2</td>
<td>Broken on arrival…</td>
<td>negative</td>
</tr>
<tr class="odd">
<td>3</td>
<td>Does what it says…</td>
<td>neutral</td>
</tr>
<tr class="even">
<td>4</td>
<td>Best purchase…</td>
<td>positive</td>
</tr>
<tr class="odd">
<td>5</td>
<td>Meh. Expected…</td>
<td>negative</td>
</tr>
</tbody>
</table>
</section>
<section id="calculate-summary-statistics" class="level3">
<h3 class="anchored" data-anchor-id="calculate-summary-statistics">Calculate summary statistics</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">reviews_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(sentiment) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pct =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>sentiment</th>
<th>n</th>
<th>pct</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>negative</td>
<td>2</td>
<td>40</td>
</tr>
<tr class="even">
<td>neutral</td>
<td>1</td>
<td>20</td>
</tr>
<tr class="odd">
<td>positive</td>
<td>2</td>
<td>40</td>
</tr>
</tbody>
</table>
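<p>A single headline number is sometimes handy. One common convention, sketched here in base R on hard-coded labels that match the five reviews above, is a net score: the share of positive reviews minus the share of negative ones.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hard-coded labels standing in for reviews_df$sentiment
example_sentiment &lt;- c("positive", "negative", "neutral", "positive", "negative")

pct &lt;- prop.table(table(example_sentiment)) * 100
net_score &lt;- pct[["positive"]] - pct[["negative"]]

net_score
# 0</code></pre></div>
<p>A score above zero means positive reviews outnumber negative ones; neutral reviews pull the score toward zero without changing its sign.</p>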
</section>
</section>
<section id="domain-specific-sentiment" class="level2">
<h2 class="anchored" data-anchor-id="domain-specific-sentiment">Domain-Specific Sentiment</h2>
<section id="customize-for-your-domain" class="level3">
<h3 class="anchored" data-anchor-id="customize-for-your-domain">Customize for your domain</h3>
<p>Add context for better accuracy in specific domains:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb17-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You analyze sentiment in restaurant reviews.</span></span>
<span id="cb17-3"></span>
<span id="cb17-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  Consider these aspects:</span></span>
<span id="cb17-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - Food quality and taste</span></span>
<span id="cb17-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - Service speed and friendliness</span></span>
<span id="cb17-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - Ambiance and cleanliness</span></span>
<span id="cb17-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - Value for money</span></span>
<span id="cb17-9"></span>
<span id="cb17-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Rich and decadent' is positive for desserts.</span></span>
<span id="cb17-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Simple and quick' is positive for lunch spots.</span></span>
<span id="cb17-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Crowded' can be positive (popular) or negative (uncomfortable)."</span></span>
<span id="cb17-13">)</span>
<span id="cb17-14"></span>
<span id="cb17-15">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The place was packed on a Saturday night.</span></span>
<span id="cb17-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Food took 45 minutes but was worth the wait."</span></span>
<span id="cb17-17"></span>
<span id="cb17-18">sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb17-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb17-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment"</span></span>
<span id="cb17-21">)</span>
<span id="cb17-22"></span>
<span id="cb17-23">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(review, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type)</span>
<span id="cb17-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive" (busy = popular, long wait but worth it)</span></span></code></pre></div></div>
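<p>If you maintain prompts for several domains, it can help to assemble the system prompt from its parts instead of editing one long string. The following is a minimal sketch in base R; <code>build_system_prompt()</code> and its arguments are hypothetical names introduced here for illustration and are not part of ellmer.</p>

```r
# Hypothetical helper (not part of ellmer): assemble a domain-specific
# system prompt from a domain name, a list of aspects, and wording hints.
build_system_prompt <- function(domain, aspects, hints) {
  paste0(
    "You analyze sentiment in ", domain, ".\n\n",
    "Consider these aspects:\n",
    paste0("- ", aspects, collapse = "\n"),
    "\n\n",
    paste(hints, collapse = "\n")
  )
}

prompt <- build_system_prompt(
  domain  = "restaurant reviews",
  aspects = c("Food quality and taste", "Value for money"),
  hints   = "'Crowded' can be positive (popular) or negative (uncomfortable)."
)

# Print the assembled prompt to check it before sending it to a model
cat(prompt)
```

<p>The resulting string can then be passed as <code>system_prompt</code> when creating the chat, as in the example above.</p>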
</section>
</section>
<section id="emotion-detection" class="level2">
<h2 class="anchored" data-anchor-id="emotion-detection">Emotion Detection</h2>
<section id="beyond-positivenegative" class="level3">
<h3 class="anchored" data-anchor-id="beyond-positivenegative">Beyond positive/negative</h3>
<p>Detect specific emotions:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">emotion_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb18-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">primary_emotion =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb18-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"joy"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sadness"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"anger"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fear"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"surprise"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"disgust"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb18-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary emotion expressed"</span></span>
<span id="cb18-5">  ),</span>
<span id="cb18-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intensity =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb18-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"low"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"medium"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"high"</span>),</span>
<span id="cb18-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Intensity of the emotion"</span></span>
<span id="cb18-9">  )</span>
<span id="cb18-10">)</span></code></pre></div></div>
</section>
<section id="detect-emotions" class="level3">
<h3 class="anchored" data-anchor-id="detect-emotions">Detect emotions</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb19-2"></span>
<span id="cb19-3">texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb19-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I can't believe they did this to me!"</span>,</span>
<span id="cb19-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Just got the promotion! So happy!"</span>,</span>
<span id="cb19-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The news was shocking, didn't see it coming."</span></span>
<span id="cb19-7">)</span>
<span id="cb19-8"></span>
<span id="cb19-9">detect_emotion <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb19-10">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb19-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb19-12">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> emotion_type)</span>
<span id="cb19-13">}</span>
<span id="cb19-14"></span>
<span id="cb19-15">emotions <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(texts, detect_emotion)</span>
<span id="cb19-16"></span>
<span id="cb19-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb19-18">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> texts,</span>
<span id="cb19-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">emotion =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(emotions, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"primary_emotion"</span>),</span>
<span id="cb19-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">intensity =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(emotions, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"intensity"</span>)</span>
<span id="cb19-21">)</span></code></pre></div></div>
<table class="caption-top table">
<thead>
<tr class="header">
<th>text</th>
<th>emotion</th>
<th>intensity</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>I can’t believe…</td>
<td>anger</td>
<td>high</td>
</tr>
<tr class="even">
<td>Just got the promotion…</td>
<td>joy</td>
<td>high</td>
</tr>
<tr class="odd">
<td>The news was shocking…</td>
<td>surprise</td>
<td>medium</td>
</tr>
</tbody>
</table>
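<p>When a batch contains a failed call, pulling fields out of the results can error or misalign rows. Here is a minimal base R sketch that tolerates missing results. The <code>emotions</code> list is mocked data in the shape the code above expects (a named list per text), and <code>pluck_chr()</code> is a hypothetical helper name.</p>

```r
# Mocked results (assumption: each successful call returns a named list;
# NULL stands in for a call that failed)
emotions <- list(
  list(primary_emotion = "anger", intensity = "high"),
  list(primary_emotion = "joy", intensity = "high"),
  NULL
)

# Hypothetical helper: extract one field as character, NA when it is missing
pluck_chr <- function(results, field) {
  vapply(results, function(res) {
    value <- res[[field]]
    if (is.null(value)) NA_character_ else as.character(value)
  }, character(1))
}

emotion_df <- data.frame(
  emotion   = pluck_chr(emotions, "primary_emotion"),
  intensity = pluck_chr(emotions, "intensity"),
  stringsAsFactors = FALSE
)

# Count emotions, keeping failed rows visible as NA
table(emotion_df$emotion, useNA = "ifany")
```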
</section>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<section id="handle-failures-gracefully" class="level3">
<h3 class="anchored" data-anchor-id="handle-failures-gracefully">Handle failures gracefully</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">safe_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb20-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb20-4">    sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb20-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb20-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment"</span></span>
<span id="cb20-7">    )</span>
<span id="cb20-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb20-9">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type)</span>
<span id="cb20-10">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb20-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">warning</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment analysis failed: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb20-12">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span></span>
<span id="cb20-13">  })</span>
<span id="cb20-14">}</span>
<span id="cb20-15"></span>
<span id="cb20-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use with map</span></span>
<span id="cb20-17">sentiments <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(reviews, safe_sentiment)</span>
<span id="cb20-18"></span>
<span id="cb20-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check for failures</span></span>
<span id="cb20-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(sentiments))</span></code></pre></div></div>
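<p>Transient failures such as rate limits often succeed on a second attempt, so a retry with increasing delays can be layered on top of <code>tryCatch()</code>. The sketch below is one possible approach, not something ellmer provides: <code>with_retry()</code> is a hypothetical wrapper, and <code>flaky_sentiment()</code> is a stand-in that simulates an API call failing twice.</p>

```r
# Hypothetical wrapper: retry a function with exponential backoff,
# returning NA_character_ (with a warning) if every attempt fails.
with_retry <- function(f, max_attempts = 3, base_delay = 0.5) {
  function(...) {
    for (attempt in seq_len(max_attempts)) {
      result <- tryCatch(f(...), error = function(e) e)
      if (!inherits(result, "error")) return(result)
      # Wait base_delay, then 2x, 4x, ... before the next attempt
      if (attempt < max_attempts) Sys.sleep(base_delay * 2^(attempt - 1))
    }
    warning("All ", max_attempts, " attempts failed: ", conditionMessage(result))
    NA_character_
  }
}

# Stand-in for an API call: fails on the first two calls, then succeeds
calls <- 0
flaky_sentiment <- function(text) {
  calls <<- calls + 1
  if (calls < 3) stop("rate limited")
  "positive"
}

robust_sentiment <- with_retry(flaky_sentiment, base_delay = 0.01)
result <- robust_sentiment("Great product!")
```

<p>In practice you would wrap the function that calls the model and pick a <code>base_delay</code> that suits your provider's limits.</p>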
</section>
</section>
<section id="local-sentiment-analysis" class="level2">
<h2 class="anchored" data-anchor-id="local-sentiment-analysis">Local Sentiment Analysis</h2>
<section id="use-ollama-for-free-private-analysis" class="level3">
<h3 class="anchored" data-anchor-id="use-ollama-for-free-private-analysis">Use Ollama for free, private analysis</h3>
<p>For sensitive data or high volumes, run a model locally with Ollama so the text never leaves your machine:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb21-2"></span>
<span id="cb21-3">sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb21-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb21-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment"</span></span>
<span id="cb21-6">)</span>
<span id="cb21-7"></span>
<span id="cb21-8">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product, highly recommend!"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type)</span>
<span id="cb21-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive"</span></span></code></pre></div></div>
<p>See <a href="../llm/how-to-run-local-llms-in-r">Local LLMs with Ollama</a> for setup instructions.</p>
</section>
</section>
<section id="comparison-llm-vs-traditional" class="level2">
<h2 class="anchored" data-anchor-id="comparison-llm-vs-traditional">Comparison: LLM vs Traditional</h2>
<section id="when-to-use-each-approach" class="level3">
<h3 class="anchored" data-anchor-id="when-to-use-each-approach">When to use each approach</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Scenario</th>
<th>Best Approach</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Quick analysis, small dataset</td>
<td>LLM</td>
</tr>
<tr class="even">
<td>Production, millions of texts</td>
<td>Traditional ML (faster, cheaper)</td>
</tr>
<tr class="odd">
<td>Need to understand sarcasm</td>
<td>LLM</td>
</tr>
<tr class="even">
<td>Simple positive/negative</td>
<td>Either works</td>
</tr>
<tr class="odd">
<td>Multiple languages</td>
<td>LLM (built-in multilingual)</td>
</tr>
<tr class="even">
<td>Sensitive data</td>
<td>Local LLM (Ollama)</td>
</tr>
</tbody>
</table>
</section>
<section id="hybrid-approach" class="level3">
<h3 class="anchored" data-anchor-id="hybrid-approach">Hybrid approach</h3>
<p>Use LLMs to label training data, then train a faster model:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 1: Label 1000 samples with LLM</span></span>
<span id="cb22-2">labeled_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_sample</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(text, analyze_sentiment))</span>
<span id="cb22-5"></span>
<span id="cb22-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 2: Train a fast classifier on labeled data</span></span>
<span id="cb22-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># (using tidymodels or similar)</span></span>
<span id="cb22-8"></span>
<span id="cb22-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Step 3: Use fast model for production</span></span></code></pre></div></div>
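<p>To make Step 2 concrete, here is a rough sketch of a fast classifier trained on labeled text. It swaps in a small multinomial naive Bayes model written in base R rather than tidymodels, purely to keep the example self-contained. The training rows are made up, and <code>tokenize()</code>, <code>train_nb()</code>, and <code>predict_nb()</code> are hypothetical helper names.</p>

```r
# Stand-in for LLM-labeled data (made-up examples)
labeled_data <- data.frame(
  text = c("great product love it", "terrible quality broke fast",
           "love the fast shipping", "terrible support never again",
           "great value great quality", "broke after a day terrible"),
  sentiment = c("positive", "negative", "positive",
                "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Lowercase, replace non-letters with spaces, split into words
tokenize <- function(x) {
  words <- strsplit(tolower(gsub("[^[:alpha:]]+", " ", x)), " ", fixed = TRUE)[[1]]
  words[nzchar(words)]
}

# Multinomial naive Bayes with Laplace (+1) smoothing
train_nb <- function(texts, labels) {
  tokens  <- lapply(texts, tokenize)
  vocab   <- unique(unlist(tokens))
  classes <- unique(labels)
  # One column of log word probabilities per class
  log_lik <- sapply(classes, function(cl) {
    counts <- as.numeric(table(factor(unlist(tokens[labels == cl]), levels = vocab)))
    log((counts + 1) / (sum(counts) + length(vocab)))
  })
  rownames(log_lik) <- vocab
  log_prior <- log(as.numeric(table(factor(labels, levels = classes))) / length(labels))
  list(vocab = vocab, classes = classes, log_lik = log_lik, log_prior = log_prior)
}

# Score each class and return the most likely one; unknown words are ignored
predict_nb <- function(model, text) {
  words <- tokenize(text)
  words <- words[words %in% model$vocab]
  scores <- model$log_prior + colSums(model$log_lik[words, , drop = FALSE])
  model$classes[which.max(scores)]
}

model <- train_nb(labeled_data$text, labeled_data$sentiment)
predictions <- vapply(
  c("love this great product", "terrible, it broke"),
  function(txt) predict_nb(model, txt),
  character(1),
  USE.NAMES = FALSE
)
predictions
```

<p>A model like this needs no API calls at prediction time, which is the point of the hybrid approach. For real workloads you would train on far more labeled rows and evaluate on a held-out set.</p>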
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Basic sentiment</td>
<td><code>chat$extract_data(text, type = type_enum(c("positive", "negative", "neutral")))</code></td>
</tr>
<tr class="even">
<td>With confidence</td>
<td><code>type_object(sentiment = ..., confidence = type_number())</code></td>
</tr>
<tr class="odd">
<td>5-point scale</td>
<td><code>type_enum(c("very_negative", ..., "very_positive"))</code></td>
</tr>
<tr class="even">
<td>Aspect-based</td>
<td><code>type_object(overall = ..., aspects = type_array(...))</code></td>
</tr>
<tr class="odd">
<td>Emotion detection</td>
<td><code>type_enum(c("joy", "anger", "sadness", ...))</code></td>
</tr>
</tbody>
</table>
<p><strong>Key points:</strong></p>
<ul>
<li>LLMs understand sarcasm, context, and nuance</li>
<li>Use <code>type_enum()</code> for consistent output</li>
<li>Add domain context in system prompts for accuracy</li>
<li>Use <code>Sys.sleep()</code> between calls in batch processing to stay under API rate limits</li>
<li>Consider local models for sensitive or high-volume data</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-classify-text-with-llms-in-r">How to Classify Text with LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Documentation</a></li>
<li><a href="https://ellmer.tidyverse.org/reference/">ellmer Type Definitions</a></li>
</ul>


<!-- -->

</section>

 ]]></description>
  <category>llm</category>
  <category>ellmer</category>
  <category>sentiment analysis</category>
  <guid>https://rstats101.com/llm/how-to-analyze-sentiment-with-llms-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-analyze-sentiment-with-llms-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Classify Text with LLMs in R</title>
  <link>https://rstats101.com/llm/how-to-classify-text-with-llms-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Text classification assigns categories to text. Traditional approaches require labeled training data and a trained ML model. With LLMs, you can classify text with just a prompt, with no training needed.</p>
<p><strong>Common use cases:</strong></p>
<ul>
<li>Categorize support tickets (billing, technical, account)</li>
<li>Tag documents by topic (finance, legal, HR)</li>
<li>Classify emails by intent (inquiry, complaint, feedback)</li>
<li>Label survey responses by theme</li>
</ul>
</section>
<section id="why-use-llms-for-classification" class="level2">
<h2 class="anchored" data-anchor-id="why-use-llms-for-classification">Why Use LLMs for Classification?</h2>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 30%">
<col style="width: 24%">
<col style="width: 26%">
</colgroup>
<thead>
<tr class="header">
<th>Approach</th>
<th>Training Data</th>
<th>Setup Time</th>
<th>Flexibility</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Traditional ML</td>
<td>Thousands of examples</td>
<td>Days/weeks</td>
<td>Fixed categories</td>
</tr>
<tr class="even">
<td>LLMs</td>
<td>Zero examples</td>
<td>Minutes</td>
<td>Change categories anytime</td>
</tr>
</tbody>
</table>
<p>LLMs excel when:</p>
<ul>
<li>You don’t have labeled training data</li>
<li>Categories may change over time</li>
<li>You need quick prototyping</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<p>Load the ellmer package for LLM access:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<p>The examples use Claude, but they work with any provider that ellmer supports. See <a href="../llm/how-to-use-ellmer-in-r">ellmer basics</a> for setup.</p>
</section>
<section id="basic-classification" class="level2">
<h2 class="anchored" data-anchor-id="basic-classification">Basic Classification</h2>
<section id="single-category-classification" class="level3">
<h3 class="anchored" data-anchor-id="single-category-classification">Single category classification</h3>
<p>Start with a simple prompt that asks the LLM to classify text:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb2-2"></span>
<span id="cb2-3">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I can't log into my account and I've tried resetting my password three times."</span></span>
<span id="cb2-4"></span>
<span id="cb2-5">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb2-6"></span>
<span id="cb2-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify this support ticket into one category:"</span>,</span>
<span id="cb2-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing, technical, account, or other."</span>,</span>
<span id="cb2-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reply with just the category name."</span>,</span>
<span id="cb2-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Ticket:"</span>, text</span>
<span id="cb2-11">))</span>
<span id="cb2-12"></span>
<span id="cb2-13">response</span>
<span id="cb2-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "account"</span></span></code></pre></div></div>
<p>The prompt has three parts:</p>
<ol type="1">
<li><strong>Task</strong>: “Classify this support ticket”</li>
<li><strong>Categories</strong>: “billing, technical, account, or other”</li>
<li><strong>Format</strong>: “Reply with just the category name”</li>
</ol>
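<p>The three parts can be generated from a vector of categories, so the prompt stays in sync when categories change. This is a minimal sketch; <code>build_classification_prompt()</code> is a hypothetical helper name and the ticket text is invented.</p>

```r
# Hypothetical helper: build the task / categories / format prompt
# from a ticket and a vector of allowed categories.
build_classification_prompt <- function(text, categories) {
  paste(
    "Classify this support ticket into one category:",   # task
    paste0(paste(categories, collapse = ", "), "."),      # categories
    "Reply with just the category name.",                 # format
    "\n\nTicket:", text
  )
}

prompt <- build_classification_prompt(
  text = "I was charged twice this month.",
  categories = c("billing", "technical", "account", "other")
)

# Inspect the prompt before sending it with chat$chat(prompt)
cat(prompt)
```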
</section>
<section id="why-just-the-category-name" class="level3">
<h3 class="anchored" data-anchor-id="why-just-the-category-name">Why “just the category name”?</h3>
<p>Without format instructions, LLMs tend to explain their reasoning:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Without format instruction - verbose response</span></span>
<span id="cb3-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify this as billing/technical/account: I can't log in"</span>)</span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "This appears to be an account-related issue because..."</span></span>
<span id="cb3-4"></span>
<span id="cb3-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># With format instruction - clean response</span></span>
<span id="cb3-6">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify as billing/technical/account. One word only: I can't log in"</span>)</span>
<span id="cb3-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "account"</span></span></code></pre></div></div>
<p>Clean responses are easier to process programmatically.</p>
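<p>Even with a format instruction, replies can come back with stray capitalization, whitespace, or punctuation. A small validation step makes downstream code safer. The sketch below assumes replies arrive as plain character strings; <code>normalize_category()</code> is a hypothetical helper name.</p>

```r
# Hypothetical helper: tidy a raw reply and accept it only if it is
# one of the allowed categories; anything else becomes NA.
normalize_category <- function(reply, allowed) {
  cleaned <- tolower(trimws(reply))
  cleaned <- gsub("[[:punct:]]", "", cleaned)
  if (cleaned %in% allowed) cleaned else NA_character_
}

allowed <- c("billing", "technical", "account", "other")

# Example replies: clean, messy but recoverable, and verbose
replies <- c(
  "account",
  " Account.\n",
  "This appears to be an account-related issue because..."
)

categories <- vapply(replies, normalize_category, character(1),
                     allowed = allowed, USE.NAMES = FALSE)
categories
```

<p>Replies that map to <code>NA</code> can be logged and retried or reviewed by hand.</p>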
</section>
</section>
<section id="structured-classification" class="level2">
<h2 class="anchored" data-anchor-id="structured-classification">Structured Classification</h2>
<section id="using-extract_data-for-reliable-output" class="level3">
<h3 class="anchored" data-anchor-id="using-extract_data-for-reliable-output">Using extract_data for reliable output</h3>
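<p>For production use, <code>extract_data()</code> constrains the response to a type you define, so the result comes back as data rather than free text (recent ellmer releases name this method <code>chat_structured()</code>):</p>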
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define the allowed categories</span></span>
<span id="cb4-2">category_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb4-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Support ticket category"</span></span>
<span id="cb4-5">)</span></code></pre></div></div>
<p><code>type_enum()</code> restricts the response to one of the allowed values.</p>
</section>
<section id="extract-the-category" class="level3">
<h3 class="anchored" data-anchor-id="extract-the-category">Extract the category</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb5-2"></span>
<span id="cb5-3">ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Why was I charged twice this month? Please refund the extra charge."</span></span>
<span id="cb5-4"></span>
<span id="cb5-5">category <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(ticket, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> category_type)</span>
<span id="cb5-6"></span>
<span id="cb5-7">category</span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "billing"</span></span></code></pre></div></div>
<p>This approach:</p>
<ul>
<li>Restricts output to the allowed categories (no unexpected labels)</li>
<li>Returns clean data (no extra text)</li>
<li>Works reliably in automated pipelines</li>
</ul>
</section>
</section>
<section id="multi-label-classification" class="level2">
<h2 class="anchored" data-anchor-id="multi-label-classification">Multi-label Classification</h2>
<section id="when-text-belongs-to-multiple-categories" class="level3">
<h3 class="anchored" data-anchor-id="when-text-belongs-to-multiple-categories">When text belongs to multiple categories</h3>
<p>Some text fits multiple categories. Use an array type:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">categories_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb6-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"urgent"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feedback"</span>),</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Applicable category"</span></span>
<span id="cb6-5">  ),</span>
<span id="cb6-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"All categories that apply to this ticket"</span></span>
<span id="cb6-7">)</span></code></pre></div></div>
</section>
<section id="classify-with-multiple-labels" class="level3">
<h3 class="anchored" data-anchor-id="classify-with-multiple-labels">Classify with multiple labels</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb7-2"></span>
<span id="cb7-3">ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"URGENT: My payment failed and now I'm locked out of my account!"</span></span>
<span id="cb7-4"></span>
<span id="cb7-5">labels <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(ticket, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> categories_type)</span>
<span id="cb7-6"></span>
<span id="cb7-7">labels</span>
<span id="cb7-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1] "urgent"  "billing" "account"</span></span></code></pre></div></div>
<p>The ticket is urgent, involves billing (the payment failed), and concerns account access (the user is locked out).</p>
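<p>Multi-label results are easier to filter and count once each label has its own column. The following is a minimal sketch in base R, assuming the label vectors from several tickets have been collected into a list; <code>labels_list</code> is mock data typed in by hand, so no API call is needed.</p>

```r
# Mock results standing in for repeated multi-label calls (no API needed)
labels_list <- list(
  c("urgent", "billing", "account"),
  c("technical"),
  c("feedback", "billing")
)
all_labels <- c("urgent", "billing", "technical", "account", "feedback")

# Build one row per ticket and one logical column per label
indicators <- t(vapply(
  labels_list,
  function(labels) all_labels %in% labels,
  logical(length(all_labels))
))
colnames(indicators) <- all_labels
indicators_df <- as.data.frame(indicators)

# Number of tickets carrying each label
label_counts <- colSums(indicators)
```

<p>With this layout, <code>indicators_df$urgent</code> selects the urgent tickets directly.</p>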
</section>
</section>
<section id="classification-with-confidence" class="level2">
<h2 class="anchored" data-anchor-id="classification-with-confidence">Classification with Confidence</h2>
<section id="get-confidence-scores" class="level3">
<h3 class="anchored" data-anchor-id="get-confidence-scores">Get confidence scores</h3>
<p>Sometimes you want to know how confident the model is:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">classification_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb8-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb8-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Primary category"</span></span>
<span id="cb8-5">  ),</span>
<span id="cb8-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">confidence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Confidence score from 0 to 1"</span>),</span>
<span id="cb8-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">reasoning =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Brief explanation for the classification"</span>)</span>
<span id="cb8-8">)</span></code></pre></div></div>
</section>
<section id="extract-classification-with-confidence" class="level3">
<h3 class="anchored" data-anchor-id="extract-classification-with-confidence">Extract classification with confidence</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb9-2"></span>
<span id="cb9-3">ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The app crashes when I try to upload files larger than 10MB"</span></span>
<span id="cb9-4"></span>
<span id="cb9-5">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(ticket, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> classification_type)</span>
<span id="cb9-6"></span>
<span id="cb9-7">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>category</span>
<span id="cb9-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "technical"</span></span>
<span id="cb9-9"></span>
<span id="cb9-10">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>confidence</span>
<span id="cb9-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 0.95</span></span>
<span id="cb9-12"></span>
<span id="cb9-13">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>reasoning</span>
<span id="cb9-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "This describes a software bug related to file uploads"</span></span></code></pre></div></div>
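<p>Low confidence scores flag ambiguous cases that may need human review. Keep in mind that the score is the model's own estimate, not a calibrated probability, so check it against hand-labelled examples before relying on a cutoff.</p>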
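<p>One way to act on the score is to route tickets below a cutoff to a person. The following is a minimal sketch, assuming results shaped like the list returned above; the <code>results</code> values are mock data and the 0.7 threshold is an arbitrary placeholder, not a recommended setting.</p>

```r
# Mock results shaped like the structured output above (no API call needed)
results <- list(
  list(category = "technical", confidence = 0.95),
  list(category = "billing", confidence = 0.55),
  list(category = "other", confidence = 0.40)
)

# Hypothetical cutoff; tune it against hand-labelled tickets
review_threshold <- 0.7

# Pull the scores into a numeric vector and flag the low ones
confidence <- vapply(results, function(r) r$confidence, numeric(1))
needs_review <- review_threshold > confidence
queue <- ifelse(needs_review, "human_review", "auto")
```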
</section>
</section>
<section id="batch-classification" class="level2">
<h2 class="anchored" data-anchor-id="batch-classification">Batch Classification</h2>
<section id="create-a-reusable-classifier" class="level3">
<h3 class="anchored" data-anchor-id="create-a-reusable-classifier">Create a reusable classifier</h3>
<p>Wrap classification in a function for reuse:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">classify_ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">chat =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>) {</span>
<span id="cb10-2">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(chat)) {</span>
<span id="cb10-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb10-4">  }</span>
<span id="cb10-5"></span>
<span id="cb10-6">  category_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb10-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb10-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ticket category"</span></span>
<span id="cb10-9">  )</span>
<span id="cb10-10"></span>
<span id="cb10-11">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> category_type)</span>
<span id="cb10-12">}</span></code></pre></div></div>
</section>
<section id="classify-multiple-texts" class="level3">
<h3 class="anchored" data-anchor-id="classify-multiple-texts">Classify multiple texts</h3>
<p>Use purrr’s <code>map_chr()</code> to process a vector of texts and return a character vector of categories:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb11-2"></span>
<span id="cb11-3">tickets <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb11-4"> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I was charged twice for my subscription"</span>,</span>
<span id="cb11-5"> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The export feature isn't working"</span>,</span>
<span id="cb11-6"> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I change my email address?"</span>,</span>
<span id="cb11-7"> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Your product is amazing, thank you!"</span></span>
<span id="cb11-8">)</span>
<span id="cb11-9"></span>
<span id="cb11-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add delay to respect rate limits</span></span>
<span id="cb11-11">categories <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(tickets, \(ticket) {</span>
<span id="cb11-12"> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb11-13"> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">classify_ticket</span>(ticket)</span>
<span id="cb11-14">})</span>
<span id="cb11-15"></span>
<span id="cb11-16">categories</span>
<span id="cb11-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [1] "billing"   "technical" "account"   "other"</span></span></code></pre></div></div>
<p>The <code>Sys.sleep(0.5)</code> call spaces requests half a second apart, which lowers the chance of hitting API rate limits.</p>
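<p>A single failed request would otherwise abort the whole batch. The following is a minimal sketch of one way to guard against that with <code>tryCatch()</code>; <code>mock_classify()</code> and <code>safe_classify()</code> are hypothetical names, and the mock stands in for <code>classify_ticket()</code> so the example runs without an API key.</p>

```r
# Hypothetical stand-in for classify_ticket() that fails on empty input
mock_classify <- function(text) {
  if (!nzchar(text)) stop("empty ticket")
  "other"
}

# Return NA instead of aborting the whole batch when one call fails
safe_classify <- function(text, classify = mock_classify) {
  tryCatch(
    as.character(classify(text)),
    error = function(e) NA_character_
  )
}

batch <- c("I was charged twice", "", "Thank you!")
categories_safe <- vapply(batch, safe_classify, character(1), USE.NAMES = FALSE)
```

<p>Failed tickets show up as <code>NA</code>, so they can be retried later with <code>batch[is.na(categories_safe)]</code>.</p>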
</section>
<section id="create-a-classified-data-frame" class="level3">
<h3 class="anchored" data-anchor-id="create-a-classified-data-frame">Create a classified data frame</h3>
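<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tibble)</span>
<span id="cb12-2"></span>
<span id="cb12-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pair each ticket with its predicted category</span></span>
<span id="cb12-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb12-5"> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ticket =</span> tickets,</span>
<span id="cb12-6"> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> categories</span>
<span id="cb12-7">)</span></code></pre></div></div>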
<table class="caption-top table">
<thead>
<tr class="header">
<th>ticket</th>
<th>category</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>I was charged twice…</td>
<td>billing</td>
</tr>
<tr class="even">
<td>The export feature…</td>
<td>technical</td>
</tr>
<tr class="odd">
<td>How do I change…</td>
<td>account</td>
</tr>
<tr class="even">
<td>Your product is…</td>
<td>other</td>
</tr>
</tbody>
</table>
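<p>Once the categories are in a vector, a quick tally shows how tickets are distributed. The following is a minimal sketch in base R; <code>predicted</code> is mock data typed in by hand, and <code>allowed</code> repeats the four category values used earlier.</p>

```r
# Mock predictions, typed in by hand so no API call is needed
predicted <- c("billing", "technical", "billing", "other")
allowed <- c("billing", "technical", "account", "other")

# factor() with explicit levels keeps zero-count categories in the summary
counts <- table(factor(predicted, levels = allowed))

# Share of tickets per category
shares <- prop.table(counts)
```

<p>Setting the factor levels explicitly means a category with no tickets, such as <code>account</code> here, still appears in the tally with a count of zero.</p>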
</section>
</section>
<section id="hierarchical-classification" class="level2">
<h2 class="anchored" data-anchor-id="hierarchical-classification">Hierarchical Classification</h2>
<section id="two-level-categories" class="level3">
<h3 class="anchored" data-anchor-id="two-level-categories">Two-level categories</h3>
<p>For complex taxonomies, classify in stages:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First level: main category</span></span>
<span id="cb13-2">main_category <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb13-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>),</span>
<span id="cb13-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Main category"</span></span>
<span id="cb13-5">)</span>
<span id="cb13-6"></span>
<span id="cb13-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Second level: subcategories for technical issues</span></span>
<span id="cb13-8">technical_subcategory <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb13-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bug"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature_request"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"how_to"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"performance"</span>),</span>
<span id="cb13-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Technical subcategory"</span></span>
<span id="cb13-11">)</span></code></pre></div></div>
</section>
<section id="two-stage-classification" class="level3">
<h3 class="anchored" data-anchor-id="two-stage-classification">Two-stage classification</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">classify_hierarchical <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb14-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb14-3"></span>
<span id="cb14-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 1: Main category</span></span>
<span id="cb14-5">  main <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> main_category)</span>
<span id="cb14-6"></span>
<span id="cb14-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stage 2: Subcategory (only for technical)</span></span>
<span id="cb14-8">  sub <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (main <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>) {</span>
<span id="cb14-9">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> technical_subcategory)</span>
<span id="cb14-10">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb14-11">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span></span>
<span id="cb14-12">  }</span>
<span id="cb14-13"></span>
<span id="cb14-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> main, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subcategory =</span> sub)</span>
<span id="cb14-15">}</span>
<span id="cb14-16"></span>
<span id="cb14-17">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">classify_hierarchical</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The dashboard loads very slowly"</span>)</span>
<span id="cb14-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $category: "technical"</span></span>
<span id="cb14-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $subcategory: "performance"</span></span></code></pre></div></div>
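<p>When many tickets are classified this way, the list results can be flattened into a data frame. The following is a minimal sketch, assuming each result is a list with <code>category</code> and <code>subcategory</code> elements as returned above; <code>results</code> is mock data, so no API call is needed.</p>

```r
# Mock results shaped like classify_hierarchical() output
results <- list(
  list(category = "technical", subcategory = "performance"),
  list(category = "billing", subcategory = NA)
)

# as.character() turns a bare NA into NA_character_, keeping columns type-stable
hier_df <- data.frame(
  category = vapply(results, function(r) as.character(r$category), character(1)),
  subcategory = vapply(results, function(r) as.character(r$subcategory), character(1))
)
```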
</section>
</section>
<section id="custom-categories" class="level2">
<h2 class="anchored" data-anchor-id="custom-categories">Custom Categories</h2>
<section id="define-your-own-taxonomy" class="level3">
<h3 class="anchored" data-anchor-id="define-your-own-taxonomy">Define your own taxonomy</h3>
<p>LLMs work with any categories you define:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Document types</span></span>
<span id="cb15-2">doc_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb15-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"contract"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"invoice"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"memo"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb15-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Document type"</span></span>
<span id="cb15-5">)</span>
<span id="cb15-6"></span>
<span id="cb15-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Email intents</span></span>
<span id="cb15-8">email_intent <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb15-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"inquiry"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"complaint"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"request"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"thank_you"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"spam"</span>),</span>
<span id="cb15-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Email intent"</span></span>
<span id="cb15-11">)</span>
<span id="cb15-12"></span>
<span id="cb15-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># News topics</span></span>
<span id="cb15-14">news_topic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb15-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"politics"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"business"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technology"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sports"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"entertainment"</span>),</span>
<span id="cb15-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"News article topic"</span></span>
<span id="cb15-17">)</span></code></pre></div></div>
</section>
<section id="use-with-any-text" class="level3">
<h3 class="anchored" data-anchor-id="use-with-any-text">Use with any text</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb16-2"></span>
<span id="cb16-3">article <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Apple announced record quarterly earnings driven by iPhone sales in Asia."</span></span>
<span id="cb16-4"></span>
<span id="cb16-5">topic <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(article, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> news_topic)</span>
<span id="cb16-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "business"</span></span></code></pre></div></div>
</section>
</section>
<section id="improving-accuracy" class="level2">
<h2 class="anchored" data-anchor-id="improving-accuracy">Improving Accuracy</h2>
<section id="add-context-to-the-system-prompt" class="level3">
<h3 class="anchored" data-anchor-id="add-context-to-the-system-prompt">Add context to the system prompt</h3>
<p>Provide domain context for better classification:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb17-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a customer support classifier for a SaaS company.</span></span>
<span id="cb17-3"></span>
<span id="cb17-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  Category definitions:</span></span>
<span id="cb17-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - billing: payments, charges, refunds, subscriptions, pricing</span></span>
<span id="cb17-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - technical: bugs, errors, features not working, integrations</span></span>
<span id="cb17-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - account: login, password, profile, settings, permissions</span></span>
<span id="cb17-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  - other: feedback, partnerships, general questions"</span></span>
<span id="cb17-9">)</span>
<span id="cb17-10"></span>
<span id="cb17-11">category_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb17-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb17-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ticket category"</span></span>
<span id="cb17-14">)</span>
<span id="cb17-15"></span>
<span id="cb17-16">ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Can you add dark mode to the app?"</span></span>
<span id="cb17-17"></span>
<span id="cb17-18">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(ticket, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> category_type)</span>
<span id="cb17-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "other" (feature request, not a bug)</span></span></code></pre></div></div>
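<p>The category names appear twice above: once in the system prompt and once in <code>type_enum()</code>. One way to keep them from drifting apart is to build both from a single named vector. The sketch below is plain base R; <code>category_definitions</code> and <code>build_system_prompt()</code> are hypothetical names introduced here, not part of ellmer.</p>

```r
# Hypothetical helper: one named vector drives both the prompt and the enum values.
category_definitions <- c(
  billing   = "payments, charges, refunds, subscriptions, pricing",
  technical = "bugs, errors, features not working, integrations",
  account   = "login, password, profile, settings, permissions",
  other     = "feedback, partnerships, general questions"
)

build_system_prompt <- function(definitions, role) {
  # One "- name: definition" line per category
  bullets <- paste0("- ", names(definitions), ": ", definitions)
  paste(c(role, "", "Category definitions:", bullets), collapse = "\n")
}

system_prompt <- build_system_prompt(
  category_definitions,
  role = "You are a customer support classifier for a SaaS company."
)

# The same names can be passed as `values` to type_enum()
category_values <- names(category_definitions)

cat(system_prompt)
```

<p>Adding or renaming a category then means editing one vector, and the prompt and the allowed values stay in step.</p>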
</section>
<section id="provide-examples-few-shot" class="level3">
<h3 class="anchored" data-anchor-id="provide-examples-few-shot">Provide examples (few-shot)</h3>
<p>Include examples in the prompt for tricky cases:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb18-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify support tickets. Examples:</span></span>
<span id="cb18-3"></span>
<span id="cb18-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Charge me yearly instead of monthly' -&gt; billing</span></span>
<span id="cb18-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Button doesn't work on mobile' -&gt; technical</span></span>
<span id="cb18-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Reset my 2FA' -&gt; account</span></span>
<span id="cb18-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  'Love your product!' -&gt; other"</span></span>
<span id="cb18-8">)</span></code></pre></div></div>
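<p>When the examples live in a data frame (for instance, tickets you have already labelled by hand), the few-shot prompt can be assembled rather than typed. This is a minimal base R sketch; <code>examples</code> and <code>build_few_shot_prompt()</code> are hypothetical names used for illustration.</p>

```r
# Hypothetical helper: turn labelled examples into a few-shot system prompt.
examples <- data.frame(
  text = c(
    "Charge me yearly instead of monthly",
    "Button doesn't work on mobile",
    "Reset my 2FA",
    "Love your product!"
  ),
  label = c("billing", "technical", "account", "other"),
  stringsAsFactors = FALSE
)

build_few_shot_prompt <- function(examples,
                                  instruction = "Classify support tickets. Examples:") {
  # Format each row as: 'text' -> label
  shots <- sprintf("'%s' -> %s", examples$text, examples$label)
  paste(c(instruction, "", shots), collapse = "\n")
}

few_shot_prompt <- build_few_shot_prompt(examples)
cat(few_shot_prompt)
```

<p>The resulting string can be passed as <code>system_prompt</code> in the same way as the hand-written version above.</p>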
</section>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<section id="handle-api-failures-gracefully" class="level3">
<h3 class="anchored" data-anchor-id="handle-api-failures-gracefully">Handle API failures gracefully</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">safe_classify <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb19-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb19-4">    category_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb19-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb19-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Category"</span></span>
<span id="cb19-7">    )</span>
<span id="cb19-8">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> category_type)</span>
<span id="cb19-9">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb19-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">warning</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classification failed: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb19-11">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span></span>
<span id="cb19-12">  })</span>
<span id="cb19-13">}</span>
<span id="cb19-14"></span>
<span id="cb19-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Classify every ticket (map_chr() is from purrr)</span></span>
<span id="cb19-16">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(tickets, safe_classify)</span>
<span id="cb19-17"></span>
<span id="cb19-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter out failures</span></span>
<span id="cb19-19">valid_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> results[<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(results)]</span></code></pre></div></div>
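<p>Some failures are transient (a timeout or a rate-limit response), so it can be worth retrying before giving up and returning <code>NA</code>. The sketch below shows one possible retry loop with exponential backoff in base R. <code>with_retry()</code> and <code>flaky_classify()</code> are hypothetical names, and the flaky function only simulates an API call.</p>

```r
# Hypothetical helper: retry a function with exponential backoff.
with_retry <- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_tries) break
    # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    Sys.sleep(base_delay * 2^(attempt - 1))
  }
  warning("All ", max_tries, " attempts failed: ", conditionMessage(result))
  NA_character_
}

# Stand-in for an API call that fails twice, then succeeds
calls <- 0
flaky_classify <- function() {
  calls <<- calls + 1
  if (calls <= 2) stop("temporary failure")
  "billing"
}

label <- with_retry(flaky_classify, max_tries = 3, base_delay = 0)
```

<p>With real API calls, a non-zero <code>base_delay</code> gives a rate-limited or briefly unavailable service time to recover before the next attempt.</p>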
</section>
</section>
<section id="local-classification" class="level2">
<h2 class="anchored" data-anchor-id="local-classification">Local Classification</h2>
<section id="use-ollama-for-free-private-classification" class="level3">
<h3 class="anchored" data-anchor-id="use-ollama-for-free-private-classification">Use Ollama for free, private classification</h3>
<p>For sensitive data or high volume, use local models:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb20-2"></span>
<span id="cb20-3">category_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb20-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb20-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Category"</span></span>
<span id="cb20-6">)</span>
<span id="cb20-7"></span>
<span id="cb20-8">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I need a refund"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> category_type)</span>
<span id="cb20-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "billing"</span></span></code></pre></div></div>
<p>See <a href="../llm/how-to-run-local-llms-in-r">Local LLMs with Ollama</a> for setup.</p>
<p><strong>Note:</strong> Local models may be less accurate than cloud APIs for nuanced classification.</p>
</section>
</section>
<section id="performance-tips" class="level2">
<h2 class="anchored" data-anchor-id="performance-tips">Performance Tips</h2>
<section id="batch-similar-classifications" class="level3">
<h3 class="anchored" data-anchor-id="batch-similar-classifications">Batch similar classifications</h3>
<p>Process items in batches when possible:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Instead of one API call per item,</span></span>
<span id="cb21-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># classify several items in a single prompt</span></span>
<span id="cb21-3"></span>
<span id="cb21-4">batch_classify <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(texts) {</span>
<span id="cb21-5">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb21-6"></span>
<span id="cb21-7">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Build numbered list</span></span>
<span id="cb21-8">  numbered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_along</span>(texts), texts, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">". "</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb21-9"></span>
<span id="cb21-10">  prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb21-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify each item as billing/technical/account/other."</span>,</span>
<span id="cb21-12">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Return only a JSON array of category strings, in input order, with no other text."</span>,</span>
<span id="cb21-13">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, numbered</span>
<span id="cb21-14">  )</span>
<span id="cb21-15"></span>
<span id="cb21-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Parse JSON response</span></span>
<span id="cb21-17">  jsonlite<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fromJSON</span>(chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(prompt))</span>
<span id="cb21-18">}</span></code></pre></div></div>
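<p>Because the batch approach parses free-form model output, nothing guarantees that the array has one entry per input or that every entry is a valid category. A small check after parsing can catch both problems. This is a sketch in base R; <code>validate_batch_labels()</code> is a hypothetical helper, and it takes an already-parsed character vector rather than calling the model.</p>

```r
# Hypothetical helper: check the labels returned for a batch.
validate_batch_labels <- function(labels, n_expected,
                                  allowed = c("billing", "technical", "account", "other")) {
  labels <- as.character(labels)
  if (length(labels) != n_expected) {
    # A miscounted array cannot be aligned with the inputs, so discard it
    warning("Expected ", n_expected, " labels but got ", length(labels))
    return(rep(NA_character_, n_expected))
  }
  # Replace anything outside the allowed set with NA
  labels[!labels %in% allowed] <- NA_character_
  labels
}

validate_batch_labels(c("billing", "refunds", "account"), n_expected = 3)
# "billing" NA "account"
```

<p>Items that come back as <code>NA</code> can then be re-sent one at a time with <code>type_enum()</code>, which constrains the answer to the allowed values.</p>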
</section>
<section id="use-cheaper-models-for-simple-tasks" class="level3">
<h3 class="anchored" data-anchor-id="use-cheaper-models-for-simple-tasks">Use cheaper models for simple tasks</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use Haiku for simple classification</span></span>
<span id="cb22-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-haiku-4-5"</span>)</span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use Opus only for complex multi-label tasks</span></span>
<span id="cb22-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-opus-4-6"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Simple classification</td>
<td><code>chat$chat("Classify as X/Y/Z: text")</code></td>
</tr>
<tr class="even">
<td>Structured output</td>
<td><code>chat$extract_data(text, type_enum(...))</code></td>
</tr>
<tr class="odd">
<td>Multi-label</td>
<td><code>type_array(items = type_enum(...))</code></td>
</tr>
<tr class="even">
<td>With confidence</td>
<td><code>type_object(category = ..., confidence = ...)</code></td>
</tr>
<tr class="odd">
<td>Batch processing</td>
<td><code>map_chr(texts, classify_fn)</code></td>
</tr>
</tbody>
</table>
<p><strong>Key points:</strong></p>
<ul>
<li>Use <code>type_enum()</code> to guarantee valid categories</li>
<li>Add system prompts with category definitions for accuracy</li>
<li>Add <code>Sys.sleep()</code> between calls in batch processing to stay within rate limits</li>
<li>Consider local models for sensitive or high-volume data</li>
</ul>
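<p>The rate-limit point can be sketched as a simple loop that pauses between calls. <code>classify_throttled()</code> and <code>fake_classify()</code> are hypothetical names; the stand-in classifier is there only so the example runs without an API key, and the right delay depends on your provider's limits.</p>

```r
# Hypothetical helper: call classify_fn on each text, pausing between calls.
classify_throttled <- function(texts, classify_fn, delay = 1) {
  results <- character(length(texts))
  for (i in seq_along(texts)) {
    results[i] <- classify_fn(texts[[i]])
    # No need to wait after the final item
    if (i != length(texts)) Sys.sleep(delay)
  }
  results
}

# Stand-in classifier so the sketch runs without an API key
fake_classify <- function(text) {
  if (grepl("refund", text, fixed = TRUE)) "billing" else "other"
}

classify_throttled(
  c("I need a refund", "Love your product!"),
  fake_classify,
  delay = 0
)
# "billing" "other"
```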
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-analyze-sentiment-with-llms-in-r">How to Analyze Sentiment with LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Documentation</a></li>
<li><a href="https://ellmer.tidyverse.org/reference/">ellmer Type Definitions</a></li>
</ul>



</section>

 ]]></description>
  <category>llm</category>
  <category>ellmer</category>
  <category>text classification</category>
  <guid>https://rstats101.com/llm/how-to-classify-text-with-llms-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-classify-text-with-llms-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Extract Structured Data with LLMs in R</title>
  <link>https://rstats101.com/llm/how-to-extract-data-with-llms-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>LLMs excel at extracting structured information from unstructured text. Instead of writing complex regex patterns, you can describe what you want and get clean, structured output.</p>
<p>This tutorial uses the <a href="../llm/how-to-use-ellmer-in-r">ellmer package</a>. You can use any provider: <a href="../llm/how-to-use-claude-api-in-r">Claude</a>, <a href="../llm/how-to-use-openai-api-in-r">OpenAI</a>, or <a href="../llm/how-to-run-local-llms-in-r">local models with Ollama</a>.</p>
<p><strong>Use cases:</strong></p>
<ul>
<li>Extract names, dates, addresses from text</li>
<li>Parse product information from descriptions</li>
<li>Convert free-text survey responses to categories</li>
<li>Extract entities from documents</li>
<li>Clean and standardize messy data</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
</section>
<section id="basic-extraction" class="level2">
<h2 class="anchored" data-anchor-id="basic-extraction">Basic Extraction</h2>
<section id="define-a-schema" class="level3">
<h3 class="anchored" data-anchor-id="define-a-schema">Define a schema</h3>
<p>Tell the LLM what structure you expect:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">person_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb2-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's full name"</span>),</span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_integer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's age in years"</span>),</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">email =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
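font-style: inherit;">"Email address if mentioned"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">required =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>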
<span id="cb2-5">)</span></code></pre></div></div>
</section>
<section id="extract-from-text" class="level3">
<h3 class="anchored" data-anchor-id="extract-from-text">Extract from text</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb3-2"></span>
<span id="cb3-3">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(</span>
<span id="cb3-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"John Smith is 35 years old. You can reach him at john@email.com"</span>,</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> person_schema</span>
<span id="cb3-6">)</span></code></pre></div></div>
</section>
<section id="access-the-results" class="level3">
<h3 class="anchored" data-anchor-id="access-the-results">Access the results</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>name   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "John Smith"</span></span>
<span id="cb4-2">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>age    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 35</span></span>
<span id="cb4-3">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>email  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "john@email.com"</span></span></code></pre></div></div>
</section>
<section id="handle-missing-data" class="level3">
<h3 class="anchored" data-anchor-id="handle-missing-data">Handle missing data</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(</span>
<span id="cb5-2">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sarah is 28 years old."</span>,</span>
<span id="cb5-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> person_schema</span>
<span id="cb5-4">)</span>
<span id="cb5-5"></span>
<span id="cb5-6">result</span>
<span id="cb5-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $name: "Sarah"</span></span>
<span id="cb5-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $age: 28</span></span>
<span id="cb5-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $email: NULL  # Not mentioned in text</span></span></code></pre></div></div>
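<p>A <code>NULL</code> field is awkward once you start collecting results into a data frame, because <code>NULL</code> elements are dropped rather than stored. One option is to convert missing fields to <code>NA</code> first. The sketch below is base R only; <code>result_to_row()</code> is a hypothetical helper, and the list is a hand-built stand-in for the result shown above.</p>

```r
# Hypothetical helper: convert one extraction result (a named list)
# to a one-row data frame.
result_to_row <- function(result, fields) {
  values <- lapply(fields, function(field) {
    value <- result[[field]]
    # A field absent from the text comes back as NULL; store it as NA
    if (is.null(value)) NA else value
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Stand-in for the result above, where no email was mentioned
result <- list(name = "Sarah", age = 28L, email = NULL)
person_row <- result_to_row(result, c("name", "age", "email"))
```

<p>Rows built this way all have the same columns, so they can be stacked with <code>rbind()</code> or <code>dplyr::bind_rows()</code> even when some fields were not found.</p>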
</section>
</section>
<section id="type-definitions" class="level2">
<h2 class="anchored" data-anchor-id="type-definitions">Type Definitions</h2>
<section id="available-types" class="level3">
<h3 class="anchored" data-anchor-id="available-types">Available types</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># String</span></span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"description of the field"</span>)</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Integer</span></span>
<span id="cb6-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_integer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"description"</span>)</span>
<span id="cb6-6"></span>
<span id="cb6-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Number (float)</span></span>
<span id="cb6-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"description"</span>)</span>
<span id="cb6-9"></span>
<span id="cb6-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Boolean</span></span>
<span id="cb6-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_boolean</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"description"</span>)</span>
<span id="cb6-12"></span>
<span id="cb6-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enum (predefined options)</span></span>
<span id="cb6-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb6-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb6-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment classification"</span></span>
<span id="cb6-17">)</span>
<span id="cb6-18"></span>
<span id="cb6-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Array (list of items)</span></span>
<span id="cb6-20"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb6-21">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"individual item description"</span>),</span>
<span id="cb6-22">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A list of items"</span></span>
<span id="cb6-23">)</span>
<span id="cb6-24"></span>
<span id="cb6-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Object (nested structure)</span></span>
<span id="cb6-26"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb6-27">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">field1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"..."</span>),</span>
<span id="cb6-28">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">field2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_integer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"..."</span>)</span>
<span id="cb6-29">)</span></code></pre></div></div>
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="extract-product-information" class="level3">
<h3 class="anchored" data-anchor-id="extract-product-information">Extract product information</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">product_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb7-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Product name"</span>),</span>
<span id="cb7-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">price =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Price as a number, without the currency symbol"</span>),</span>
<span id="cb7-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">currency =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Currency code, e.g. USD"</span>),</span>
<span id="cb7-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">features =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb7-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A product feature"</span>)</span>
<span id="cb7-7">  )</span>
<span id="cb7-8">)</span>
<span id="cb7-9"></span>
<span id="cb7-10">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The iPhone 15 Pro costs $999. Features include titanium design,</span></span>
<span id="cb7-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">A17 chip, 48MP camera, and USB-C port."</span></span>
<span id="cb7-12"></span>
<span id="cb7-13">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb7-14">product <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> product_schema)</span>
<span id="cb7-15"></span>
<span id="cb7-16">product</span>
<span id="cb7-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $name: "iPhone 15 Pro"</span></span>
<span id="cb7-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $price: 999</span></span>
<span id="cb7-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $currency: "USD"</span></span>
<span id="cb7-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $features: ["titanium design", "A17 chip", "48MP camera", "USB-C port"]</span></span></code></pre></div></div>
</section>
<section id="classify-sentiment" class="level3">
<h3 class="anchored" data-anchor-id="classify-sentiment">Classify sentiment</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">sentiment_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb8-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb8-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Overall sentiment"</span></span>
<span id="cb8-5">  ),</span>
<span id="cb8-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">confidence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Confidence score from 0 to 1"</span>),</span>
<span id="cb8-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">key_phrases =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb8-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Key phrase indicating sentiment"</span>)</span>
<span id="cb8-9">  )</span>
<span id="cb8-10">)</span>
<span id="cb8-11"></span>
<span id="cb8-12">review <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Absolutely love this product! Best purchase I've made this year.</span></span>
<span id="cb8-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">The quality is outstanding and shipping was super fast."</span></span>
<span id="cb8-14"></span>
<span id="cb8-15">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb8-16">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(review, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_schema)</span>
<span id="cb8-17"></span>
<span id="cb8-18">result</span>
<span id="cb8-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $sentiment: "positive"</span></span>
<span id="cb8-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $confidence: 0.95</span></span>
<span id="cb8-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $key_phrases: ["Absolutely love", "Best purchase", "outstanding", "super fast"]</span></span></code></pre></div></div>
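<p>A schema constrains the shape of the answer, but it can still be worth checking the values before relying on them. The sketch below uses a hand-built list as a stand-in for <code>result</code>, so no API call is made. <code>validate_sentiment()</code> is a hypothetical helper written for this example, not part of any package.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in for the list returned by the extraction above
result &lt;- list(
  sentiment = "positive",
  confidence = 0.95,
  key_phrases = c("Absolutely love", "Best purchase")
)

# Hypothetical helper: TRUE only when both fields look sane
validate_sentiment &lt;- function(x) {
  allowed &lt;- c("positive", "negative", "neutral")
  isTRUE(
    is.character(x$sentiment) &amp;&amp; length(x$sentiment) == 1 &amp;&amp;
      x$sentiment %in% allowed &amp;&amp;
      is.numeric(x$confidence) &amp;&amp; length(x$confidence) == 1 &amp;&amp;
      x$confidence &gt;= 0 &amp;&amp; x$confidence &lt;= 1
  )
}

validate_sentiment(result)</code></pre></div>
<p>Wrapping the checks in <code>isTRUE()</code> means a missing or <code>NA</code> field counts as a failed check rather than propagating <code>NA</code>.</p>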
</section>
<section id="parse-contact-information" class="level3">
<h3 class="anchored" data-anchor-id="parse-contact-information">Parse contact information</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">contact_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb9-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Full name"</span>),</span>
<span id="cb9-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">phone =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Phone number"</span>),</span>
<span id="cb9-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">email =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Email address"</span>),</span>
<span id="cb9-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">address =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb9-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">street =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Street address"</span>),</span>
<span id="cb9-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">city =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"City"</span>),</span>
<span id="cb9-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">state =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"State"</span>),</span>
<span id="cb9-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">zip =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ZIP code"</span>)</span>
<span id="cb9-10">  )</span>
<span id="cb9-11">)</span>
<span id="cb9-12"></span>
<span id="cb9-13">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Contact Jane Doe at (555) 123-4567 or jane.doe@company.com.</span></span>
<span id="cb9-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Her office is at 123 Main Street, San Francisco, CA 94102."</span></span>
<span id="cb9-15"></span>
<span id="cb9-16">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb9-17">contact <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> contact_schema)</span></code></pre></div></div>
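<p>A nested <code>type_object()</code> suggests a nested result. As a rough sketch, assuming the extraction returns a named list that mirrors the schema, the inner fields can be reached with chained <code>$</code>. The list below is built by hand from the example text as a stand-in; real model output may differ.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in, built by hand from the example text above
contact &lt;- list(
  name = "Jane Doe",
  phone = "(555) 123-4567",
  email = "jane.doe@company.com",
  address = list(
    street = "123 Main Street",
    city = "San Francisco",
    state = "CA",
    zip = "94102"
  )
)

# Nested fields are reached with chained $
contact$address$city

# Build a one-line mailing address from the nested fields
mailing &lt;- with(
  contact$address,
  paste0(street, ", ", city, ", ", state, " ", zip)
)
mailing</code></pre></div>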
</section>
<section id="extract-dates-and-events" class="level3">
<h3 class="anchored" data-anchor-id="extract-dates-and-events">Extract dates and events</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">event_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb10-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">event_name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Name of the event"</span>),</span>
<span id="cb10-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">date =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Date in YYYY-MM-DD format"</span>),</span>
<span id="cb10-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">location =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event location"</span>),</span>
<span id="cb10-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Brief description"</span>)</span>
<span id="cb10-6">)</span>
<span id="cb10-7"></span>
<span id="cb10-8">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Join us for the R Users Meetup on March 15th, 2024 at the</span></span>
<span id="cb10-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Downtown Conference Center. We'll discuss data visualization techniques."</span></span>
<span id="cb10-10"></span>
<span id="cb10-11">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb10-12">event <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> event_schema)</span>
<span id="cb10-13"></span>
<span id="cb10-14">event</span>
<span id="cb10-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $event_name: "R Users Meetup"</span></span>
<span id="cb10-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $date: "2024-03-15"</span></span>
<span id="cb10-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $location: "Downtown Conference Center"</span></span>
<span id="cb10-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $description: "Discussion about data visualization techniques"</span></span></code></pre></div></div>
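<p>The schema asks for the date as a string, so it still needs converting before any date arithmetic. A minimal sketch with base R, again using a hand-built stand-in for <code>event</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in for the extracted event
event &lt;- list(event_name = "R Users Meetup", date = "2024-03-15")

# as.Date() returns NA when the string does not match the format
event_date &lt;- as.Date(event$date, format = "%Y-%m-%d")

if (is.na(event_date)) {
  warning("Could not parse date: ", event$date)
}

# Date arithmetic now works, e.g. one week after the event
one_week_later &lt;- event_date + 7
one_week_later</code></pre></div>
<p>Checking for <code>NA</code> catches cases where the model ignored the requested <code>YYYY-MM-DD</code> format.</p>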
</section>
</section>
<section id="extracting-multiple-items" class="level2">
<h2 class="anchored" data-anchor-id="extracting-multiple-items">Extracting Multiple Items</h2>
<section id="extract-array-of-objects" class="level3">
<h3 class="anchored" data-anchor-id="extract-array-of-objects">Extract array of objects</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">person_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's name"</span>),</span>
<span id="cb11-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's role or title"</span>)</span>
<span id="cb11-4">)</span>
<span id="cb11-5"></span>
<span id="cb11-6">people_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb11-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> person_schema,</span>
<span id="cb11-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"List of people mentioned"</span></span>
<span id="cb11-9">)</span>
<span id="cb11-10"></span>
<span id="cb11-11">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The meeting included CEO John Smith, CTO Sarah Johnson,</span></span>
<span id="cb11-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">and CFO Michael Brown. They discussed Q4 results."</span></span>
<span id="cb11-13"></span>
<span id="cb11-14">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb11-15">people <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> people_schema)</span>
<span id="cb11-16"></span>
<span id="cb11-17">people</span>
<span id="cb11-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[1]] $name: "John Smith", $role: "CEO"</span></span>
<span id="cb11-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[2]] $name: "Sarah Johnson", $role: "CTO"</span></span>
<span id="cb11-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># [[3]] $name: "Michael Brown", $role: "CFO"</span></span></code></pre></div></div>
</section>
<section id="convert-to-data-frame" class="level3">
<h3 class="anchored" data-anchor-id="convert-to-data-frame">Convert to data frame</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Convert the extracted list to a tibble (uses tibble and purrr)</span></span>
<span id="cb12-2">people_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(people, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"name"</span>),</span>
<span id="cb12-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(people, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"role"</span>)</span>
<span id="cb12-5">)</span>
<span id="cb12-6"></span>
<span id="cb12-7">people_df</span></code></pre></div></div>
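<p>If <code>people</code> is a list of named lists, as the output above suggests, the same conversion can be sketched in base R without extra packages. The list here is a hand-built stand-in. If your version of the package already returns a data frame for an array of objects, this step is unnecessary.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stand-in for the extracted list of people
people &lt;- list(
  list(name = "John Smith", role = "CEO"),
  list(name = "Sarah Johnson", role = "CTO"),
  list(name = "Michael Brown", role = "CFO")
)

# Turn each element into a one-row data frame, then stack the rows
people_df &lt;- do.call(rbind, lapply(people, as.data.frame))

people_df</code></pre></div>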
</section>
</section>
<section id="batch-processing" class="level2">
<h2 class="anchored" data-anchor-id="batch-processing">Batch Processing</h2>
<section id="process-multiple-texts" class="level3">
<h3 class="anchored" data-anchor-id="process-multiple-texts">Process multiple texts</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
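<span id="cb13-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tibble)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for tibble() below</span></span>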
<span id="cb13-3">reviews <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb13-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product, love it!"</span>,</span>
<span id="cb13-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible quality, very disappointed"</span>,</span>
<span id="cb13-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay, nothing special"</span>,</span>
<span id="cb13-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Best purchase ever, highly recommend"</span></span>
<span id="cb13-8">)</span>
<span id="cb13-9"></span>
<span id="cb13-10">sentiment_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb13-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb13-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sentiment"</span></span>
<span id="cb13-13">)</span>
<span id="cb13-14"></span>
<span id="cb13-15">extract_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb13-16">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb13-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rate limiting</span></span>
<span id="cb13-18">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_schema)</span>
<span id="cb13-19">}</span>
<span id="cb13-20"></span>
<span id="cb13-21">sentiments <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(reviews, extract_sentiment)</span>
<span id="cb13-22"></span>
<span id="cb13-23"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb13-24">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">review =</span> reviews,</span>
<span id="cb13-25">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> sentiments</span>
<span id="cb13-26">)</span></code></pre></div></div>
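<p>In a batch, a single failed request would stop the whole loop. One way to guard against that, sketched here with base R's <code>tryCatch()</code>, is to wrap the extractor so a failure yields <code>NA</code>. <code>fake_extract()</code> is a hypothetical stub that stands in for <code>extract_sentiment()</code> so the example runs without an API key, and <code>safely_extract()</code> is likewise a helper invented for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical stub standing in for extract_sentiment(); fails on empty text
fake_extract &lt;- function(text) {
  if (!nzchar(text)) stop("empty review")
  if (grepl("love|Best", text)) "positive" else "neutral"
}

# Wrap any extractor so an error yields NA instead of stopping the batch
safely_extract &lt;- function(f) {
  function(text) {
    tryCatch(f(text), error = function(e) NA_character_)
  }
}

reviews &lt;- c("Great product, love it!", "", "It's okay, nothing special")

sentiments &lt;- vapply(
  reviews,
  safely_extract(fake_extract),
  character(1),
  USE.NAMES = FALSE
)

sentiments</code></pre></div>
<p>Rows that come back as <code>NA</code> can then be retried or inspected separately.</p>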
</section>
<section id="process-data-frame-column" class="level3">
<h3 class="anchored" data-anchor-id="process-data-frame-column">Process data frame column</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb14-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,</span>
<span id="cb14-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb14-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"John Smith, age 30, engineer"</span>,</span>
<span id="cb14-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Jane Doe, age 25, designer"</span>,</span>
<span id="cb14-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bob Brown, age 45, manager"</span></span>
<span id="cb14-7">  )</span>
<span id="cb14-8">)</span>
<span id="cb14-9"></span>
<span id="cb14-10">person_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb14-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Name"</span>),</span>
<span id="cb14-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_integer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Age"</span>),</span>
<span id="cb14-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">job =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Job title"</span>)</span>
<span id="cb14-14">)</span>
<span id="cb14-15"></span>
<span id="cb14-16">df_extracted <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb14-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">extracted =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(description, \(text) {</span>
<span id="cb14-19">      chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb14-20">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb14-21">      chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> person_schema)</span>
<span id="cb14-22">    }),</span>
<span id="cb14-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(extracted, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"name"</span>),</span>
<span id="cb14-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(extracted, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"age"</span>),</span>
<span id="cb14-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">job =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(extracted, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"job"</span>)</span>
<span id="cb14-26">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>extracted)</span>
<span id="cb14-28"></span>
<span id="cb14-29">df_extracted</span></code></pre></div></div>
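<p>The <code>map_chr()</code> and <code>map_int()</code> calls above assume every extraction returned all three fields. As a minimal sketch of a more defensive flattening step, the base R helper below substitutes <code>NA</code> for any missing field. The names <code>field_or_na()</code> and <code>mock_results</code> are hypothetical, and the mock list only stands in for real model output.</p>

```r
# Hypothetical helper: pull one field from each result, using NA when absent.
field_or_na <- function(results, field, na_value) {
  vapply(
    results,
    function(res) {
      value <- res[[field]]
      if (is.null(value)) na_value else value
    },
    FUN.VALUE = na_value
  )
}

# Mock extraction output (assumed shape: one named list per input row).
mock_results <- list(
  list(name = "John Smith", age = 30L, job = "engineer"),
  list(name = "Jane Doe", age = NULL, job = "designer"),
  NULL  # stands in for an extraction that failed entirely
)

# Build a data frame with one column per field; missing values become NA.
people <- data.frame(
  name = field_or_na(mock_results, "name", NA_character_),
  age  = field_or_na(mock_results, "age", NA_integer_),
  job  = field_or_na(mock_results, "job", NA_character_)
)
```

<p>Passing a typed <code>NA</code> as <code>na_value</code> lets <code>vapply()</code> enforce the column type, so an unexpected value such as a string in <code>age</code> fails loudly instead of silently coercing the column.</p>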
</section>
</section>
<section id="advanced-patterns" class="level2">
<h2 class="anchored" data-anchor-id="advanced-patterns">Advanced Patterns</h2>
<section id="extraction-with-instructions" class="level3">
<h3 class="anchored" data-anchor-id="extraction-with-instructions">Extraction with instructions</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb15-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Extract information exactly as specified.</span></span>
<span id="cb15-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  If information is unclear, make your best inference.</span></span>
<span id="cb15-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  Use NULL for genuinely missing data."</span></span>
<span id="cb15-5">)</span>
<span id="cb15-6"></span>
<span id="cb15-7">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> schema)</span></code></pre></div></div>
</section>
<section id="validate-extracted-data" class="level3">
<h3 class="anchored" data-anchor-id="validate-extracted-data">Validate extracted data</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">extract_and_validate <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text, schema, validation_fn) {</span>
<span id="cb16-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb16-3">  result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> schema)</span>
<span id="cb16-4"></span>
<span id="cb16-5">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">validation_fn</span>(result)) {</span>
<span id="cb16-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">warning</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Extraction may be incomplete or invalid"</span>)</span>
<span id="cb16-7">  }</span>
<span id="cb16-8"></span>
<span id="cb16-9">  result</span>
<span id="cb16-10">}</span>
<span id="cb16-11"></span>
<span id="cb16-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Example validation</span></span>
<span id="cb16-13">validate_person <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(person) {</span>
<span id="cb16-14">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(person<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>name) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(person<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>age)</span>
<span id="cb16-15">}</span>
<span id="cb16-16"></span>
<span id="cb16-17">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_and_validate</span>(</span>
<span id="cb16-18">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Some text"</span>,</span>
<span id="cb16-19">  person_schema,</span>
<span id="cb16-20">  validate_person</span>
<span id="cb16-21">)</span></code></pre></div></div>
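<p><code>validate_person()</code> only checks that the fields are present. A slightly stricter check might also look at the values themselves. The sketch below is one way to do that in base R; <code>check_person()</code>, the 0 to 120 age range, and the mock inputs are all assumptions made for illustration.</p>

```r
# Hypothetical validator: returns a character vector of problems (empty if none).
check_person <- function(person) {
  problems <- character()
  if (is.null(person$name) || !nzchar(trimws(person$name))) {
    problems <- c(problems, "name is missing or empty")
  }
  if (is.null(person$age)) {
    problems <- c(problems, "age is missing")
  } else if (!is.numeric(person$age) || is.na(person$age) ||
             person$age < 0 || person$age > 120) {
    problems <- c(problems, "age is outside the assumed 0-120 range")
  }
  problems
}

# Mock results standing in for model output.
good <- list(name = "John Smith", age = 30L, job = "engineer")
bad  <- list(name = "", age = 250L, job = "manager")

check_person(good)
check_person(bad)
```

<p>Returning the list of problems rather than a single <code>TRUE</code>/<code>FALSE</code> makes the warning more useful. To plug it into <code>extract_and_validate()</code>, wrap it as <code>\(p) length(check_person(p)) == 0</code>.</p>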
</section>
<section id="combine-extraction-with-classification" class="level3">
<h3 class="anchored" data-anchor-id="combine-extraction-with-classification">Combine extraction with classification</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">ticket_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb17-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb17-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"billing"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"technical"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"account"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"other"</span>),</span>
<span id="cb17-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Support ticket category"</span></span>
<span id="cb17-5">  ),</span>
<span id="cb17-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">priority =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb17-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"low"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"medium"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"high"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"urgent"</span>),</span>
<span id="cb17-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Priority level"</span></span>
<span id="cb17-9">  ),</span>
<span id="cb17-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summary =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"One-sentence summary"</span>),</span>
<span id="cb17-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">entities =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb17-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">account_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Account ID if mentioned"</span>),</span>
<span id="cb17-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error_code =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Error code if mentioned"</span>)</span>
<span id="cb17-14">  )</span>
<span id="cb17-15">)</span>
<span id="cb17-16"></span>
<span id="cb17-17">ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hi, I can't log into my account #12345. Getting error E401.</span></span>
<span id="cb17-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">This is urgent as I need to complete a transaction today!"</span></span>
<span id="cb17-19"></span>
<span id="cb17-20">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb17-21">parsed_ticket <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(ticket, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> ticket_schema)</span></code></pre></div></div>
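<p>Once a ticket is parsed, the category and priority can drive ordinary R logic. The sketch below assumes the result is a nested list that mirrors <code>ticket_schema</code>; <code>mock_ticket</code> is hand-written rather than real model output, and <code>route_ticket()</code> and its queue names are hypothetical.</p>

```r
# Mock of the nested list the extraction is assumed to return.
mock_ticket <- list(
  category = "technical",
  priority = "urgent",
  summary  = "User cannot log in and sees error E401.",
  entities = list(account_id = "12345", error_code = "E401")
)

# Hypothetical routing rule: urgent and high tickets go to an on-call queue.
route_ticket <- function(ticket) {
  queue <- if (ticket$priority %in% c("urgent", "high")) "on-call" else "backlog"
  paste0(queue, "/", ticket$category)
}

route_ticket(mock_ticket)

# Nested fields are reached with chained `$`.
mock_ticket$entities$error_code
```

<p>Because <code>category</code> and <code>priority</code> are enums, the routing code only has to handle the values listed in the schema.</p>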
</section>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">safe_extract <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text, schema) {</span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb18-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb18-4">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> schema)</span>
<span id="cb18-5">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb18-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">warning</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Extraction failed: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb18-7">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb18-8">  })</span>
<span id="cb18-9">}</span>
<span id="cb18-10"></span>
<span id="cb18-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use with map for batch processing</span></span>
<span id="cb18-12">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(texts, \(t) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_extract</span>(t, schema))</span>
<span id="cb18-13"></span>
<span id="cb18-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter out failures</span></span>
<span id="cb18-15">valid_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compact</span>(results)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove NULLs</span></span></code></pre></div></div>
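<p><code>safe_extract()</code> gives up after the first error. For transient failures such as timeouts, retrying with a growing delay is a common alternative. The following is a minimal base R sketch of that idea, not part of ellmer: <code>with_retry()</code> and <code>flaky_extract()</code> are hypothetical names, and the flaky function simulates an API call that fails twice before succeeding.</p>

```r
# Hypothetical wrapper: retry `fn` up to `max_tries` times, doubling the wait.
with_retry <- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # Wait base_delay, then 2x, 4x, ... before the next attempt.
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  warning("All ", max_tries, " attempts failed: ", conditionMessage(result))
  NULL
}

# Mock function that fails twice, then succeeds (stands in for an API call).
calls <- 0
flaky_extract <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  list(name = "John Smith", age = 30L)
}

# base_delay = 0 keeps the demo instant; use a positive delay against a real API.
result <- with_retry(flaky_extract, max_tries = 5, base_delay = 0)
```

<p>In real use, <code>fn</code> would be a closure around the extraction call, for example <code>\() chat$extract_data(text, type = schema)</code>. Like <code>safe_extract()</code>, the wrapper returns <code>NULL</code> after the final failure, so <code>compact()</code> still works on the results.</p>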
</section>
<section id="local-llm-extraction" class="level2">
<h2 class="anchored" data-anchor-id="local-llm-extraction">Local LLM Extraction</h2>
<p>Use <a href="../llm/how-to-run-local-llms-in-r">Ollama</a> for free, private extraction:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Works the same way with local models</span></span>
<span id="cb19-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb19-3"></span>
<span id="cb19-4">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_data</span>(</span>
<span id="cb19-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"John Smith, 35 years old, john@email.com"</span>,</span>
<span id="cb19-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> person_schema</span>
<span id="cb19-7">)</span></code></pre></div></div>
<p><strong>Note:</strong> Local models are often less accurate than hosted models on complex schemas. Test the extraction on representative inputs before relying on it.</p>
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Schema too complex</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Too many nested levels can confuse the model</span></span>
<span id="cb20-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Break into simpler extractions if needed</span></span></code></pre></div></div>
<p><strong>2. Ambiguous field descriptions</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bad</span></span>
<span id="cb21-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"date"</span>)</span>
<span id="cb21-3"></span>
<span id="cb21-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Good</span></span>
<span id="cb21-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event date in YYYY-MM-DD format"</span>)</span></code></pre></div></div>
<p><strong>3. Not handling NULL values</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Always check for NULLs</span></span>
<span id="cb22-2">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>field <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%||%</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"default_value"</span></span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Or use map with default</span></span>
<span id="cb22-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(results, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"field"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.default =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span>)</span></code></pre></div></div>
<p><strong>4. Forgetting rate limits in batches</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Always add delays</span></span>
<span id="cb23-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(texts, \(t) {</span>
<span id="cb23-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Important!</span></span>
<span id="cb23-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_extract</span>(t, schema)</span>
<span id="cb23-5">})</span></code></pre></div></div>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Define string field</td>
<td><code>type_string("description")</code></td>
</tr>
<tr class="even">
<td>Define number field</td>
<td><code>type_number("description")</code></td>
</tr>
<tr class="odd">
<td>Define enum field</td>
<td><code>type_enum(values = c(...))</code></td>
</tr>
<tr class="even">
<td>Define array</td>
<td><code>type_array(items = type_*())</code></td>
</tr>
<tr class="odd">
<td>Define object</td>
<td><code>type_object(field = type_*())</code></td>
</tr>
<tr class="even">
<td>Extract data</td>
<td><code>chat$extract_data(text, type)</code></td>
</tr>
</tbody>
</table>
<ul>
<li>Define schemas with <code>type_*()</code> functions</li>
<li>Use clear field descriptions</li>
<li>Handle NULL values for missing data</li>
<li>Add delays when batch processing</li>
<li>Validate extracted data when reliability is important</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-classify-text-with-llms-in-r">How to Classify Text with LLMs in R</a></li>
<li><a href="../llm/how-to-analyze-sentiment-with-llms-in-r">How to Analyze Sentiment with LLMs in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Data Extraction Documentation</a></li>
<li><a href="https://ellmer.tidyverse.org/reference/">ellmer Type Definitions Reference</a></li>
</ul>


<!-- -->

</section>

 ]]></description>
  <category>llm</category>
  <category>ellmer</category>
  <category>data extraction</category>
  <guid>https://rstats101.com/llm/how-to-extract-data-with-llms-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-extract-data-with-llms-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Run Local LLMs in R</title>
  <link>https://rstats101.com/llm/how-to-run-local-llms-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Running LLMs locally gives you:</p>
<ul>
<li><strong>No API costs</strong> - completely free after setup</li>
<li><strong>Privacy</strong> - data never leaves your machine</li>
<li><strong>Offline access</strong> - works without internet</li>
<li><strong>No rate limits</strong> - unlimited requests</li>
</ul>
<p>Ollama makes local LLMs easy: it handles model downloads and memory management, and it exposes a simple API. We’ll use the <a href="../llm/how-to-use-ellmer-in-r">ellmer package</a> to connect R to Ollama.</p>
<p><strong>Prefer cloud APIs?</strong> See <a href="../llm/how-to-use-openai-api-in-r">OpenAI</a> or <a href="../llm/how-to-use-claude-api-in-r">Claude</a> for more powerful models.</p>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<section id="install-ollama" class="level3">
<h3 class="anchored" data-anchor-id="install-ollama">Install Ollama</h3>
<p>Download and install from <a href="https://ollama.com">ollama.com</a>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># macOS</span></span>
<span id="cb1-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">brew</span> install ollama</span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Linux</span></span>
<span id="cb1-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-fsSL</span> https://ollama.com/install.sh <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sh</span></span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Windows</span></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download from ollama.com</span></span></code></pre></div></div>
</section>
<section id="start-ollama" class="level3">
<h3 class="anchored" data-anchor-id="start-ollama">Start Ollama</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Start the Ollama server</span></span>
<span id="cb2-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> serve</span></code></pre></div></div>
</section>
<section id="download-a-model" class="level3">
<h3 class="anchored" data-anchor-id="download-a-model">Download a model</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download Llama 3.2 (3B parameters, ~2GB)</span></span>
<span id="cb3-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2</span>
<span id="cb3-3"></span>
<span id="cb3-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download smaller model (faster)</span></span>
<span id="cb3-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2:1b</span>
<span id="cb3-6"></span>
<span id="cb3-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download Mistral</span></span>
<span id="cb3-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull mistral</span></code></pre></div></div>
</section>
</section>
<section id="using-ollama-in-r" class="level2">
<h2 class="anchored" data-anchor-id="using-ollama-in-r">Using Ollama in R</h2>
<section id="with-ellmer-recommended" class="level3">
<h3 class="anchored" data-anchor-id="with-ellmer-recommended">With ellmer (recommended)</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb4-3"></span>
<span id="cb4-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Connect to local Ollama</span></span>
<span id="cb4-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb4-6">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is R programming?"</span>)</span></code></pre></div></div>
</section>
<section id="with-ollamar-package" class="level3">
<h3 class="anchored" data-anchor-id="with-ollamar-package">With the ollamar package</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ollamar"</span>)</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ollamar)</span>
<span id="cb5-3"></span>
<span id="cb5-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate text</span></span>
<span id="cb5-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">generate</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain data frames in R"</span>)</span>
<span id="cb5-6"></span>
<span id="cb5-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Chat format</span></span>
<span id="cb5-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(</span>
<span id="cb5-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb5-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb5-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is ggplot2?"</span>)</span>
<span id="cb5-12">  )</span>
<span id="cb5-13">)</span></code></pre></div></div>
</section>
</section>
<section id="available-models" class="level2">
<h2 class="anchored" data-anchor-id="available-models">Available Models</h2>
<section id="recommended-for-r-coding" class="level3">
<h3 class="anchored" data-anchor-id="recommended-for-r-coding">Recommended for R coding</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Size</th>
<th>RAM Needed</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>llama3.2:1b</code></td>
<td>1.3GB</td>
<td>4GB</td>
<td>Fast responses, simple tasks</td>
</tr>
<tr class="even">
<td><code>llama3.2</code></td>
<td>2.0GB</td>
<td>8GB</td>
<td>Balanced, good for coding</td>
</tr>
<tr class="odd">
<td><code>codellama</code></td>
<td>3.8GB</td>
<td>8GB</td>
<td>Code generation</td>
</tr>
<tr class="even">
<td><code>mistral</code></td>
<td>4.1GB</td>
<td>8GB</td>
<td>General tasks</td>
</tr>
<tr class="odd">
<td><code>mixtral</code></td>
<td>26GB</td>
<td>48GB</td>
<td>Best quality, needs lots of RAM</td>
</tr>
</tbody>
</table>
</section>
<section id="download-models" class="level3">
<h3 class="anchored" data-anchor-id="download-models">Download models</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Code-focused</span></span>
<span id="cb6-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull codellama</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># General purpose</span></span>
<span id="cb6-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2</span>
<span id="cb6-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull mistral</span>
<span id="cb6-7"></span>
<span id="cb6-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Smaller/faster</span></span>
<span id="cb6-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2:1b</span>
<span id="cb6-10"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull phi3</span></code></pre></div></div>
</section>
<section id="list-installed-models" class="level3">
<h3 class="anchored" data-anchor-id="list-installed-models">List installed models</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ollamar)</span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list_models</span>()</span></code></pre></div></div>
</section>
</section>
<section id="basic-usage-with-ellmer" class="level2">
<h2 class="anchored" data-anchor-id="basic-usage-with-ellmer">Basic Usage with ellmer</h2>
<section id="simple-chat" class="level3">
<h3 class="anchored" data-anchor-id="simple-chat">Simple chat</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb8-2"></span>
<span id="cb8-3">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb8-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I read a CSV file in R?"</span>)</span></code></pre></div></div>
</section>
<section id="with-system-prompt" class="level3">
<h3 class="anchored" data-anchor-id="with-system-prompt">With system prompt</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(</span>
<span id="cb9-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb9-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R programming expert. Provide concise answers with code examples."</span></span>
<span id="cb9-4">)</span>
<span id="cb9-5"></span>
<span id="cb9-6">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I calculate the mean by group?"</span>)</span></code></pre></div></div>
</section>
<section id="multi-turn-conversation" class="level3">
<h3 class="anchored" data-anchor-id="multi-turn-conversation">Multi-turn conversation</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb10-2"></span>
<span id="cb10-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I have a dataset with customer purchase data."</span>)</span>
<span id="cb10-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How would I find the top 10 customers by total spend?"</span>)</span>
<span id="cb10-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Now show me how to visualize this."</span>)</span></code></pre></div></div>
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="generate-r-code" class="level3">
<h3 class="anchored" data-anchor-id="generate-r-code">Generate R code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"codellama"</span>,</span>
<span id="cb11-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Return only R code. No explanations."</span></span>
<span id="cb11-4">)</span>
<span id="cb11-5"></span>
<span id="cb11-6">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb11-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Write a function that:</span></span>
<span id="cb11-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">1. Takes a data frame</span></span>
<span id="cb11-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2. Finds numeric columns</span></span>
<span id="cb11-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">3. Scales them to 0-1 range</span></span>
<span id="cb11-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">4. Returns the modified data frame</span></span>
<span id="cb11-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-13"></span>
<span id="cb11-14"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(code)</span></code></pre></div></div>
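<p>Local models sometimes wrap their answer in Markdown code fences even when the system prompt asks for code only. The sketch below shows one way to strip those fences before printing or parsing the reply. <code>strip_code_fences()</code> is a hypothetical helper, not part of ellmer, and the <code>reply</code> string is a made-up stand-in for what <code>chat$chat()</code> might return.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: drop any line that opens or closes a Markdown fence
strip_code_fences &lt;- function(x) {
  lines &lt;- strsplit(x, "\n", fixed = TRUE)[[1]]
  lines &lt;- lines[!grepl("^\\s*```", lines)]
  trimws(paste(lines, collapse = "\n"))
}

# Made-up reply standing in for the model output
reply &lt;- "```r\nrescale &lt;- function(x) (x - min(x)) / (max(x) - min(x))\n```"

clean &lt;- strip_code_fences(reply)
cat(clean)

# parse() errors if the cleaned text is not syntactically valid R
parse(text = clean)</code></pre></div>
<p>Checking the result with <code>parse()</code> before running it is a cheap guard. It confirms the syntax only, so generated code should still be read before it is executed.</p>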
</section>
<section id="analyze-text-data" class="level3">
<h3 class="anchored" data-anchor-id="analyze-text-data">Analyze text data</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">classify_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb12-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(</span>
<span id="cb12-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb12-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify sentiment as positive, negative, or neutral. Reply with one word only."</span></span>
<span id="cb12-5">  )</span>
<span id="cb12-6">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(text)</span>
<span id="cb12-7">}</span>
<span id="cb12-8"></span>
<span id="cb12-9">reviews <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb12-10">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This product is amazing!"</span>,</span>
<span id="cb12-11">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible quality, very disappointed"</span>,</span>
<span id="cb12-12">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It works as expected"</span></span>
<span id="cb12-13">)</span>
<span id="cb12-14"></span>
<span id="cb12-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(reviews, classify_sentiment)</span></code></pre></div></div>
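<p>A small model may not always reply with exactly one lowercase word, so it can help to normalize the labels before analysis. This is a minimal sketch under that assumption. <code>normalize_label()</code> is a hypothetical helper and the <code>raw</code> vector is invented example output, not real model output.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: lowercase, strip non-letters, keep only known labels
normalize_label &lt;- function(x, allowed = c("positive", "negative", "neutral")) {
  cleaned &lt;- tolower(gsub("[^[:alpha:]]", "", x))
  ifelse(cleaned %in% allowed, cleaned, NA_character_)
}

# Invented replies with the kinds of noise a model might add
raw &lt;- c("Positive.", " negative\n", "NEUTRAL", "I think it is positive")

normalize_label(raw)</code></pre></div>
<p>Replies that do not reduce to one of the allowed labels become <code>NA</code>, which makes them easy to find and retry.</p>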
</section>
<section id="explain-code" class="level3">
<h3 class="anchored" data-anchor-id="explain-code">Explain code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb13-2"></span>
<span id="cb13-3">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb13-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">mtcars |&gt;</span></span>
<span id="cb13-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  filter(mpg &gt; 20) |&gt;</span></span>
<span id="cb13-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  group_by(cyl) |&gt;</span></span>
<span id="cb13-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  summarise(mean_hp = mean(hp))</span></span>
<span id="cb13-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb13-9"></span>
<span id="cb13-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain this R code:"</span>, code))</span></code></pre></div></div>
</section>
</section>
<section id="using-ollamar-directly" class="level2">
<h2 class="anchored" data-anchor-id="using-ollamar-directly">Using ollamar Directly</h2>
<section id="generate-text" class="level3">
<h3 class="anchored" data-anchor-id="generate-text">Generate text</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ollamar)</span>
<span id="cb14-2"></span>
<span id="cb14-3">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">generate</span>(</span>
<span id="cb14-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb14-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write R code to create a bar chart"</span></span>
<span id="cb14-6">)</span>
<span id="cb14-7"></span>
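<span id="cb14-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_process</span>(result, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>)</span></code></pre></div></div>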
</section>
<section id="chat-with-history" class="level3">
<h3 class="anchored" data-anchor-id="chat-with-history">Chat with history</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">messages <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is dplyr?"</span>)</span>
<span id="cb15-3">)</span>
<span id="cb15-4"></span>
<span id="cb15-5">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>, <span class="at" style="color: #657422;
background-color: null;
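font-style: inherit;">messages =</span> messages, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">output =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"jsonlist"</span>)</span>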
<span id="cb15-6"></span>
<span id="cb15-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Continue conversation</span></span>
<span id="cb15-8">messages <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(messages, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb15-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"assistant"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content),</span>
<span id="cb15-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Show me an example of filter()"</span>)</span>
<span id="cb15-11">))</span>
<span id="cb15-12"></span>
<span id="cb15-13">response2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> messages)</span></code></pre></div></div>
</section>
<section id="embeddings-for-semantic-search" class="level3">
<h3 class="anchored" data-anchor-id="embeddings-for-semantic-search">Embeddings for semantic search</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate embeddings</span></span>
<span id="cb16-2">embedding <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">embed</span>(</span>
<span id="cb16-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb16-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">input =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How to handle missing values in R"</span></span>
<span id="cb16-5">)</span>
<span id="cb16-6"></span>
<span id="cb16-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># One column per input; use for similarity search, clustering, etc.</span></span>
<span id="cb16-8">embedding[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div></div>
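<p>Embedding vectors are usually compared with cosine similarity. The sketch below ranks a few documents against a query using base R only. The three-element vectors are toy values standing in for real embeddings, and <code>cosine_similarity()</code> is a helper defined here, not an ollamar function.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Cosine similarity between two numeric vectors of equal length
cosine_similarity &lt;- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy vectors standing in for real embeddings
query &lt;- c(1, 0, 1)
docs &lt;- list(
  missing_values = c(1, 0, 1),
  plotting       = c(0, 1, 0),
  data_cleaning  = c(1, 1, 1)
)

# Score every document against the query, most similar first
scores &lt;- sort(
  vapply(docs, cosine_similarity, numeric(1), b = query),
  decreasing = TRUE
)
scores</code></pre></div>
<p>With real embeddings the same function applies unchanged, as long as every vector comes from the same model and therefore has the same length.</p>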
</section>
</section>
<section id="performance-tips" class="level2">
<h2 class="anchored" data-anchor-id="performance-tips">Performance Tips</h2>
<section id="choose-the-right-model-size" class="level3">
<h3 class="anchored" data-anchor-id="choose-the-right-model-size">Choose the right model size</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fast but less capable (good for simple tasks)</span></span>
<span id="cb17-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2:1b"</span>)</span>
<span id="cb17-3"></span>
<span id="cb17-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Balanced (good for most tasks)</span></span>
<span id="cb17-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb17-6"></span>
<span id="cb17-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Best quality (slower, needs more RAM)</span></span>
<span id="cb17-8">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mixtral"</span>)</span></code></pre></div></div>
</section>
<section id="limit-context-length" class="level3">
<h3 class="anchored" data-anchor-id="limit-context-length">Limit context length</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Shorter context = faster responses</span></span>
<span id="cb18-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(</span>
<span id="cb18-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb18-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">api_args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_ctx =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2048</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Default is 4096</span></span>
<span id="cb18-5">)</span></code></pre></div></div>
</section>
<section id="gpu-acceleration" class="level3">
<h3 class="anchored" data-anchor-id="gpu-acceleration">GPU acceleration</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Ollama automatically uses GPU if available</span></span>
<span id="cb19-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check GPU usage:</span></span>
<span id="cb19-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> ps</span></code></pre></div></div>
</section>
</section>
<section id="batch-processing" class="level2">
<h2 class="anchored" data-anchor-id="batch-processing">Batch Processing</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb20-2"></span>
<span id="cb20-3">texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 3"</span>)</span>
<span id="cb20-4"></span>
<span id="cb20-5">process_local <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(texts, prompt_template) {</span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(texts, \(text) {</span>
<span id="cb20-7">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb20-8">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sprintf</span>(prompt_template, text))</span>
<span id="cb20-9">  })</span>
<span id="cb20-10">}</span>
<span id="cb20-11"></span>
<span id="cb20-12">summaries <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">process_local</span>(</span>
<span id="cb20-13">  texts,</span>
<span id="cb20-14">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Summarize in one sentence: %s"</span></span>
<span id="cb20-15">)</span></code></pre></div></div>
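<p>The loop above stops at the first request that errors, so no results are returned for the batch. A minimal sketch of a more forgiving version, assuming purrr’s <code>possibly()</code> is acceptable for error handling; <code>safe_batch()</code>, <code>ask</code>, and <code>fake_ask()</code> are hypothetical names used only for illustration:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(purrr)

# `ask` is any function that takes one prompt string and returns one string.
# Failed calls yield NA instead of aborting the whole batch.
safe_batch &lt;- function(texts, prompt_template, ask) {
  safe_ask &lt;- possibly(ask, otherwise = NA_character_)
  map_chr(texts, \(text) safe_ask(sprintf(prompt_template, text)))
}

# Stand-in for a model call so the sketch runs without Ollama (hypothetical)
fake_ask &lt;- function(prompt) {
  if (grepl("Text 2", prompt, fixed = TRUE)) stop("simulated failure")
  paste("Summary of:", prompt)
}

results &lt;- safe_batch(
  c("Text 1", "Text 2", "Text 3"),
  "Summarize in one sentence: %s",
  fake_ask
)

# Which inputs failed and may need a retry
is.na(results)</code></pre></div>
<p>With ellmer, <code>ask</code> could be <code>\(p) chat_ollama(model = "llama3.2")$chat(p)</code>, which mirrors the per-item chat created in <code>process_local()</code>.</p>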
</section>
<section id="comparing-local-vs-cloud" class="level2">
<h2 class="anchored" data-anchor-id="comparing-local-vs-cloud">Comparing Local vs Cloud</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Feature</th>
<th>Local (Ollama)</th>
<th>Cloud (OpenAI/Claude)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Cost</td>
<td>Free (no per-token fees)</td>
<td>Pay per token</td>
</tr>
<tr class="even">
<td>Privacy</td>
<td>Data stays on your machine</td>
<td>Data sent to servers</td>
</tr>
<tr class="odd">
<td>Speed</td>
<td>Depends on hardware</td>
<td>Generally fast</td>
</tr>
<tr class="even">
<td>Quality</td>
<td>Good (varies by model)</td>
<td>Best available</td>
</tr>
<tr class="odd">
<td>Offline</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr class="even">
<td>Rate limits</td>
<td>None</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<section id="when-to-use-local" class="level3">
<h3 class="anchored" data-anchor-id="when-to-use-local">When to use local</h3>
<ul>
<li>Privacy-sensitive data</li>
<li>High volume, low budget</li>
<li>Offline requirements</li>
<li>Learning/experimentation</li>
</ul>
</section>
<section id="when-to-use-cloud" class="level3">
<h3 class="anchored" data-anchor-id="when-to-use-cloud">When to use cloud</h3>
<ul>
<li>Best available quality required</li>
<li>No GPU available</li>
<li>Production applications</li>
<li>Complex reasoning tasks</li>
</ul>
</section>
</section>
<section id="troubleshooting" class="level2">
<h2 class="anchored" data-anchor-id="troubleshooting">Troubleshooting</h2>
<section id="ollama-not-running" class="level3">
<h3 class="anchored" data-anchor-id="ollama-not-running">Ollama not running</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check if Ollama is running</span></span>
<span id="cb21-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb21-3">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>)</span>
<span id="cb21-4">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"test"</span>)</span>
<span id="cb21-5">}, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb21-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">message</span>(<span class="st" style="color: #20794D;
background-color: null;
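font-style: inherit;">"Could not reach Ollama (try: ollama serve). Error: "</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">conditionMessage</span>(e))</span>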
<span id="cb21-7">})</span></code></pre></div></div>
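<p>The check above sends a full chat request, so it also fails when the model is missing. A lighter sketch, assuming Ollama listens on its default address <code>http://localhost:11434</code> and answers the root URL with the text “Ollama is running”; <code>ollama_is_up()</code> is a hypothetical helper:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Returns TRUE if the server answers at base_url, FALSE on any connection error
ollama_is_up &lt;- function(base_url = "http://localhost:11434") {
  tryCatch({
    body &lt;- suppressWarnings(readLines(base_url, warn = FALSE))
    any(grepl("Ollama is running", body, fixed = TRUE))
  }, error = function(e) FALSE)
}

if (!ollama_is_up()) {
  message("Start Ollama with: ollama serve")
}</code></pre></div>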
</section>
<section id="model-not-found" class="level3">
<h3 class="anchored" data-anchor-id="model-not-found">Model not found</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># List available models</span></span>
<span id="cb22-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> list</span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Pull missing model</span></span>
<span id="cb22-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2</span></code></pre></div></div>
</section>
<section id="out-of-memory" class="level3">
<h3 class="anchored" data-anchor-id="out-of-memory">Out of memory</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use smaller model</span></span>
<span id="cb23-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2:1b</span>
<span id="cb23-3"></span>
<span id="cb23-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Or reduce context</span></span>
<span id="cb23-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Lower Ollama's num_ctx option in requests sent from R</span></span></code></pre></div></div>
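<p>For the second option, Ollama’s REST API accepts a <code>num_ctx</code> value under <code>options</code>. A sketch using httr2 (an assumption; the rest of this post uses ellmer and ollamar), with the hypothetical helper <code>ollama_generate_req()</code> and an illustrative context size of 2048:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(httr2)

# Build a request for Ollama's /api/generate endpoint with a smaller context window
ollama_generate_req &lt;- function(prompt,
                                model = "llama3.2:1b",
                                num_ctx = 2048,
                                base_url = "http://localhost:11434") {
  request(base_url) |&gt;
    req_url_path("/api/generate") |&gt;
    req_body_json(list(
      model = model,
      prompt = prompt,
      stream = FALSE,
      options = list(num_ctx = num_ctx)
    ))
}

req &lt;- ollama_generate_req("Summarize: R is a language for statistics.")

# Sending the request needs a running Ollama server:
# resp &lt;- req_perform(req)
# resp_body_json(resp)$response</code></pre></div>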
</section>
<section id="slow-responses" class="level3">
<h3 class="anchored" data-anchor-id="slow-responses">Slow responses</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use smaller model</span></span>
<span id="cb24-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2:1b"</span>)</span>
<span id="cb24-3"></span>
<span id="cb24-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reduce max tokens</span></span>
<span id="cb24-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>(</span>
<span id="cb24-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"llama3.2"</span>,</span>
<span id="cb24-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">api_args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">num_predict =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb24-8">)</span></code></pre></div></div>
</section>
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Forgetting to start the Ollama server</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb25-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Must run this first</span></span>
<span id="cb25-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> serve</span></code></pre></div></div>
<p><strong>2. Using a model that isn’t downloaded</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check what's installed</span></span>
<span id="cb26-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> list</span>
<span id="cb26-3"></span>
<span id="cb26-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download if needed</span></span>
<span id="cb26-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ollama</span> pull llama3.2</span></code></pre></div></div>
<p><strong>3. Expecting cloud-level quality</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Local models are good but not as capable as GPT-4 or Claude</span></span>
<span id="cb27-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adjust expectations and prompts accordingly</span></span></code></pre></div></div>
<p><strong>4. Not providing good prompts</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Be specific with local models</span></span>
<span id="cb28-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># They need clearer instructions than cloud models</span></span>
<span id="cb28-3"></span>
<span id="cb28-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Too vague</span></span>
<span id="cb28-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"help with data"</span>)</span>
<span id="cb28-6"></span>
<span id="cb28-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Better</span></span>
<span id="cb28-8">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write R code to calculate the mean of the 'price' column in a data frame called 'sales'"</span>)</span></code></pre></div></div>
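<p>One way to keep prompts specific is to assemble them from named parts. A minimal sketch; <code>build_prompt()</code> and its arguments are hypothetical and not part of ellmer:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Combine the task, the data description, and optional constraints into one prompt
build_prompt &lt;- function(task, data_name, column, constraints = character()) {
  lines &lt;- c(
    sprintf("Write R code to %s.", task),
    sprintf("The data frame is called '%s' and the column is '%s'.", data_name, column),
    if (length(constraints) &gt; 0) paste0("- ", constraints)
  )
  paste(lines, collapse = "\n")
}

prompt &lt;- build_prompt(
  task = "calculate the mean of a column",
  data_name = "sales",
  column = "price",
  constraints = c("Use base R only", "Ignore missing values")
)
cat(prompt)</code></pre></div>
<p>The resulting string can be passed to <code>chat$chat(prompt)</code> as in the example above.</p>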
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Start Ollama</td>
<td><code>ollama serve</code> (terminal)</td>
</tr>
<tr class="even">
<td>Download model</td>
<td><code>ollama pull llama3.2</code> (terminal)</td>
</tr>
<tr class="odd">
<td>Chat with ellmer</td>
<td><code>chat_ollama(model = "llama3.2")</code></td>
</tr>
<tr class="even">
<td>Generate text</td>
<td><code>ollamar::generate(model, prompt)</code></td>
</tr>
<tr class="odd">
<td>List models</td>
<td><code>ollamar::list_models()</code></td>
</tr>
</tbody>
</table>
<ul>
<li>Install Ollama from ollama.com</li>
<li>Download models with <code>ollama pull</code></li>
<li>Use <code>chat_ollama()</code> from ellmer for easy integration</li>
<li>Smaller models (1b, 3b) are faster but less capable</li>
<li>Local LLMs are free, private, and work offline</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-use-openai-api-in-r">How to Use OpenAI API in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-use-gemini-api-in-r">How to Use Gemini API in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ollama.com">Ollama Official Website</a></li>
<li><a href="https://ellmer.tidyverse.org/reference/chat_ollama.html">ellmer chat_ollama() Reference</a></li>
<li><a href="https://cran.r-project.org/package=ollamar">ollamar Package</a></li>
</ul>



</section>

 ]]></description>
  <category>llm</category>
  <category>ollama</category>
  <guid>https://rstats101.com/llm/how-to-run-local-llms-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-run-local-llms-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Use Claude API in R</title>
  <link>https://rstats101.com/llm/how-to-use-claude-api-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Claude is Anthropic’s flagship AI model and one of the most popular coding assistants available. It excels at writing, understanding, and debugging R code. The <a href="../llm/how-to-use-ellmer-in-r">ellmer package</a> provides a tidyverse-friendly way to use Claude’s API in R.</p>
<p><strong>Alternatives:</strong> See also <a href="../llm/how-to-use-openai-api-in-r">OpenAI API</a> or run models <a href="../llm/how-to-run-local-llms-in-r">locally with Ollama</a> for free.</p>
<p><strong>What you’ll learn:</strong></p>
<ul>
<li>Set up Claude API access</li>
<li>Use ellmer’s <code>chat_claude()</code> function</li>
<li>Create multi-turn conversations</li>
<li>Use tool calling for R functions</li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">Extract structured data</a></li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<section id="install-ellmer" class="level3">
<h3 class="anchored" data-anchor-id="install-ellmer">Install ellmer</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span></code></pre></div></div>
</section>
<section id="get-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="get-your-api-key">Get your API key</h3>
<ol type="1">
<li>Create an account at <a href="https://console.anthropic.com">console.anthropic.com</a></li>
<li>Go to API Keys section</li>
<li>Create a new key</li>
<li>Add billing information (required for API access)</li>
</ol>
<p><strong>Note:</strong> A Claude Pro subscription does NOT include API access. You need a separate developer account.</p>
</section>
<section id="set-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="set-your-api-key">Set your API key</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 1: Set for current session</span></span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ANTHROPIC_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sk-ant-your-key-here"</span>)</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 2: Add to .Renviron (recommended)</span></span>
<span id="cb2-5">usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">edit_r_environ</span>()</span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add: ANTHROPIC_API_KEY=sk-ant-your-key-here</span></span></code></pre></div></div>
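<p>R reads <code>.Renviron</code> at startup, so a key added there is only visible after restarting the session. A small check to run before the first API call; <code>has_anthropic_key()</code> is a hypothetical helper:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># TRUE when ANTHROPIC_API_KEY is set to a non-empty value
has_anthropic_key &lt;- function() {
  nzchar(Sys.getenv("ANTHROPIC_API_KEY"))
}

if (!has_anthropic_key()) {
  message("ANTHROPIC_API_KEY is not set. Add it to .Renviron and restart R.")
}</code></pre></div>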
</section>
</section>
<section id="basic-chat" class="level2">
<h2 class="anchored" data-anchor-id="basic-chat">Basic Chat</h2>
<section id="create-a-chat-session" class="level3">
<h3 class="anchored" data-anchor-id="create-a-chat-session">Create a chat session</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb3-2"></span>
<span id="cb3-3">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb3-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is R programming?"</span>)</span></code></pre></div></div>
</section>
<section id="single-question" class="level3">
<h3 class="anchored" data-anchor-id="single-question">Single question</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb4-2">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain what a data frame is in R."</span>)</span>
<span id="cb4-3">response</span></code></pre></div></div>
</section>
<section id="specify-model" class="level3">
<h3 class="anchored" data-anchor-id="specify-model">Specify model</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use specific Claude model</span></span>
<span id="cb5-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-opus-4-6"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="available-models" class="level2">
<h2 class="anchored" data-anchor-id="available-models">Available Models</h2>
<section id="latest-models-claude-4.6" class="level3">
<h3 class="anchored" data-anchor-id="latest-models-claude-4.6">Latest Models (Claude 4.6)</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Best For</th>
<th>Context</th>
<th>Max Output</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>claude-opus-4-6</code></td>
<td>Agents, complex coding</td>
<td>1M tokens</td>
<td>128k tokens</td>
</tr>
<tr class="even">
<td><code>claude-sonnet-4-6</code></td>
<td>Speed + intelligence</td>
<td>1M tokens</td>
<td>64k tokens</td>
</tr>
<tr class="odd">
<td><code>claude-haiku-4-5</code></td>
<td>Fastest, budget</td>
<td>200k tokens</td>
<td>64k tokens</td>
</tr>
</tbody>
</table>
</section>
<section id="legacy-models-still-available" class="level3">
<h3 class="anchored" data-anchor-id="legacy-models-still-available">Legacy Models (still available)</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Context</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>claude-opus-4-5</code></td>
<td>200k tokens</td>
</tr>
<tr class="even">
<td><code>claude-sonnet-4-5</code></td>
<td>200k tokens</td>
</tr>
<tr class="odd">
<td><code>claude-sonnet-4-0</code></td>
<td>200k tokens</td>
</tr>
</tbody>
</table>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude Opus 4.6 (most capable)</span></span>
<span id="cb6-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-opus-4-6"</span>)</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude Sonnet 4.6 (balanced, recommended)</span></span>
<span id="cb6-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-sonnet-4-6"</span>)</span>
<span id="cb6-6"></span>
<span id="cb6-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude Haiku 4.5 (fastest, cheapest)</span></span>
<span id="cb6-8">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-haiku-4-5"</span>)</span></code></pre></div></div>
<p><strong>Note:</strong> The Claude 4.6 models (Opus and Sonnet) support extended thinking and have 1M-token context windows; Haiku 4.5 has a 200k-token window.</p>
</section>
</section>
<section id="system-prompts" class="level2">
<h2 class="anchored" data-anchor-id="system-prompts">System Prompts</h2>
<p>Control Claude’s behavior:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb7-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an expert R programmer. Always provide working code examples. Be concise."</span></span>
<span id="cb7-3">)</span>
<span id="cb7-4"></span>
<span id="cb7-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I calculate the mean of a column?"</span>)</span></code></pre></div></div>
<section id="specialized-assistants" class="level3">
<h3 class="anchored" data-anchor-id="specialized-assistants">Specialized assistants</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Data analysis assistant</span></span>
<span id="cb8-2">analyst <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a data analyst expert in R and the tidyverse.</span></span>
<span id="cb8-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  When asked questions, provide clear explanations with code examples using dplyr and ggplot2."</span></span>
<span id="cb8-5">)</span>
<span id="cb8-6"></span>
<span id="cb8-7">analyst<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I find outliers in my data?"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="multi-turn-conversations" class="level2">
<h2 class="anchored" data-anchor-id="multi-turn-conversations">Multi-turn Conversations</h2>
<p>ellmer maintains conversation history automatically:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First message</span></span>
<span id="cb9-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I have a dataset of customer purchases."</span>)</span>
<span id="cb9-5"></span>
<span id="cb9-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Follow-up (Claude remembers context)</span></span>
<span id="cb9-7">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How would I calculate total spending per customer?"</span>)</span>
<span id="cb9-8"></span>
<span id="cb9-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Another follow-up</span></span>
<span id="cb9-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Now how do I visualize this?"</span>)</span></code></pre></div></div>
<section id="view-conversation-history" class="level3">
<h3 class="anchored" data-anchor-id="view-conversation-history">View conversation history</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># See all turns</span></span>
<span id="cb10-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_turns</span>()</span></code></pre></div></div>
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="generate-r-code" class="level3">
<h3 class="anchored" data-anchor-id="generate-r-code">Generate R code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R expert. Return only executable R code with comments. No explanations outside code."</span></span>
<span id="cb11-3">)</span>
<span id="cb11-4"></span>
<span id="cb11-5">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb11-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Create a function that:</span></span>
<span id="cb11-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">1. Takes a data frame and column name</span></span>
<span id="cb11-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2. Removes outliers (values beyond 1.5*IQR)</span></span>
<span id="cb11-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">3. Returns the cleaned data frame</span></span>
<span id="cb11-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-11"></span>
<span id="cb11-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(code)</span></code></pre></div></div>
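<p>Generated code is worth checking against a version you trust. The sketch below is a hand-written base R take on the same three requirements. The name <code>remove_outliers</code> and the use of <code>quantile()</code> with its default method are assumptions made for illustration, not what Claude will necessarily return.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hand-written reference: drop rows whose value lies beyond 1.5 * IQR
remove_outliers &lt;- function(df, column) {
  stopifnot(is.data.frame(df), column %in% names(df))
  x &lt;- df[[column]]
  q &lt;- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence &lt;- 1.5 * (q[2] - q[1])
  # Keep non-missing values inside [Q1 - fence, Q3 + fence]
  keep &lt;- !is.na(x) &amp; x &gt;= q[1] - fence &amp; x &lt;= q[2] + fence
  df[keep, , drop = FALSE]
}

cleaned &lt;- remove_outliers(mtcars, "hp")
nrow(mtcars) - nrow(cleaned)  # number of rows dropped</code></pre></div>
<p>Running both versions on the same data frame and comparing row counts is a quick way to spot a generated function that interprets the 1.5*IQR rule differently.</p>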
</section>
<section id="explain-existing-code" class="level3">
<h3 class="anchored" data-anchor-id="explain-existing-code">Explain existing code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb12-2"></span>
<span id="cb12-3">code_to_explain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb12-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">mtcars |&gt;</span></span>
<span id="cb12-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  group_by(cyl) |&gt;</span></span>
<span id="cb12-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  summarise(across(where(is.numeric), mean)) |&gt;</span></span>
<span id="cb12-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  pivot_longer(-cyl)</span></span>
<span id="cb12-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb12-9"></span>
<span id="cb12-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain this R code step by step:"</span>, code_to_explain))</span></code></pre></div></div>
</section>
<section id="debug-errors" class="level3">
<h3 class="anchored" data-anchor-id="debug-errors">Debug errors</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb13-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R debugging expert. When shown errors, explain the cause and provide a fix."</span></span>
<span id="cb13-3">)</span>
<span id="cb13-4"></span>
<span id="cb13-5">error_message <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Error in select(df, name) : object 'name' not found"</span></span>
<span id="cb13-6"></span>
<span id="cb13-7">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I got this error:"</span>, error_message,</span>
<span id="cb13-8">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My code was: df |&gt; select(name)"</span>))</span></code></pre></div></div>
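<p>Instead of typing the error by hand, you can capture it from the failing call. This is a minimal sketch built around a hypothetical helper, <code>build_debug_prompt()</code>, which is not part of any package. It only assembles the prompt string, which you would then pass to <code>chat$chat()</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: run code given as text, capture its error message,
# and build a debugging prompt from the message and the code
build_debug_prompt &lt;- function(code_text) {
  err &lt;- tryCatch(
    {
      eval(parse(text = code_text))
      NULL  # no error was raised
    },
    error = function(e) conditionMessage(e)
  )
  if (is.null(err)) return(NULL)
  paste0("I got this error: ", err, "\nMy code was: ", code_text)
}

prompt &lt;- build_debug_prompt("sqrt(4) + undefined_object")
cat(prompt)</code></pre></div>
<p>The helper returns <code>NULL</code> when the code runs cleanly, so you can skip the API call in that case.</p>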
</section>
<section id="analyze-data-descriptions" class="level3">
<h3 class="anchored" data-anchor-id="analyze-data-descriptions">Analyze data descriptions</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb14-2"></span>
<span id="cb14-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb14-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">I have a dataset with these columns:</span></span>
<span id="cb14-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- customer_id (integer)</span></span>
<span id="cb14-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- purchase_date (date)</span></span>
<span id="cb14-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- amount (numeric)</span></span>
<span id="cb14-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- category (character: 'electronics', 'clothing', 'food')</span></span>
<span id="cb14-9"></span>
<span id="cb14-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Suggest 5 interesting analyses I could perform.</span></span>
<span id="cb14-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="tool-calling" class="level2">
<h2 class="anchored" data-anchor-id="tool-calling">Tool Calling</h2>
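<p>Tool calling lets Claude request that an R function you have registered be run, so its answers can draw on real data instead of guesses. The function runs in your R session and its result is passed back to the model, which makes tool calling a basic building block for AI agents.</p>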
<section id="define-your-function" class="level3">
<h3 class="anchored" data-anchor-id="define-your-function">Define your function</h3>
<p>First, create a regular R function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">get_weather <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(city) {</span>
<span id="cb15-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># In practice, this would call a weather API</span></span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Weather in"</span>, city, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">": 72°F, sunny"</span>)</span>
<span id="cb15-4">}</span></code></pre></div></div>
</section>
<section id="register-it-as-a-tool" class="level3">
<h3 class="anchored" data-anchor-id="register-it-as-a-tool">Register it as a tool</h3>
<p>Tell Claude about the function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb16-2"></span>
<span id="cb16-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">register_tool</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb16-4">  get_weather,</span>
<span id="cb16-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"get_weather"</span>,</span>
<span id="cb16-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Get current weather for a city"</span>,</span>
<span id="cb16-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb16-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">city =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The city name"</span>)</span>
<span id="cb16-9">  )</span>
<span id="cb16-10">))</span></code></pre></div></div>
</section>
<section id="claude-calls-it-automatically" class="level3">
<h3 class="anchored" data-anchor-id="claude-calls-it-automatically">Claude calls it automatically</h3>
<p>When you ask about weather, Claude recognizes it should use the tool:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's the weather in New York?"</span>)</span>
<span id="cb17-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude calls get_weather("New York") and returns the result</span></span></code></pre></div></div>
</section>
<section id="tool-for-data-analysis" class="level3">
<h3 class="anchored" data-anchor-id="tool-for-data-analysis">Tool for data analysis</h3>
<p>Create a function that analyzes data:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">analyze_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(column_name) {</span>
<span id="cb18-2">  data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mtcars[[column_name]]</span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(data), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(data), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">min =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(data), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">max =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(data))</span>
<span id="cb18-4">}</span></code></pre></div></div>
<p>Register and use it:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb19-2"></span>
<span id="cb19-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">register_tool</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb19-4">  analyze_data,</span>
<span id="cb19-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"analyze_data"</span>,</span>
<span id="cb19-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Get summary statistics for a column in mtcars"</span>,</span>
<span id="cb19-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">column_name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Column name"</span>))</span>
<span id="cb19-8">))</span>
<span id="cb19-9"></span>
<span id="cb19-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What are the statistics for the mpg column?"</span>)</span></code></pre></div></div>
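<p>Because the model chooses the argument, a tool can receive a column name that does not exist. One defensive variant, sketched here under the hypothetical name <code>analyze_data_safe()</code>, returns a readable message instead of failing, so the model has something to correct itself with.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical defensive variant of analyze_data()
analyze_data_safe &lt;- function(column_name) {
  # Return a message (not an error) when the column is unknown
  if (!column_name %in% names(mtcars)) {
    return(paste0(
      "Unknown column '", column_name, "'. Available: ",
      paste(names(mtcars), collapse = ", ")
    ))
  }
  data &lt;- mtcars[[column_name]]
  list(mean = mean(data), sd = sd(data), min = min(data), max = max(data))
}

analyze_data_safe("mpg")$max
analyze_data_safe("horsepower")</code></pre></div>
<p>Whether to return a message or raise an error is a design choice. A message keeps the conversation going, while an error makes failures more visible during development.</p>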
</section>
</section>
<section id="structured-output" class="level2">
<h2 class="anchored" data-anchor-id="structured-output">Structured Output</h2>
<p>Extract data in a specific format instead of free text. This is useful for parsing reviews, extracting entities, or classifying content. For more examples, see <a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Structured Data with LLMs</a>.</p>
<section id="define-a-schema" class="level3">
<h3 class="anchored" data-anchor-id="define-a-schema">Define a schema</h3>
<p>Specify the structure you want:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">review_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb20-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive, negative, or neutral"</span>),</span>
<span id="cb20-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">confidence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"confidence score 0-1"</span>),</span>
<span id="cb20-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summary =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one sentence summary"</span>)</span>
<span id="cb20-5">)</span></code></pre></div></div>
</section>
<section id="extract-structured-data" class="level3">
<h3 class="anchored" data-anchor-id="extract-structured-data">Extract structured data</h3>
<p>Pass text and schema to get clean output:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb21-2"></span>
<span id="cb21-3">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_structured</span>(</span>
<span id="cb21-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This product exceeded my expectations! Great quality and fast shipping."</span>,</span>
<span id="cb21-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> review_schema</span>
<span id="cb21-6">)</span></code></pre></div></div>
</section>
<section id="result-is-a-list" class="level3">
<h3 class="anchored" data-anchor-id="result-is-a-list">Result is a list</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sentiment</span>
<span id="cb22-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive"</span></span>
<span id="cb22-3"></span>
<span id="cb22-4">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>confidence</span>
<span id="cb22-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 0.95</span></span></code></pre></div></div>
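<p>Model output can drift from what the schema intends, for example a capitalized label or a confidence slightly outside 0-1. The sketch below shows one way to check a result before using it. <code>check_review()</code> is a hypothetical helper, and <code>fake_result</code> is a hand-made list standing in for a real response.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical validator for a list shaped like review_schema
check_review &lt;- function(x) {
  sentiment &lt;- tolower(trimws(x$sentiment))
  if (!sentiment %in% c("positive", "negative", "neutral")) {
    stop("Unexpected sentiment: ", x$sentiment)
  }
  data.frame(
    sentiment = sentiment,
    confidence = min(max(x$confidence, 0), 1),  # clamp to [0, 1]
    summary = x$summary
  )
}

# Stand-in for a real result (not actual model output)
fake_result &lt;- list(sentiment = " Positive", confidence = 1.2, summary = "Liked it.")
checked &lt;- check_review(fake_result)
checked</code></pre></div>
<p>Returning a one-row data frame makes it easy to stack many checked results with <code>rbind()</code> later.</p>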
</section>
</section>
<section id="streaming-responses" class="level2">
<h2 class="anchored" data-anchor-id="streaming-responses">Streaming Responses</h2>
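<p>For long responses, stream the output so text appears as it is generated. <code>chat$stream()</code> returns a generator, so loop over it to print each chunk as it arrives:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb23-2"></span>
<span id="cb23-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># stream() returns a generator; nothing prints until you iterate over it</span></span>
<span id="cb23-4">stream <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stream</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write a detailed guide to ggplot2 themes."</span>)</span>
<span id="cb23-5">coro<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">loop</span>(<span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (chunk <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> stream) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(chunk))</span></code></pre></div></div>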
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">safe_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(prompt) {</span>
<span id="cb24-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb24-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb24-4">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(prompt)</span>
<span id="cb24-5">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb24-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">message</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"API Error: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
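<span id="cb24-7">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># typed NA keeps the return value a character vector</span></span>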
<span id="cb24-8">  })</span>
<span id="cb24-9">}</span>
<span id="cb24-10"></span>
<span id="cb24-11">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is 2+2?"</span>)</span></code></pre></div></div>
<section id="common-errors" class="level3">
<h3 class="anchored" data-anchor-id="common-errors">Common errors</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Error</th>
<th>Cause</th>
<th>Solution</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>401 Unauthorized</td>
<td>Invalid API key</td>
<td>Check ANTHROPIC_API_KEY</td>
</tr>
<tr class="even">
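<td>429 Rate limit</td>
<td>Too many requests in a short period</td>
<td>Add delays between calls or retry with backoff</td>
</tr>
<tr class="odd">
<td>Credit balance too low</td>
<td>No API credits left on the account</td>
<td>Add credits in the Anthropic Console billing settings</td>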
</tr>
</tbody>
</table>
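<p>For transient failures such as the 429 row in the table above, retrying with an increasing delay is a common pattern. The sketch below uses a hypothetical helper, <code>with_retry()</code>, and a fake function that fails twice in place of a real API call. In practice you would pass something like <code>function() chat$chat(prompt)</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: retry a function with exponential backoff
with_retry &lt;- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result &lt;- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt == max_tries) stop(result)   # give up: re-raise the last error
    Sys.sleep(base_delay * 2^(attempt - 1))  # wait 1x, 2x, 4x, ... base_delay
  }
}

# Stand-in for an API call that fails twice, then succeeds
calls &lt;- 0
flaky &lt;- function() {
  calls &lt;&lt;- calls + 1
  if (calls &lt; 3) stop("429 Rate limit")
  "ok"
}

answer &lt;- with_retry(flaky, base_delay = 0.1)
answer</code></pre></div>
<p>Retrying only helps with temporary problems. An invalid key or an empty credit balance will fail on every attempt, so keep <code>max_tries</code> small.</p>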
</section>
</section>
<section id="cost-management" class="level2">
<h2 class="anchored" data-anchor-id="cost-management">Cost Management</h2>
<section id="token-usage" class="level3">
<h3 class="anchored" data-anchor-id="token-usage">Token usage</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check usage after chat</span></span>
<span id="cb25-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb25-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello!"</span>)</span>
<span id="cb25-4"></span>
<span id="cb25-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View token counts</span></span>
<span id="cb25-6">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_tokens</span>()</span></code></pre></div></div>
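<p>Token counts turn into cost by simple arithmetic. In the sketch below, the prices are placeholders chosen for easy math, not Anthropic's actual rates, and <code>estimate_cost()</code> is a hypothetical helper. Substitute the current per-million-token prices for the model you use.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Placeholder prices in USD per million tokens (NOT real rates)
price_per_mtok &lt;- c(input = 1, output = 5)

# Hypothetical helper: cost = tokens * price per million / 1e6
estimate_cost &lt;- function(input_tokens, output_tokens, price = price_per_mtok) {
  unname(input_tokens * price["input"] + output_tokens * price["output"]) / 1e6
}

cost &lt;- estimate_cost(input_tokens = 1200, output_tokens = 800)
cost</code></pre></div>
<p>Output tokens are typically priced higher than input tokens, which is why the two counts are kept separate.</p>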
</section>
<section id="use-appropriate-models" class="level3">
<h3 class="anchored" data-anchor-id="use-appropriate-models">Use appropriate models</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use Haiku for simple tasks (cheapest)</span></span>
<span id="cb26-2">simple_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-haiku-4-5"</span>)</span>
<span id="cb26-3"></span>
<span id="cb26-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use Sonnet for coding (balanced, recommended)</span></span>
<span id="cb26-5">code_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-sonnet-4-6"</span>)</span>
<span id="cb26-6"></span>
<span id="cb26-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use Opus only for complex reasoning (most capable)</span></span>
<span id="cb26-8">complex_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude-opus-4-6"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="batch-processing" class="level2">
<h2 class="anchored" data-anchor-id="batch-processing">Batch Processing</h2>
<p>Process multiple texts with purrr’s <code>map()</code> functions.</p>
<section id="create-a-processing-function" class="level3">
<h3 class="anchored" data-anchor-id="create-a-processing-function">Create a processing function</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">classify_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb27-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb27-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify as positive/negative/neutral. One word only."</span></span>
<span id="cb27-4">  )</span>
<span id="cb27-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rate limiting - important!</span></span>
<span id="cb27-6">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(text)</span>
<span id="cb27-7">}</span></code></pre></div></div>
</section>
<section id="apply-to-multiple-texts" class="level3">
<h3 class="anchored" data-anchor-id="apply-to-multiple-texts">Apply to multiple texts</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb28-2"></span>
<span id="cb28-3">texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product!"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible service"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay I guess"</span>)</span>
<span id="cb28-4"></span>
<span id="cb28-5">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(texts, classify_sentiment)</span>
<span id="cb28-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive", "negative", "neutral"</span></span></code></pre></div></div>
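<p>The <code>Sys.sleep(0.5)</code> pause spaces out requests, which lowers the chance of hitting rate limits when processing many items. It does not guarantee you stay under them, so large batches may still need longer delays.</p>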
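<p>In a long batch, a single failed request will abort the whole <code>map_chr()</code> call. One way around that, sketched here in base R with a hypothetical <code>classify_all()</code> wrapper, is to catch errors per item and record <code>NA</code>. The <code>fake_classifier()</code> function is a stand-in so the example runs without an API key. In real use you would pass <code>classify_sentiment</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical wrapper: classify each text, recording NA when a call fails
classify_all &lt;- function(texts, classify_fn) {
  vapply(texts, function(txt) {
    tryCatch(
      tolower(trimws(classify_fn(txt))),  # normalize labels like "Positive "
      error = function(e) NA_character_
    )
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in classifier that simulates one failed request
fake_classifier &lt;- function(text) {
  if (grepl("Terrible", text)) stop("simulated API failure")
  "Positive "
}

labels &lt;- classify_all(c("Great product!", "Terrible service"), fake_classifier)
labels</code></pre></div>
<p>Items that come back as <code>NA</code> can be collected and retried afterwards instead of rerunning the entire batch.</p>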
</section>
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Using Claude Pro subscription for API</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude Pro (claude.ai) is NOT the same as API access</span></span>
<span id="cb29-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># You need a developer account at console.anthropic.com</span></span></code></pre></div></div>
<p><strong>2. Forgetting conversation state</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Each chat_claude() creates a NEW conversation</span></span>
<span id="cb30-2">chat1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb30-3">chat1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My name is Alice"</span>)</span>
<span id="cb30-4"></span>
<span id="cb30-5">chat2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># New conversation!</span></span>
<span id="cb30-6">chat2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude doesn't know</span></span>
<span id="cb30-7"></span>
<span id="cb30-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reuse same chat object for context</span></span>
<span id="cb30-9">chat1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Alice"</span></span></code></pre></div></div>
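<p>Why the second object has no memory can be illustrated without an API call. The sketch below uses a hypothetical <code>make_toy_chat()</code> closure, not ellmer itself, and assumes only that each chat object keeps its own private message history.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy chat object: every call to make_toy_chat() starts
# with an empty history, much like every chat_claude() call does.
make_toy_chat &lt;- function() {
  history &lt;- character()
  list(
    chat = function(message) {
      # Append the message to this object's own history
      history &lt;&lt;- c(history, message)
      invisible(length(history))
    },
    turns = function() history
  )
}

chat1 &lt;- make_toy_chat()
chat1$chat("My name is Alice")

chat2 &lt;- make_toy_chat()
chat2$chat("What's my name?")

chat1$turns()  # only chat1 has seen the "Alice" message
chat2$turns()</code></pre></div>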
<p><strong>3. Not setting appropriate system prompts</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generic (may give verbose responses)</span></span>
<span id="cb31-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb31-3"></span>
<span id="cb31-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Better (focused responses)</span></span>
<span id="cb31-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb31-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Be concise. Respond in 2-3 sentences max."</span></span>
<span id="cb31-7">)</span></code></pre></div></div>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Basic chat</td>
<td><code>chat &lt;- chat_claude(); chat$chat("Hi")</code></td>
</tr>
<tr class="even">
<td>Set API key</td>
<td><code>Sys.setenv(ANTHROPIC_API_KEY = "key")</code></td>
</tr>
<tr class="odd">
<td>System prompt</td>
<td><code>chat_claude(system_prompt = "...")</code></td>
</tr>
<tr class="even">
<td>Specific model</td>
<td><code>chat_claude(model = "claude-opus-4-6")</code></td>
</tr>
<tr class="odd">
<td>Tool calling</td>
<td><code>chat$register_tool(...)</code></td>
</tr>
<tr class="even">
<td>Structured output</td>
<td><code>chat$chat_structured(text, type = type)</code></td>
</tr>
</tbody>
</table>
<ul>
<li>Claude Sonnet is best for R code generation</li>
<li>Use system prompts to control response style</li>
<li>Reuse chat objects to maintain conversation context</li>
<li>Add delays when processing multiple items</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-use-openai-api-in-r">How to Use OpenAI API in R</a></li>
<li><a href="../llm/how-to-use-gemini-api-in-r">How to Use Gemini API in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Package Documentation</a></li>
<li><a href="https://ellmer.tidyverse.org/reference/chat_claude.html">chat_claude() Reference</a></li>
<li><a href="https://datagrowth.io/blog/ai-agent-r-shiny-claude/">Datagrowth AI Agent Tutorial</a></li>
</ul>



</section>

 ]]></description>
  <category>llm</category>
  <category>claude</category>
  <category>ellmer</category>
  <guid>https://rstats101.com/llm/how-to-use-claude-api-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-use-claude-api-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Use ellmer in R</title>
  <link>https://rstats101.com/llm/how-to-use-ellmer-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>ellmer is Posit’s official R package for working with Large Language Models. It provides a consistent, tidyverse-friendly interface that works with multiple LLM providers.</p>
<p><strong>Key features:</strong></p>
<ul>
<li>Consistent API across providers (<a href="../llm/how-to-use-openai-api-in-r">OpenAI</a>, <a href="../llm/how-to-use-claude-api-in-r">Claude</a>, <a href="../llm/how-to-run-local-llms-in-r">Ollama</a>, etc.)</li>
<li>Streaming output support</li>
<li>Tool/function calling</li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">Structured data extraction</a></li>
<li>Conversation history management</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span></code></pre></div></div>
</section>
<section id="supported-providers" class="level2">
<h2 class="anchored" data-anchor-id="supported-providers">Supported Providers</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Provider</th>
<th>Function</th>
<th>API Key Variable</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="../llm/how-to-use-openai-api-in-r">OpenAI</a></td>
<td><code>chat_openai()</code></td>
<td><code>OPENAI_API_KEY</code></td>
</tr>
<tr class="even">
<td><a href="../llm/how-to-use-claude-api-in-r">Anthropic (Claude)</a></td>
<td><code>chat_claude()</code></td>
<td><code>ANTHROPIC_API_KEY</code></td>
</tr>
<tr class="odd">
<td><a href="../llm/how-to-use-gemini-api-in-r">Google (Gemini)</a></td>
<td><code>chat_google_gemini()</code></td>
<td><code>GOOGLE_API_KEY</code></td>
</tr>
<tr class="even">
<td><a href="../llm/how-to-run-local-llms-in-r">Ollama (Local)</a></td>
<td><code>chat_ollama()</code></td>
<td>None (local)</td>
</tr>
<tr class="odd">
<td>Azure OpenAI</td>
<td><code>chat_azure_openai()</code></td>
<td><code>AZURE_OPENAI_API_KEY</code></td>
</tr>
<tr class="even">
<td>AWS Bedrock</td>
<td><code>chat_aws_bedrock()</code></td>
<td>AWS credentials</td>
</tr>
</tbody>
</table>
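<p>Before creating a chat, it can help to see which keys are actually available in the current session. This is a small sketch using base R only; the variable names come from the table above, and the <code>key_vars</code> and <code>configured</code> names are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Environment variable names taken from the table above.
key_vars &lt;- c(
  openai    = "OPENAI_API_KEY",
  anthropic = "ANTHROPIC_API_KEY",
  google    = "GOOGLE_API_KEY",
  azure     = "AZURE_OPENAI_API_KEY"
)

# TRUE where the variable is set to a non-empty value in this session.
configured &lt;- vapply(key_vars, function(v) nzchar(Sys.getenv(v)), logical(1))
configured</code></pre></div>
<p>Ollama runs locally and needs no key, and AWS Bedrock relies on AWS credentials rather than a single variable, so neither appears in this check.</p>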
</section>
<section id="basic-usage" class="level2">
<h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2>
<section id="create-a-chat" class="level3">
<h3 class="anchored" data-anchor-id="create-a-chat">Create a chat</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># OpenAI</span></span>
<span id="cb2-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>()</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Claude</span></span>
<span id="cb2-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Local (Ollama)</span></span>
<span id="cb2-8">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>()</span></code></pre></div></div>
</section>
<section id="send-a-message" class="level3">
<h3 class="anchored" data-anchor-id="send-a-message">Send a message</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>()</span>
<span id="cb3-2">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is the tidyverse?"</span>)</span>
<span id="cb3-3">response</span></code></pre></div></div>
</section>
<section id="multi-turn-conversation" class="level3">
<h3 class="anchored" data-anchor-id="multi-turn-conversation">Multi-turn conversation</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb4-2"></span>
<span id="cb4-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I'm analyzing sales data."</span>)</span>
<span id="cb4-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How should I handle missing values?"</span>)</span>
<span id="cb4-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Show me code to do that."</span>)</span></code></pre></div></div>
</section>
</section>
<section id="system-prompts" class="level2">
<h2 class="anchored" data-anchor-id="system-prompts">System Prompts</h2>
<p>Control assistant behavior:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>(</span>
<span id="cb5-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R programming expert. Always provide working code examples using tidyverse conventions."</span></span>
<span id="cb5-3">)</span>
<span id="cb5-4"></span>
<span id="cb5-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I join two data frames?"</span>)</span></code></pre></div></div>
<section id="role-specific-assistants" class="level3">
<h3 class="anchored" data-anchor-id="role-specific-assistants">Role-specific assistants</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Data analyst</span></span>
<span id="cb6-2">analyst <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb6-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a senior data analyst. Explain concepts clearly and suggest best practices for data analysis in R."</span></span>
<span id="cb6-4">)</span>
<span id="cb6-5"></span>
<span id="cb6-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Code reviewer</span></span>
<span id="cb6-7">reviewer <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>(</span>
<span id="cb6-8">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a code reviewer. Analyze R code for bugs, inefficiencies, and style issues. Be constructive."</span></span>
<span id="cb6-9">)</span>
<span id="cb6-10"></span>
<span id="cb6-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Statistics tutor</span></span>
<span id="cb6-12">tutor <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb6-13">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a statistics tutor. Explain statistical concepts in simple terms with R examples."</span></span>
<span id="cb6-14">)</span></code></pre></div></div>
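<p>If you use several roles, one option is to keep the prompts in a single named vector and look them up by role. The sketch below is an assumption about how you might organize this, not part of ellmer: <code>role_prompts</code> and <code>get_prompt()</code> are hypothetical names, and the prompt text is shortened from the examples above.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical lookup table of role prompts (shortened for the example).
role_prompts &lt;- c(
  analyst  = "You are a senior data analyst.",
  reviewer = "You are a code reviewer.",
  tutor    = "You are a statistics tutor."
)

# Return the prompt for a role; match.arg() raises an error for unknown roles.
get_prompt &lt;- function(role) {
  role &lt;- match.arg(role, names(role_prompts))
  role_prompts[[role]]
}

get_prompt("tutor")

# With ellmer loaded and an API key set, the prompt can be passed on:
# tutor &lt;- chat_claude(system_prompt = get_prompt("tutor"))</code></pre></div>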
</section>
</section>
<section id="switching-between-providers" class="level2">
<h2 class="anchored" data-anchor-id="switching-between-providers">Switching Between Providers</h2>
<p>The same code works across providers:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Function that works with any provider</span></span>
<span id="cb7-2">analyze_with_llm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(chat, data_description) {</span>
<span id="cb7-3">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb7-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I have this data:"</span>,</span>
<span id="cb7-5">    data_description,</span>
<span id="cb7-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Suggest 3 analyses I could perform."</span></span>
<span id="cb7-7">  ))</span>
<span id="cb7-8">}</span>
<span id="cb7-9"></span>
<span id="cb7-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use with different providers</span></span>
<span id="cb7-11">openai_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>()</span>
<span id="cb7-12">claude_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb7-13">local_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_ollama</span>()</span>
<span id="cb7-14"></span>
<span id="cb7-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Same function, different backends</span></span>
<span id="cb7-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_with_llm</span>(openai_chat, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer purchase history"</span>)</span>
<span id="cb7-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_with_llm</span>(claude_chat, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer purchase history"</span>)</span>
<span id="cb7-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_with_llm</span>(local_chat, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Customer purchase history"</span>)</span></code></pre></div></div>
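<p>Because <code>analyze_with_llm()</code> only relies on the object having a <code>chat()</code> method, you can check the prompt it builds without any network call. The sketch below passes in a hypothetical stand-in, <code>echo_chat</code>, which is a plain list and not an ellmer object; the function is repeated so the block runs on its own.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Same function as above, repeated so this block is self-contained.
analyze_with_llm &lt;- function(chat, data_description) {
  chat$chat(paste(
    "I have this data:",
    data_description,
    "Suggest 3 analyses I could perform."
  ))
}

# Hypothetical stand-in backend: chat() simply returns the prompt it receives.
echo_chat &lt;- list(chat = function(prompt) prompt)

prompt_sent &lt;- analyze_with_llm(echo_chat, "Customer purchase history")
prompt_sent</code></pre></div>
<p>A stand-in like this is handy in unit tests, where calling a real provider would be slow and cost money.</p>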
</section>
<section id="streaming-output" class="level2">
<h2 class="anchored" data-anchor-id="streaming-output">Streaming Output</h2>
<p>For long responses, stream the output as it is generated:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb8-2"></span>
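<span id="cb8-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $stream() returns a generator that yields text chunks as they arrive</span></span>
<span id="cb8-4">stream <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stream</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write a comprehensive guide to data visualization in R."</span>)</span>
<span id="cb8-5">coro<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">loop</span>(<span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (chunk <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> stream) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(chunk))</span></code></pre></div></div>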
</section>
<section id="tool-calling" class="level2">
<h2 class="anchored" data-anchor-id="tool-calling">Tool Calling</h2>
<p>Tool calling lets LLMs execute R functions to get real data.</p>
<section id="define-a-function" class="level3">
<h3 class="anchored" data-anchor-id="define-a-function">Define a function</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">calculate_stats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(numbers) {</span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(numbers), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(numbers), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">median =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(numbers))</span>
<span id="cb9-3">}</span></code></pre></div></div>
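<p>It is worth calling the function directly before handing it to a model, so you know what a correct result looks like. A quick check with the same numbers used in the chat example that follows (the function is repeated so the block runs on its own; <code>res</code> is just an illustrative name):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">calculate_stats &lt;- function(numbers) {
  list(mean = mean(numbers), sd = sd(numbers), median = median(numbers))
}

res &lt;- calculate_stats(c(1, 5, 3, 9, 2, 7))
res$mean    # 4.5
res$median  # 4
res$sd      # square root of 9.5, roughly 3.08</code></pre></div>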
</section>
<section id="register-it-as-a-tool" class="level3">
<h3 class="anchored" data-anchor-id="register-it-as-a-tool">Register it as a tool</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb10-2"></span>
<span id="cb10-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">register_tool</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb10-4">  calculate_stats,</span>
<span id="cb10-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"calculate_stats"</span>,</span>
<span id="cb10-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Calculate summary statistics for a vector of numbers"</span>,</span>
<span id="cb10-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">numbers =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A vector of numbers"</span>)</span>
<span id="cb10-9">  )</span>
<span id="cb10-10">))</span>
<span id="cb10-11"></span>
<span id="cb10-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># LLM can now call this function</span></span>
<span id="cb10-13">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What are the statistics for the numbers 1, 5, 3, 9, 2, 7?"</span>)</span></code></pre></div></div>
</section>
<section id="multiple-tools" class="level3">
<h3 class="anchored" data-anchor-id="multiple-tools">Multiple tools</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Tool 1: Read data</span></span>
<span id="cb11-2">read_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(filename) {</span>
<span id="cb11-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">file.exists</span>(filename)) {</span>
<span id="cb11-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.csv</span>(filename), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb11-5">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb11-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"File not found"</span></span>
<span id="cb11-7">  }</span>
<span id="cb11-8">}</span>
<span id="cb11-9"></span>
<span id="cb11-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Tool 2: Plot data</span></span>
<span id="cb11-11">create_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x_col, y_col) {</span>
<span id="cb11-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Creating plot of"</span>, y_col, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"vs"</span>, x_col)</span>
<span id="cb11-13">}</span>
<span id="cb11-14"></span>
<span id="cb11-15">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>()</span>
<span id="cb11-16"></span>
<span id="cb11-17">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">register_tool</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb11-18">  read_data,</span>
<span id="cb11-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"read_data"</span>,</span>
<span id="cb11-20">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read first 5 rows of a CSV file"</span>,</span>
<span id="cb11-21">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">filename =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Path to CSV file"</span>))</span>
<span id="cb11-22">))</span>
<span id="cb11-23"></span>
<span id="cb11-24">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">register_tool</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tool</span>(</span>
<span id="cb11-25">  create_plot,</span>
<span id="cb11-26">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"create_plot"</span>,</span>
<span id="cb11-27">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Create a scatter plot"</span>,</span>
<span id="cb11-28">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">arguments =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x_col =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"X axis column"</span>),</span>
<span id="cb11-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y_col =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Y axis column"</span>)</span>
<span id="cb11-31">  )</span>
<span id="cb11-32">))</span>
<span id="cb11-33"></span>
<span id="cb11-34">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Read data.csv and plot mpg vs wt"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="structured-data-extraction" class="level2">
<h2 class="anchored" data-anchor-id="structured-data-extraction">Structured Data Extraction</h2>
<p>Structured extraction turns free text into typed R values that follow a schema you define. For a deep dive, see <a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Structured Data with LLMs</a>.</p>
<section id="simple-extraction" class="level3">
<h3 class="anchored" data-anchor-id="simple-extraction">Simple extraction</h3>
<p>Define a schema for the data you want:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">person_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb12-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's full name"</span>),</span>
<span id="cb12-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">age =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_integer</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's age"</span>),</span>
<span id="cb12-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">occupation =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Person's job"</span>)</span>
<span id="cb12-5">)</span></code></pre></div></div>
<p>Extract structured data from text:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb13-2"></span>
<span id="cb13-3">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_structured</span>(</span>
<span id="cb13-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"John Smith is a 35-year-old software engineer from Seattle."</span>,</span>
<span id="cb13-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> person_type</span>
<span id="cb13-6">)</span>
<span id="cb13-7"></span>
<span id="cb13-8">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>name  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "John Smith"</span></span>
<span id="cb13-9">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>age   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 35</span></span>
<span id="cb13-10">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>occupation  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "software engineer"</span></span></code></pre></div></div>
</section>
<section id="extract-arrays" class="level3">
<h3 class="anchored" data-anchor-id="extract-arrays">Extract arrays</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Schema for multiple items</span></span>
<span id="cb14-2">products_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_array</span>(</span>
<span id="cb14-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">items =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb14-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Product name"</span>),</span>
<span id="cb14-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">price =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Price in dollars"</span>),</span>
<span id="cb14-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Product category"</span>)</span>
<span id="cb14-7">  )</span>
<span id="cb14-8">)</span>
<span id="cb14-9"></span>
<span id="cb14-10">text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"We have laptops at $999, headphones for $199, and keyboards at $79."</span></span>
<span id="cb14-11"></span>
<span id="cb14-12">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>()</span>
<span id="cb14-13">products <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_structured</span>(text, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> products_type)</span></code></pre></div></div>
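<p>What comes back for an array schema can depend on the ellmer version and its conversion settings: it may be a list of records or an already rectangular data frame. As a minimal sketch, assuming a list of records, base R can bind the items into a data frame. The <code>records</code> list below is a hand-written stand-in for model output, not a real response, and the <code>category</code> values are made up for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hand-written stand-in for the model's output; not a real API response
records &lt;- list(
  list(name = "laptop", price = 999, category = "electronics"),
  list(name = "headphones", price = 199, category = "electronics"),
  list(name = "keyboard", price = 79, category = "electronics")
)

# Turn each record into a one-row data frame, then stack the rows
products_df &lt;- do.call(rbind, lapply(records, as.data.frame))

# Ordinary data frame operations now apply, e.g. filter by price
cheap &lt;- products_df[products_df$price &lt; 200, ]
cheap$name</code></pre></div>
<p>If the result is already a data frame, the binding step can be skipped and the filter applied directly.</p>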
</section>
<section id="classification" class="level3">
<h3 class="anchored" data-anchor-id="classification">Classification</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">sentiment_type <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_enum</span>(</span>
<span id="cb15-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"neutral"</span>),</span>
<span id="cb15-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">description =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The sentiment of the text"</span></span>
<span id="cb15-4">)</span>
<span id="cb15-5"></span>
<span id="cb15-6">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb15-7">sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_structured</span>(</span>
<span id="cb15-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This product is amazing!"</span>,</span>
<span id="cb15-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> sentiment_type</span>
<span id="cb15-10">)</span>
<span id="cb15-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive"</span></span></code></pre></div></div>
</section>
</section>
<section id="conversation-management" class="level2">
<h2 class="anchored" data-anchor-id="conversation-management">Conversation Management</h2>
<section id="view-history" class="level3">
<h3 class="anchored" data-anchor-id="view-history">View history</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb16-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello!"</span>)</span>
<span id="cb16-3">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's 2+2?"</span>)</span>
<span id="cb16-4"></span>
<span id="cb16-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get all turns</span></span>
<span id="cb16-6">turns <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_turns</span>()</span></code></pre></div></div>
</section>
<section id="clear-history" class="level3">
<h3 class="anchored" data-anchor-id="clear-history">Clear history</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Start fresh</span></span>
<span id="cb17-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
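font-style: inherit;">set_turns</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>())</span></code></pre></div></div>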
</section>
<section id="save-and-restore" class="level3">
<h3 class="anchored" data-anchor-id="save-and-restore">Save and restore</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Save conversation</span></span>
<span id="cb18-2">turns <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_turns</span>()</span>
<span id="cb18-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">saveRDS</span>(turns, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"conversation.rds"</span>)</span>
<span id="cb18-4"></span>
<span id="cb18-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Restore later</span></span>
<span id="cb18-6">saved_turns <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readRDS</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"conversation.rds"</span>)</span>
<span id="cb18-7">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb18-8">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_turns</span>(saved_turns)</span></code></pre></div></div>
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="code-generation-pipeline" class="level3">
<h3 class="anchored" data-anchor-id="code-generation-pipeline">Code generation pipeline</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">generate_and_run <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(task) {</span>
<span id="cb19-2">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(</span>
<span id="cb19-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Return only valid R code. No explanations. No markdown."</span></span>
<span id="cb19-4">  )</span>
<span id="cb19-5"></span>
<span id="cb19-6">  code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(task)</span>
<span id="cb19-7"></span>
<span id="cb19-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Try to run the code</span></span>
<span id="cb19-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb19-10">    result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eval</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> code))</span>
<span id="cb19-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">code =</span> code, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">result =</span> result, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">success =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb19-12">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb19-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">code =</span> code, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">success =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb19-14">  })</span>
<span id="cb19-15">}</span>
<span id="cb19-16"></span>
<span id="cb19-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">generate_and_run</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Calculate the mean of mtcars$mpg"</span>)</span></code></pre></div></div>
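<p>Models sometimes wrap code in Markdown fences even when asked not to, and <code>parse()</code> fails on those lines. Here is a minimal sketch of a cleanup step, using a hypothetical helper <code>strip_fences()</code> that is not part of ellmer. The <code>reply</code> string is a hand-written stand-in for a model response. Generated code is still untrusted, so evaluate it only in a session where mistakes are cheap.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: drop Markdown code-fence lines from a model reply
strip_fences &lt;- function(code) {
  lines &lt;- unlist(strsplit(code, "\n", fixed = TRUE))
  # Remove any line that starts with three backticks (with optional indent)
  lines &lt;- lines[!grepl("^\\s*```", lines)]
  paste(lines, collapse = "\n")
}

# Stand-in for a reply that ignored the "No markdown" instruction
reply &lt;- "```r\nmean(c(1, 2, 3))\n```"

cleaned &lt;- strip_fences(reply)
value &lt;- eval(parse(text = cleaned))
value  # 2</code></pre></div>
<p>In <code>generate_and_run()</code>, the same idea would mean passing <code>strip_fences(code)</code> to <code>parse()</code> instead of <code>code</code>.</p>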
</section>
<section id="data-summarization" class="level3">
<h3 class="anchored" data-anchor-id="data-summarization">Data summarization</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">summarize_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">chat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()) {</span>
<span id="cb20-2">  description <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb20-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dataset with"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(df), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rows and"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ncol</span>(df), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"columns."</span>,</span>
<span id="cb20-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Columns:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(df), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">", "</span>),</span>
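<span id="cb20-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"First few values:"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">capture.output</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(df, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">collapse =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"\n"</span>)</span>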
<span id="cb20-6">  )</span>
<span id="cb20-7"></span>
<span id="cb20-8">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(</span>
<span id="cb20-9">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Here's a dataset:"</span>,</span>
<span id="cb20-10">    description,</span>
<span id="cb20-11">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Provide a brief summary of what this data contains."</span></span>
<span id="cb20-12">  ))</span>
<span id="cb20-13">}</span>
<span id="cb20-14"></span>
<span id="cb20-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize_data</span>(mtcars)</span></code></pre></div></div>
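<p>One detail to watch when building prompts from printed output: <code>capture.output()</code> returns one string per printed line, and <code>paste()</code> is vectorised, so pasting a prefix onto that vector repeats the prefix on every line. Collapsing to a single string first avoids this. The sketch below uses only base R and the built-in <code>mtcars</code> data.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">preview &lt;- capture.output(head(mtcars, 3))

# Vectorised paste: one result per printed line, prefix repeated each time
repeated &lt;- paste("First few values:", preview)
length(repeated) == length(preview)  # TRUE

# Collapse first: a single string, suitable as one prompt
single &lt;- paste(c("First few values:", preview), collapse = "\n")
length(single)  # 1
cat(single)</code></pre></div>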
</section>
<section id="batch-processing-with-rate-limiting" class="level3">
<h3 class="anchored" data-anchor-id="batch-processing-with-rate-limiting">Batch processing with rate limiting</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb21-2"></span>
<span id="cb21-3">process_texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(texts, system_prompt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">delay =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) {</span>
<span id="cb21-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(texts, \(text) {</span>
<span id="cb21-5">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> system_prompt)</span>
<span id="cb21-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(delay)</span>
<span id="cb21-7">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(text)</span>
<span id="cb21-8">  })</span>
<span id="cb21-9">}</span>
<span id="cb21-10"></span>
<span id="cb21-11">reviews <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(</span>
<span id="cb21-12">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product, highly recommend!"</span>,</span>
<span id="cb21-13">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible, waste of money"</span>,</span>
<span id="cb21-14">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay, nothing special"</span></span>
<span id="cb21-15">)</span>
<span id="cb21-16"></span>
<span id="cb21-17">sentiments <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">process_texts</span>(</span>
<span id="cb21-18">  reviews,</span>
<span id="cb21-19">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify as positive/negative/neutral. One word."</span></span>
<span id="cb21-20">)</span></code></pre></div></div>
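<p>A fixed delay spaces requests out but does not recover from a request that still fails. One possible complement, sketched here in base R, is to retry with exponential backoff. <code>with_retry()</code> and <code>flaky()</code> are hypothetical names for illustration; neither comes from ellmer or purrr, and <code>flaky()</code> only simulates a call that fails twice.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: call fn() up to max_tries times,
# waiting base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
with_retry &lt;- function(fn, max_tries = 3, base_delay = 0.5) {
  for (attempt in seq_len(max_tries)) {
    result &lt;- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt &lt; max_tries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Simulated request that fails on the first two calls
calls &lt;- 0
flaky &lt;- function() {
  calls &lt;&lt;- calls + 1
  if (calls &lt; 3) stop("rate limited")
  "ok"
}

answer &lt;- with_retry(flaky, base_delay = 0.01)
answer  # "ok"
calls   # 3</code></pre></div>
<p>Inside <code>process_texts()</code>, the request could be wrapped as <code>with_retry(function() chat$chat(text))</code>.</p>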
</section>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">safe_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(prompt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">provider =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude"</span>) {</span>
<span id="cb22-2">  chat_fn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">switch</span>(provider,</span>
<span id="cb22-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"claude"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> chat_claude,</span>
<span id="cb22-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"openai"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> chat_openai,</span>
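<span id="cb22-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ollama"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> chat_ollama,
    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Unknown provider: "</span>, provider)</span>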
<span id="cb22-6">  )</span>
<span id="cb22-7"></span>
<span id="cb22-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb22-9">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_fn</span>()</span>
<span id="cb22-10">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(prompt)</span>
<span id="cb22-11">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb22-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">warning</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"LLM Error: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb22-13">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span></span>
<span id="cb22-14">  })</span>
<span id="cb22-15">}</span></code></pre></div></div>
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Creating a new chat for each message</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Wrong - loses context</span></span>
<span id="cb23-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My name is Bob"</span>)</span>
<span id="cb23-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Doesn't know!</span></span>
<span id="cb23-4"></span>
<span id="cb23-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Right - reuse chat object</span></span>
<span id="cb23-6">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_claude</span>()</span>
<span id="cb23-7">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My name is Bob"</span>)</span>
<span id="cb23-8">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Bob"</span></span></code></pre></div></div>
<p><strong>2. Forgetting to set API keys</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check if keys are set</span></span>
<span id="cb24-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)</span>
<span id="cb24-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ANTHROPIC_API_KEY"</span>)</span>
<span id="cb24-4"></span>
<span id="cb24-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Set them</span></span>
<span id="cb24-6">usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">edit_r_environ</span>()</span>
<span id="cb24-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add: OPENAI_API_KEY=your-key</span></span>
<span id="cb24-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add: ANTHROPIC_API_KEY=your-key</span></span></code></pre></div></div>
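<p>A small guard can make a missing key fail early with a readable message. The sketch below uses only base R; <code>require_api_key()</code> and <code>DEMO_API_KEY</code> are hypothetical names chosen for illustration and are not part of ellmer.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: stop early if an environment variable is empty
require_api_key &lt;- function(var) {
  key &lt;- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set. ",
         "Add it to .Renviron and restart R.", call. = FALSE)
  }
  invisible(TRUE)  # never return or print the key itself
}

# Demo with a made-up variable so the example runs anywhere
Sys.setenv(DEMO_API_KEY = "abc123")
require_api_key("DEMO_API_KEY")</code></pre></div></div>
<p>In a real script you would call it with <code>"OPENAI_API_KEY"</code> or <code>"ANTHROPIC_API_KEY"</code> before creating the chat object.</p>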
<p><strong>3. Not handling rate limits</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add delays in loops (map() comes from purrr)</span></span>
<span id="cb25-2">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(items, \(item) {</span>
<span id="cb25-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Delay</span></span>
<span id="cb25-4">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(item)</span>
<span id="cb25-5">})</span></code></pre></div></div>
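<p>A fixed delay reduces the chance of hitting a limit but does not recover when a request fails anyway. One common complement is to retry with a growing wait. The sketch below is a base R illustration of that idea; <code>with_retry()</code> is a hypothetical helper, not an ellmer function, and it retries on any error rather than on rate-limit errors specifically.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: call f(), doubling the wait after each failure
with_retry &lt;- function(f, max_tries = 3, base_delay = 0.5) {
  for (attempt in seq_len(max_tries)) {
    result &lt;- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)   # success
    if (attempt == max_tries) stop(result)           # give up, re-raise the last error
    Sys.sleep(base_delay * 2^(attempt - 1))          # 0.5s, 1s, 2s, ...
  }
}

# Stand-in for an API call that fails twice before succeeding
calls &lt;- 0
flaky &lt;- function() {
  calls &lt;&lt;- calls + 1
  if (calls &lt; 3) stop("rate limited")
  "ok"
}
with_retry(flaky, base_delay = 0.1)</code></pre></div></div>
<p>With a chat object, the call would be wrapped the same way, for example <code>with_retry(\() chat$chat(item))</code>.</p>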
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>OpenAI chat</td>
<td><code>chat_openai()</code></td>
</tr>
<tr class="even">
<td>Claude chat</td>
<td><code>chat_claude()</code></td>
</tr>
<tr class="odd">
<td>Local Ollama</td>
<td><code>chat_ollama()</code></td>
</tr>
<tr class="even">
<td>System prompt</td>
<td><code>chat_*(system_prompt = "...")</code></td>
</tr>
<tr class="odd">
<td>Send message</td>
<td><code>chat$chat("message")</code></td>
</tr>
<tr class="even">
<td>Stream output</td>
<td><code>chat$stream("message")</code></td>
</tr>
<tr class="odd">
<td>Register tool</td>
<td><code>chat$register_tool(...)</code></td>
</tr>
<tr class="even">
<td>Extract data</td>
<td><code>chat$chat_structured(text, type = type)</code></td>
</tr>
<tr class="odd">
<td>View history</td>
<td><code>chat$get_turns()</code></td>
</tr>
</tbody>
</table>
<ul>
<li>ellmer provides consistent syntax across providers</li>
<li>Use system prompts to control response style</li>
<li>Reuse chat objects to maintain context</li>
<li>Use tool calling for R function integration</li>
<li>Use structured extraction for reliable data parsing</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-use-openai-api-in-r">How to Use OpenAI API in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-use-gemini-api-in-r">How to Use Gemini API in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Documentation</a></li>
<li><a href="https://cran.r-project.org/package=ellmer">ellmer CRAN Package</a></li>
</ul>


<!-- -->

</section>

 ]]></description>
  <category>llm</category>
  <category>ellmer</category>
  <guid>https://rstats101.com/llm/how-to-use-ellmer-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-use-ellmer-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Use Gemini API in R</title>
  <link>https://rstats101.com/llm/how-to-use-gemini-api-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Gemini is Google’s family of multimodal AI models. It handles reasoning, coding, and multimodal tasks (text + images). The <a href="../llm/how-to-use-ellmer-in-r">ellmer package</a> provides a tidyverse-friendly way to use the Gemini API in R.</p>
<p><strong>Alternatives:</strong> See also <a href="../llm/how-to-use-openai-api-in-r">OpenAI API</a>, <a href="../llm/how-to-use-claude-api-in-r">Claude API</a>, or run models <a href="../llm/how-to-run-local-llms-in-r">locally with Ollama</a> for free.</p>
<p><strong>What you’ll learn:</strong></p>
<ul>
<li>Set up Gemini API access</li>
<li>Use ellmer’s <code>chat_gemini()</code> function</li>
<li>Create multi-turn conversations</li>
<li>Use structured data extraction</li>
<li>Compare Gemini models</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<section id="install-ellmer" class="level3">
<h3 class="anchored" data-anchor-id="install-ellmer">Install ellmer</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span></code></pre></div></div>
</section>
<section id="get-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="get-your-api-key">Get your API key</h3>
<ol type="1">
<li>Go to <a href="https://aistudio.google.com/">Google AI Studio</a></li>
<li>Sign in with your Google account</li>
<li>Click “Get API key” in the left sidebar</li>
<li>Create a new API key</li>
</ol>
<p><strong>Note:</strong> The Gemini API has a free tier that is generous enough for experimentation.</p>
</section>
<section id="set-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="set-your-api-key">Set your API key</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 1: Set for current session</span></span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GOOGLE_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"your-api-key-here"</span>)</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 2: Add to .Renviron (recommended)</span></span>
<span id="cb2-5">usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">edit_r_environ</span>()</span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add: GOOGLE_API_KEY=your-api-key-here</span></span></code></pre></div></div>
</section>
</section>
<section id="basic-chat" class="level2">
<h2 class="anchored" data-anchor-id="basic-chat">Basic Chat</h2>
<section id="create-a-chat-session" class="level3">
<h3 class="anchored" data-anchor-id="create-a-chat-session">Create a chat session</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb3-2"></span>
<span id="cb3-3">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb3-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is R programming?"</span>)</span></code></pre></div></div>
</section>
<section id="single-question" class="level3">
<h3 class="anchored" data-anchor-id="single-question">Single question</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb4-2">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain what a data frame is in R."</span>)</span>
<span id="cb4-3">response</span></code></pre></div></div>
</section>
<section id="specify-model" class="level3">
<h3 class="anchored" data-anchor-id="specify-model">Specify model</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use specific Gemini model</span></span>
<span id="cb5-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gemini-2.5-pro"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="available-models" class="level2">
<h2 class="anchored" data-anchor-id="available-models">Available Models</h2>
<section id="current-stable-models" class="level3">
<h3 class="anchored" data-anchor-id="current-stable-models">Current Stable Models</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Best For</th>
<th>Context</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>gemini-2.5-flash</code></td>
<td>Best price-performance, reasoning</td>
<td>1M tokens</td>
</tr>
<tr class="even">
<td><code>gemini-2.5-pro</code></td>
<td>Complex tasks, deep reasoning, coding</td>
<td>1M tokens</td>
</tr>
<tr class="odd">
<td><code>gemini-2.5-flash-lite</code></td>
<td>Fastest, most budget-friendly</td>
<td>1M tokens</td>
</tr>
</tbody>
</table>
</section>
<section id="latest-preview-models" class="level3">
<h3 class="anchored" data-anchor-id="latest-preview-models">Latest Preview Models</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 26%">
<col style="width: 38%">
<col style="width: 34%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Best For</th>
<th>Context</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>gemini-3.1-pro-preview</code></td>
<td>Advanced intelligence, agentic coding</td>
<td>1M tokens</td>
</tr>
<tr class="even">
<td><code>gemini-3-flash-preview</code></td>
<td>Frontier performance at low cost</td>
<td>1M tokens</td>
</tr>
<tr class="odd">
<td><code>gemini-3.1-flash-lite-preview</code></td>
<td>Most economical newest model</td>
<td>1M tokens</td>
</tr>
</tbody>
</table>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gemini 2.5 Flash (recommended default)</span></span>
<span id="cb6-2">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gemini-2.5-flash"</span>)</span>
<span id="cb6-3"></span>
<span id="cb6-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gemini 2.5 Pro (most capable stable)</span></span>
<span id="cb6-5">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gemini-2.5-pro"</span>)</span>
<span id="cb6-6"></span>
<span id="cb6-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gemini 3.1 Pro Preview (cutting edge)</span></span>
<span id="cb6-8">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gemini-3.1-pro-preview"</span>)</span></code></pre></div></div>
<p><strong>Note:</strong> The Gemini models listed above support context windows of up to 1M tokens, which suits long documents. Use the <code>-preview</code> models to try the latest capabilities, keeping in mind that preview models can change before they become stable.</p>
</section>
</section>
<section id="system-prompts" class="level2">
<h2 class="anchored" data-anchor-id="system-prompts">System Prompts</h2>
<p>Use a system prompt to control Gemini’s behavior:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(</span>
<span id="cb7-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an expert R programmer. Always provide working code examples. Be concise."</span></span>
<span id="cb7-3">)</span>
<span id="cb7-4"></span>
<span id="cb7-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I calculate the mean of a column?"</span>)</span></code></pre></div></div>
<section id="specialized-assistants" class="level3">
<h3 class="anchored" data-anchor-id="specialized-assistants">Specialized assistants</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Data analysis assistant</span></span>
<span id="cb8-2">analyst <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a data analyst expert in R and the tidyverse.</span></span>
<span id="cb8-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  When asked questions, provide clear explanations with code examples."</span></span>
<span id="cb8-5">)</span>
<span id="cb8-6"></span>
<span id="cb8-7">analyst<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I find outliers in my data?"</span>)</span></code></pre></div></div>
</section>
</section>
<section id="multi-turn-conversations" class="level2">
<h2 class="anchored" data-anchor-id="multi-turn-conversations">Multi-turn Conversations</h2>
<p>The chat object keeps the conversation history, so each follow-up is sent together with the earlier turns:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb9-2"></span>
<span id="cb9-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># First message</span></span>
<span id="cb9-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I have a dataset of customer purchases."</span>)</span>
<span id="cb9-5"></span>
<span id="cb9-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Follow-up (Gemini remembers context)</span></span>
<span id="cb9-7">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How would I calculate total spending per customer?"</span>)</span>
<span id="cb9-8"></span>
<span id="cb9-9"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Another follow-up</span></span>
<span id="cb9-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Now how do I visualize this?"</span>)</span></code></pre></div></div>
<section id="view-conversation-history" class="level3">
<h3 class="anchored" data-anchor-id="view-conversation-history">View conversation history</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># See all turns</span></span>
<span id="cb10-2">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get_turns</span>()</span></code></pre></div></div>
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="generate-r-code" class="level3">
<h3 class="anchored" data-anchor-id="generate-r-code">Generate R code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(</span>
<span id="cb11-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R expert. Return only executable R code with comments. No explanations outside code."</span></span>
<span id="cb11-3">)</span>
<span id="cb11-4"></span>
<span id="cb11-5">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb11-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Create a function that:</span></span>
<span id="cb11-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">1. Takes a data frame and column name</span></span>
<span id="cb11-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2. Removes outliers (values beyond 1.5*IQR)</span></span>
<span id="cb11-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">3. Returns the cleaned data frame</span></span>
<span id="cb11-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb11-11"></span>
<span id="cb11-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(code)</span></code></pre></div></div>
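<p>Generated code still needs checking, and it helps to have a reference in mind. A hand-written base R version of the function requested above might look like the sketch below. <code>remove_outliers()</code> is an illustrative name, the 1.5 * IQR rule is applied to both tails, and rows with a missing value in the column are dropped; the model’s own answer will differ in style and possibly in these choices.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Reference sketch: drop rows whose value lies beyond 1.5 * IQR from the quartiles
remove_outliers &lt;- function(df, col) {
  x &lt;- df[[col]]
  q &lt;- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr &lt;- q[[2]] - q[[1]]
  lower &lt;- q[[1]] - 1.5 * iqr
  upper &lt;- q[[2]] + 1.5 * iqr
  keep &lt;- !is.na(x) &amp; x &gt;= lower &amp; x &lt;= upper   # NA rows are dropped too
  df[keep, , drop = FALSE]
}

# Compare row counts before and after on a built-in dataset
cleaned &lt;- remove_outliers(mtcars, "hp")
c(before = nrow(mtcars), after = nrow(cleaned))</code></pre></div></div>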
</section>
<section id="explain-existing-code" class="level3">
<h3 class="anchored" data-anchor-id="explain-existing-code">Explain existing code</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb12-2"></span>
<span id="cb12-3">code_to_explain <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb12-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">mtcars |&gt;</span></span>
<span id="cb12-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  group_by(cyl) |&gt;</span></span>
<span id="cb12-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  summarise(across(where(is.numeric), mean)) |&gt;</span></span>
<span id="cb12-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">  pivot_longer(-cyl)</span></span>
<span id="cb12-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb12-9"></span>
<span id="cb12-10">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain this R code step by step:"</span>, code_to_explain))</span></code></pre></div></div>
</section>
<section id="debug-errors" class="level3">
<h3 class="anchored" data-anchor-id="debug-errors">Debug errors</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(</span>
<span id="cb13-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R debugging expert. When shown errors, explain the cause and provide a fix."</span></span>
<span id="cb13-3">)</span>
<span id="cb13-4"></span>
<span id="cb13-5">error_message <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Error in select(df, name) : object 'name' not found"</span></span>
<span id="cb13-6"></span>
<span id="cb13-7">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"I got this error:"</span>, error_message,</span>
<span id="cb13-8">                <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My code was: df |&gt; select(name)"</span>))</span></code></pre></div></div>
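<p>Instead of copying the error text by hand, you can capture it in R and build the prompt from it. The sketch below uses only base R; the failing expression <code>sqrt("four")</code> is a made-up example, and the resulting <code>prompt</code> string is what you would pass to <code>chat$chat()</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Run code that may fail and keep the error text instead of stopping
failing_code &lt;- quote(sqrt("four"))
error_message &lt;- tryCatch(
  {
    eval(failing_code)
    NA_character_  # reached only if no error occurred
  },
  error = function(e) conditionMessage(e)
)

# Build the prompt from the captured message and the code that produced it
prompt &lt;- paste(
  "I got this error:", error_message,
  "My code was:", deparse(failing_code)
)
prompt</code></pre></div></div>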
</section>
</section>
<section id="structured-output" class="level2">
<h2 class="anchored" data-anchor-id="structured-output">Structured Output</h2>
<p>Extract data in a specific format. For more examples, see <a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Structured Data with LLMs</a>.</p>
<section id="define-a-schema" class="level3">
<h3 class="anchored" data-anchor-id="define-a-schema">Define a schema</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">review_schema <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_object</span>(</span>
<span id="cb14-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sentiment =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive, negative, or neutral"</span>),</span>
<span id="cb14-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">confidence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_number</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"confidence score 0-1"</span>),</span>
<span id="cb14-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">summary =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">type_string</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one sentence summary"</span>)</span>
<span id="cb14-5">)</span></code></pre></div></div>
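<p>A free-text description such as “positive, negative, or neutral” is only a hint, so the model can still return other labels. ellmer also offers <code>type_enum()</code> for fields with a fixed set of values. The variant below is a sketch that assumes your installed ellmer version accepts the named arguments <code>values</code> and <code>description</code> (check <code>?type_enum</code>); <code>review_schema_strict</code> is an illustrative name.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(ellmer)

# Same schema, but sentiment is restricted to three allowed values
review_schema_strict &lt;- type_object(
  sentiment = type_enum(
    values = c("positive", "negative", "neutral"),
    description = "overall sentiment of the review"
  ),
  confidence = type_number("confidence score 0-1"),
  summary = type_string("one sentence summary")
)</code></pre></div></div>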
</section>
<section id="extract-structured-data" class="level3">
<h3 class="anchored" data-anchor-id="extract-structured-data">Extract structured data</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb15-2"></span>
<span id="cb15-3">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_structured</span>(</span>
<span id="cb15-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"This product exceeded my expectations! Great quality and fast shipping."</span>,</span>
<span id="cb15-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> review_schema</span>
<span id="cb15-6">)</span></code></pre></div></div>
</section>
<section id="access-results" class="level3">
<h3 class="anchored" data-anchor-id="access-results">Access results</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sentiment</span>
<span id="cb16-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive"</span></span>
<span id="cb16-3"></span>
<span id="cb16-4">result<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>confidence</span>
<span id="cb16-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 0.95</span></span></code></pre></div></div>
</section>
</section>
<section id="streaming-responses" class="level2">
<h2 class="anchored" data-anchor-id="streaming-responses">Streaming Responses</h2>
<p>For long responses, stream the output so the text prints as it is generated instead of arriving all at once:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb17-2"></span>
<span id="cb17-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Stream response (prints as it generates)</span></span>
<span id="cb17-4">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stream</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write a detailed guide to ggplot2 themes."</span>)</span></code></pre></div></div>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">safe_chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(prompt) {</span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb18-3">    chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb18-4">    chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(prompt)</span>
<span id="cb18-5">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb18-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">message</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"API Error: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb18-7">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span></span>
<span id="cb18-8">  })</span>
<span id="cb18-9">}</span>
<span id="cb18-10"></span>
<span id="cb18-11">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is 2+2?"</span>)</span></code></pre></div></div>
<section id="common-errors" class="level3">
<h3 class="anchored" data-anchor-id="common-errors">Common errors</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Error</th>
<th>Cause</th>
<th>Solution</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>401 Unauthorized</td>
<td>Invalid API key</td>
<td>Check GOOGLE_API_KEY</td>
</tr>
<tr class="even">
<td>429 Rate limit</td>
<td>Too many requests</td>
<td>Add delays</td>
</tr>
<tr class="odd">
<td>400 Bad request</td>
<td>Invalid parameters</td>
<td>Check model name</td>
</tr>
</tbody>
</table>
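<p>The “Add delays” advice for 429 errors can be sketched as a retry wrapper with exponential backoff. This is a minimal sketch, not part of ellmer: <code>with_retry()</code>, <code>max_tries</code>, and <code>base_delay</code> are hypothetical names, and the wrapper retries on any error rather than on rate limits specifically.</p>

```r
# Hypothetical helper: retry a call with exponential backoff.
# `fn` is any zero-argument function, e.g. function() chat$chat(prompt).
with_retry <- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    # Out of attempts: re-signal the last error to the caller
    if (attempt == max_tries) stop(result$error)
    # Wait 1x, 2x, 4x, ... the base delay before the next attempt
    Sys.sleep(base_delay * 2^(attempt - 1))
  }
}
```

<p>Assuming a <code>chat</code> object already exists, a call might look like <code>with_retry(function() chat$chat("What is 2+2?"))</code>.</p>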
</section>
</section>
<section id="gemini-vs-other-providers" class="level2">
<h2 class="anchored" data-anchor-id="gemini-vs-other-providers">Gemini vs Other Providers</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Feature</th>
<th>Gemini</th>
<th>OpenAI</th>
<th>Claude</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Context window</td>
<td>1M tokens</td>
<td>128k tokens</td>
<td>1M tokens</td>
</tr>
<tr class="even">
<td>Free tier</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
</tr>
<tr class="odd">
<td>Multimodal</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr class="even">
<td>Code quality</td>
<td>Good</td>
<td>Excellent</td>
<td>Excellent</td>
</tr>
<tr class="odd">
<td>Speed</td>
<td>Fast</td>
<td>Fast</td>
<td>Medium</td>
</tr>
<tr class="even">
<td>Latest models</td>
<td>Gemini 3.1</td>
<td>GPT-5.4 / o3</td>
<td>Claude 4.6</td>
</tr>
</tbody>
</table>
<p><strong>Choose Gemini when:</strong></p>
<ul>
<li>You need to process very long documents (1M tokens)</li>
<li>You want a free tier for experimentation</li>
<li>You’re already in the Google ecosystem</li>
<li>You need the latest preview models</li>
</ul>
</section>
<section id="batch-processing" class="level2">
<h2 class="anchored" data-anchor-id="batch-processing">Batch Processing</h2>
<p>Process multiple texts with rate limiting:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb19-2"></span>
<span id="cb19-3">classify_text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb19-4">  chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>(</span>
<span id="cb19-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">system_prompt =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify as positive/negative/neutral. One word only."</span></span>
<span id="cb19-6">  )</span>
<span id="cb19-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rate limiting</span></span>
<span id="cb19-8">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(text)</span>
<span id="cb19-9">}</span>
<span id="cb19-10"></span>
<span id="cb19-11">texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product!"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible service"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay I guess"</span>)</span>
<span id="cb19-12"></span>
<span id="cb19-13">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_chr</span>(texts, classify_text)</span>
<span id="cb19-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive", "negative", "neutral"</span></span></code></pre></div></div>
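<p>Model replies do not always match the requested label exactly (for example <code>"Positive."</code>), and a single failed request stops the whole <code>map_chr()</code> call. The base R sketch below shows one way to handle both cases. The names <code>normalize_label()</code> and <code>classify_all()</code> are hypothetical, and <code>classify</code> stands for any function that takes one text and returns one string, such as <code>classify_text()</code> above.</p>

```r
# Allowed labels, taken from the system prompt above
labels <- c("positive", "negative", "neutral")

# Lowercase, trim, and drop punctuation; anything unexpected becomes NA
normalize_label <- function(x) {
  cleaned <- tolower(trimws(gsub("[[:punct:]]", "", x)))
  ifelse(cleaned %in% labels, cleaned, NA_character_)
}

# Apply `classify` to each text; a failed call yields NA instead of an error
classify_all <- function(texts, classify) {
  raw <- vapply(texts, function(text) {
    tryCatch(as.character(classify(text)), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
  normalize_label(raw)
}
```

<p>With this in place, <code>classify_all(texts, classify_text)</code> returns a character vector the same length as <code>texts</code>, with <code>NA</code> marking items that failed or came back with an unexpected label.</p>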
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Using the wrong API key variable</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Wrong</span></span>
<span id="cb20-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GEMINI_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"key"</span>)</span>
<span id="cb20-3"></span>
<span id="cb20-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Right</span></span>
<span id="cb20-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">GOOGLE_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"key"</span>)</span></code></pre></div></div>
<p><strong>2. Forgetting conversation state</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Each chat_gemini() creates a NEW conversation</span></span>
<span id="cb21-2">chat1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()</span>
<span id="cb21-3">chat1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"My name is Alice"</span>)</span>
<span id="cb21-4"></span>
<span id="cb21-5">chat2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_gemini</span>()  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># New conversation!</span></span>
<span id="cb21-6">chat2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Gemini doesn't know</span></span>
<span id="cb21-7"></span>
<span id="cb21-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reuse same chat object for context</span></span>
<span id="cb21-9">chat1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's my name?"</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Alice"</span></span></code></pre></div></div>
<p><strong>3. Not handling rate limits</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add delays in loops</span></span>
<span id="cb22-2">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(items, \(item) {</span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Important!</span></span>
<span id="cb22-4">  chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(item)</span>
<span id="cb22-5">})</span></code></pre></div></div>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Basic chat</td>
<td><code>chat &lt;- chat_gemini(); chat$chat("Hi")</code></td>
</tr>
<tr class="even">
<td>Set API key</td>
<td><code>Sys.setenv(GOOGLE_API_KEY = "key")</code></td>
</tr>
<tr class="odd">
<td>System prompt</td>
<td><code>chat_gemini(system_prompt = "...")</code></td>
</tr>
<tr class="even">
<td>Stable model</td>
<td><code>chat_gemini(model = "gemini-2.5-flash")</code></td>
</tr>
<tr class="odd">
<td>Latest model</td>
<td><code>chat_gemini(model = "gemini-3.1-pro-preview")</code></td>
</tr>
<tr class="even">
<td>Structured output</td>
<td><code>chat$extract_data(text, type)</code></td>
</tr>
</tbody>
</table>
<ul>
<li>Gemini 2.5 models are stable with 1M token context</li>
<li>Gemini 3.x preview models offer cutting-edge capabilities</li>
<li>Free tier available for experimentation</li>
<li>Use system prompts to control response style</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-use-openai-api-in-r">How to Use OpenAI API in R</a></li>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://ellmer.tidyverse.org/">ellmer Package Documentation</a></li>
<li><a href="https://aistudio.google.com/">Google AI Studio</a></li>
<li><a href="https://ai.google.dev/docs">Gemini API Documentation</a></li>
</ul>



</section>

 ]]></description>
  <category>llm</category>
  <category>gemini</category>
  <category>ellmer</category>
  <guid>https://rstats101.com/llm/how-to-use-gemini-api-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-use-gemini-api-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to Use OpenAI API in R</title>
  <link>https://rstats101.com/llm/how-to-use-openai-api-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>OpenAI’s API gives you access to powerful language models, such as the GPT-5 series and the o-series reasoning models, directly from R. You can generate text, answer questions, analyze data, and build AI-powered applications.</p>
<p><strong>Alternatives:</strong> If you prefer <a href="../llm/how-to-use-claude-api-in-r">Claude</a> or want to run models <a href="../llm/how-to-run-local-llms-in-r">locally with Ollama</a>, the <a href="../llm/how-to-use-ellmer-in-r">ellmer package</a> provides a unified interface for all providers.</p>
<p><strong>What you’ll learn:</strong></p>
<ul>
<li>Set up OpenAI API access</li>
<li>Use the openai R package</li>
<li>Create chat completions</li>
<li>Handle responses and errors</li>
<li>Manage costs with token limits</li>
</ul>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<section id="install-the-openai-package" class="level3">
<h3 class="anchored" data-anchor-id="install-the-openai-package">Install the openai package</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"openai"</span>)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(openai)</span></code></pre></div></div>
</section>
<section id="get-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="get-your-api-key">Get your API key</h3>
<ol type="1">
<li>Create an account at <a href="https://platform.openai.com">platform.openai.com</a></li>
<li>Go to the API Keys section</li>
<li>Create a new secret key</li>
<li>Add billing information (API access requires payment)</li>
</ol>
</section>
<section id="set-your-api-key" class="level3">
<h3 class="anchored" data-anchor-id="set-your-api-key">Set your API key</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 1: Set for current session</span></span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.setenv</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">OPENAI_API_KEY =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"sk-your-key-here"</span>)</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Option 2: Add to .Renviron (recommended)</span></span>
<span id="cb2-5">usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">edit_r_environ</span>()</span>
<span id="cb2-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add: OPENAI_API_KEY=sk-your-key-here</span></span></code></pre></div></div>
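<p>A missing or misnamed key usually surfaces only when the first request fails. As a minimal sketch, a small check can stop early with a clearer message. <code>require_env()</code> is a hypothetical helper written for this example, not a function from the openai package.</p>

```r
# Hypothetical helper: stop early if an environment variable is missing.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  # Return the value invisibly so the key is not printed to the console
  invisible(value)
}
```

<p>Calling <code>require_env("OPENAI_API_KEY")</code> at the top of a script fails fast if the key was not picked up from <code>.Renviron</code>.</p>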
</section>
</section>
<section id="create-chat-completions" class="level2">
<h2 class="anchored" data-anchor-id="create-chat-completions">Create Chat Completions</h2>
<p><code>create_chat_completion()</code> is the main function for sending messages to OpenAI’s chat models:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(openai)</span>
<span id="cb3-2"></span>
<span id="cb3-3">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb3-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb3-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is R programming?"</span>)</span>
<span id="cb3-7">  )</span>
<span id="cb3-8">)</span>
<span id="cb3-9"></span>
<span id="cb3-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract the response text</span></span>
<span id="cb3-11">response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span></code></pre></div></div>
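<p>The exact shape of <code>response$choices</code> can depend on how the JSON was parsed: it may be a nested list of choices, or a data frame with flattened column names such as <code>message.content</code>. The sketch below handles both shapes under that assumption. <code>extract_reply()</code> is a hypothetical helper, and it is worth checking <code>str(response)</code> on a real response to see which shape you get.</p>

```r
# Hypothetical helper: return the first reply's text from either shape.
extract_reply <- function(response) {
  choices <- response$choices
  if (is.data.frame(choices)) {
    # Flattened shape: one row per choice, nested fields joined with "."
    return(choices[["message.content"]][1])
  }
  # Nested shape: a list of choices, each holding a message list
  choices[[1]]$message$content
}
```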
<section id="with-system-prompt" class="level3">
<h3 class="anchored" data-anchor-id="with-system-prompt">With system prompt</h3>
<p>Control the assistant’s behavior:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb4-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are a helpful R programming tutor."</span>),</span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain what a data frame is."</span>)</span>
<span id="cb4-6">  )</span>
<span id="cb4-7">)</span></code></pre></div></div>
</section>
<section id="multi-turn-conversation" class="level3">
<h3 class="anchored" data-anchor-id="multi-turn-conversation">Multi-turn conversation</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">messages <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"You are an R expert."</span>),</span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"How do I read a CSV file?"</span>),</span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"assistant"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Use read.csv() or readr::read_csv()..."</span>),</span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What's the difference between them?"</span>)</span>
<span id="cb5-6">)</span>
<span id="cb5-7"></span>
<span id="cb5-8">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb5-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb5-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> messages</span>
<span id="cb5-11">)</span></code></pre></div></div>
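<p>Because the API is stateless, the full message list has to be sent with every request. A minimal sketch of keeping that history in one place, using a hypothetical <code>add_message()</code> helper that appends one message in the same <code>role</code>/<code>content</code> format used above:</p>

```r
# Hypothetical helper: append one message to a conversation history.
add_message <- function(messages, role, content) {
  stopifnot(role %in% c("system", "user", "assistant"))
  c(messages, list(list(role = role, content = content)))
}

# Build the history step by step instead of writing the whole list by hand
history <- list(list(role = "system", content = "You are an R expert."))
history <- add_message(history, "user", "How do I read a CSV file?")
```

<p>After each call, append the assistant’s reply and then the next user message, and pass <code>messages = history</code> to <code>create_chat_completion()</code>.</p>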
</section>
</section>
<section id="available-models" class="level2">
<h2 class="anchored" data-anchor-id="available-models">Available Models</h2>
<section id="gpt-5-series-latest" class="level3">
<h3 class="anchored" data-anchor-id="gpt-5-series-latest">GPT-5 Series (Latest)</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Best For</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>gpt-5.4</code></td>
<td>Complex reasoning, coding</td>
<td>Higher</td>
</tr>
<tr class="even">
<td><code>gpt-5.4-mini</code></td>
<td>Fast, everyday tasks</td>
<td>Medium</td>
</tr>
<tr class="odd">
<td><code>gpt-5.4-nano</code></td>
<td>Lowest latency, budget</td>
<td>Lower</td>
</tr>
</tbody>
</table>
</section>
<section id="o-series-reasoning-models" class="level3">
<h3 class="anchored" data-anchor-id="o-series-reasoning-models">O-Series (Reasoning Models)</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Best For</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>o3</code></td>
<td>Most powerful reasoning</td>
<td>Highest</td>
</tr>
<tr class="even">
<td><code>o3-mini</code></td>
<td>Balanced reasoning</td>
<td>Medium</td>
</tr>
<tr class="odd">
<td><code>o4-mini</code></td>
<td>Affordable reasoning</td>
<td>Lower</td>
</tr>
</tbody>
</table>
</section>
<section id="legacy-models-still-available-in-api" class="level3">
<h3 class="anchored" data-anchor-id="legacy-models-still-available-in-api">Legacy Models (still available in API)</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>gpt-4o</code></td>
<td>Retired from ChatGPT Feb 2026, API still available</td>
</tr>
<tr class="even">
<td><code>gpt-4-turbo</code></td>
<td>128k context</td>
</tr>
</tbody>
</table>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># List available models</span></span>
<span id="cb6-2">models <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list_models</span>()</span></code></pre></div></div>
</section>
</section>
<section id="control-response-parameters" class="level2">
<h2 class="anchored" data-anchor-id="control-response-parameters">Control Response Parameters</h2>
<section id="temperature-creativity" class="level3">
<h3 class="anchored" data-anchor-id="temperature-creativity">Temperature (creativity)</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># More deterministic (good for code)</span></span>
<span id="cb7-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb7-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb7-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write R code to calculate mean"</span>)),</span>
<span id="cb7-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">temperature =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span></span>
<span id="cb7-6">)</span>
<span id="cb7-7"></span>
<span id="cb7-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># More creative (good for brainstorming)</span></span>
<span id="cb7-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb7-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb7-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Suggest project ideas"</span>)),</span>
<span id="cb7-12">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">temperature =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span></span>
<span id="cb7-13">)</span></code></pre></div></div>
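<p>To build intuition for what <code>temperature</code> does, here is a conceptual sketch in base R. It assumes the common description of sampling, where the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The scores are made up for illustration, <code>softmax_with_temperature()</code> is a hypothetical helper, and this is not OpenAI's actual implementation.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Conceptual sketch: temperature rescales logits before softmax.
# Assumes temperature &gt; 0; the logits are invented for illustration only.
softmax_with_temperature &lt;- function(logits, temperature) {
  scaled &lt;- logits / temperature
  exp_scaled &lt;- exp(scaled - max(scaled))  # subtract max for numerical stability
  exp_scaled / sum(exp_scaled)
}

logits &lt;- c(mean = 2.0, median = 1.0, mode = 0.1)

low  &lt;- softmax_with_temperature(logits, 0.2)  # sharply peaked on the top score
high &lt;- softmax_with_temperature(logits, 0.9)  # flatter, so sampling varies more

round(low, 3)
round(high, 3)</code></pre></div>
<p>At the lower temperature almost all probability lands on the highest-scoring option, which is why low values suit code and high values suit brainstorming.</p>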
</section>
<section id="limit-response-length" class="level3">
<h3 class="anchored" data-anchor-id="limit-response-length">Limit response length</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb8-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Explain ggplot2"</span>)),</span>
<span id="cb8-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">max_tokens =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">150</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Limit response length</span></span>
<span id="cb8-5">)</span></code></pre></div></div>
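<p>A reply that hits the token limit is cut off mid-sentence. The Chat Completions response marks this with <code>finish_reason</code> set to <code>"length"</code>. The sketch below checks for it on a hand-written mock response, so it runs without an API call; <code>was_truncated()</code> is a hypothetical helper, and the mock assumes the response is parsed into the same nested list used elsewhere in this tutorial.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hand-written mock with only the fields used below (an assumption about shape)
response &lt;- list(
  choices = list(list(
    message = list(role = "assistant", content = "ggplot2 is a plotting"),
    finish_reason = "length"
  ))
)

# TRUE when the reply stopped because it ran into the token limit
was_truncated &lt;- function(response) {
  identical(response$choices[[1]]$finish_reason, "length")
}

if (was_truncated(response)) {
  message("Reply was cut off; consider raising max_tokens.")
}</code></pre></div>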
</section>
</section>
<section id="practical-examples" class="level2">
<h2 class="anchored" data-anchor-id="practical-examples">Practical Examples</h2>
<section id="generate-r-code" class="level3">
<h3 class="anchored" data-anchor-id="generate-r-code">Generate R code</h3>
<p>Use a system prompt to get code-only responses. First, write the task as the user prompt:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">prompt <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Write R code to:</span></span>
<span id="cb9-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">1. Load the mtcars dataset</span></span>
<span id="cb9-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2. Calculate average mpg by cylinder count</span></span>
<span id="cb9-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">3. Create a bar chart of the results"</span></span></code></pre></div></div>
<p>Then send it with the system message and a low temperature for more deterministic output:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb10-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4"</span>,</span>
<span id="cb10-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Return only R code, no explanations."</span>),</span>
<span id="cb10-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> prompt)</span>
<span id="cb10-6">  ),</span>
<span id="cb10-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">temperature =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span></span>
<span id="cb10-8">)</span>
<span id="cb10-9"></span>
<span id="cb10-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content)</span></code></pre></div></div>
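<p>Even with a code-only system prompt, models sometimes wrap the answer in Markdown code fences. A minimal sketch for removing them before saving the code to a file; <code>strip_code_fences()</code> is a hypothetical helper, and the sample reply is made up:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Drop any line that is a Markdown fence such as ```r or ```
strip_code_fences &lt;- function(x) {
  lines &lt;- strsplit(x, "\n", fixed = TRUE)[[1]]
  lines &lt;- lines[!grepl("^\\s*```", lines)]
  paste(lines, collapse = "\n")
}

# Made-up reply for illustration
raw &lt;- "```r\naggregate(mpg ~ cyl, data = mtcars, FUN = mean)\n```"
code &lt;- strip_code_fences(raw)
cat(code)</code></pre></div>
<p>Review generated code before running it.</p>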
</section>
<section id="analyze-text-data" class="level3">
<h3 class="anchored" data-anchor-id="analyze-text-data">Analyze text data</h3>
<p>Create a reusable sentiment classifier. For more robust extraction, see <a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Structured Data with LLMs</a>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">analyze_sentiment <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(text) {</span>
<span id="cb11-2">  response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb11-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb11-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb11-5">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Classify as positive/negative/neutral. One word only."</span>),</span>
<span id="cb11-6">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> text)</span>
<span id="cb11-7">    ),</span>
<span id="cb11-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">temperature =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb11-9">  )</span>
<span id="cb11-10">  response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span>
<span id="cb11-11">}</span></code></pre></div></div>
<p>Apply to multiple reviews:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">reviews <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Great product!"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Terrible quality"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"It's okay"</span>)</span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(reviews, analyze_sentiment)</span>
<span id="cb12-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "positive", "negative", "neutral"</span></span></code></pre></div></div>
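<p>The model may not return the label in exactly the expected form (for example <code>"Positive."</code>). A small sketch that normalizes replies and flags anything unexpected as <code>NA</code>; <code>normalize_sentiment()</code> is a hypothetical helper, and the inputs are made-up replies:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Lowercase, strip everything except letters, and keep only known labels
normalize_sentiment &lt;- function(label) {
  allowed &lt;- c("positive", "negative", "neutral")
  cleaned &lt;- gsub("[^a-z]", "", tolower(label))
  ifelse(cleaned %in% allowed, cleaned, NA_character_)
}

normalize_sentiment(c("Positive.", " negative\n", "Mostly good"))
# "positive" "negative" NA</code></pre></div>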
</section>
<section id="summarize-text" class="level3">
<h3 class="anchored" data-anchor-id="summarize-text">Summarize text</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">long_text <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Your long document here..."</span></span>
<span id="cb13-2"></span>
<span id="cb13-3">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb13-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb13-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb13-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Summarize the following text in 2-3 sentences."</span>),</span>
<span id="cb13-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> long_text)</span>
<span id="cb13-8">  )</span>
<span id="cb13-9">)</span></code></pre></div></div>
</section>
</section>
<section id="using-httr2-directly" class="level2">
<h2 class="anchored" data-anchor-id="using-httr2-directly">Using httr2 Directly</h2>
<p>For more control, call the API directly with httr2.</p>
<section id="build-the-request" class="level3">
<h3 class="anchored" data-anchor-id="build-the-request">Build the request</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(httr2)</span>
<span id="cb14-2"></span>
<span id="cb14-3">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">request</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.openai.com/v1/chat/completions"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_headers</span>(</span>
<span id="cb14-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Authorization =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bearer"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)),</span>
<span id="cb14-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Content-Type</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">`</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"application/json"</span></span>
<span id="cb14-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_body_json</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb14-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb14-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello!"</span>))</span>
<span id="cb14-11">  )) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">req_perform</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resp_body_json</span>()</span></code></pre></div></div>
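<p>One benefit of httr2 is built-in retry and timeout handling. The sketch below is one possible variant of the request above, not the only way to write it: it uses <code>req_auth_bearer_token()</code> for the <code>Authorization</code> header, <code>req_retry()</code> to retry transient failures, and <code>req_timeout()</code> to cap the wait. The retry count and timeout are arbitrary example values. The request is only built here, not sent.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(httr2)

# Build (but do not send) a request with bearer auth, retries, and a timeout
req &lt;- request("https://api.openai.com/v1/chat/completions") |&gt;
  req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |&gt;
  req_retry(max_tries = 3) |&gt;   # example value: retry transient errors such as 429
  req_timeout(60) |&gt;            # example value: give up after 60 seconds
  req_body_json(list(
    model = "gpt-5.4-mini",
    messages = list(list(role = "user", content = "Hello!"))
  ))

# Send only when ready:
# response &lt;- req |&gt; req_perform() |&gt; resp_body_json()</code></pre></div>
<p><code>req_body_json()</code> sets the JSON content type itself, so the explicit <code>Content-Type</code> header is not needed in this variant.</p>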
</section>
<section id="extract-the-response" class="level3">
<h3 class="anchored" data-anchor-id="extract-the-response">Extract the response</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span></code></pre></div></div>
</section>
</section>
<section id="error-handling" class="level2">
<h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">safe_completion <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(prompt) {</span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tryCatch</span>({</span>
<span id="cb16-3">    response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb16-4">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb16-5">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> prompt))</span>
<span id="cb16-6">    )</span>
<span id="cb16-7">    response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span>
<span id="cb16-8">  }, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">error =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(e) {</span>
<span id="cb16-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">message</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"API Error: "</span>, e<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message)</span>
<span id="cb16-10">    <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA_character_</span></span>
<span id="cb16-11">  })</span>
<span id="cb16-12">}</span>
<span id="cb16-13"></span>
<span id="cb16-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use safely</span></span>
<span id="cb16-15">result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_completion</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is 2+2?"</span>)</span></code></pre></div></div>
<section id="common-errors" class="level3">
<h3 class="anchored" data-anchor-id="common-errors">Common errors</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 29%">
<col style="width: 29%">
<col style="width: 41%">
</colgroup>
<thead>
<tr class="header">
<th>Error</th>
<th>Cause</th>
<th>Solution</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>401 Unauthorized</td>
<td>Invalid API key</td>
<td>Check OPENAI_API_KEY</td>
</tr>
<tr class="even">
<td>429 Rate limit</td>
<td>Too many requests</td>
<td>Add delays between calls</td>
</tr>
<tr class="odd">
<td>400 Bad request</td>
<td>Invalid parameters</td>
<td>Check model name, messages format</td>
</tr>
</tbody>
</table>
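<p>For 429 errors, "add delays" usually means retrying with a growing pause. A minimal base R sketch of exponential backoff; <code>with_retry()</code> is a hypothetical helper, and <code>flaky()</code> is a stand-in that fails twice so the example runs without an API call:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Call fn() up to max_tries times, doubling the pause after each failure
with_retry &lt;- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result &lt;- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt &lt; max_tries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Stand-in for an API call: fails twice, then succeeds
calls &lt;- 0
flaky &lt;- function() {
  calls &lt;&lt;- calls + 1
  if (calls &lt; 3) stop("429 rate limit")
  "ok"
}

with_retry(flaky, base_delay = 0)  # base_delay = 0 only to keep the demo fast</code></pre></div>
<p>In real use, wrap the API call in a function, for example <code>with_retry(\() safe_call_here())</code>, and keep a non-zero <code>base_delay</code>.</p>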
</section>
</section>
<section id="cost-management" class="level2">
<h2 class="anchored" data-anchor-id="cost-management">Cost Management</h2>
<section id="understand-tokens" class="level3">
<h3 class="anchored" data-anchor-id="understand-tokens">Understand tokens</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Rough estimate: 1 token ≈ 4 characters in English</span></span>
<span id="cb17-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># "Hello, how are you?" ≈ 6 tokens</span></span>
<span id="cb17-3"></span>
<span id="cb17-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check usage in response</span></span>
<span id="cb17-5">response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb17-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb17-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hi"</span>))</span>
<span id="cb17-8">)</span>
<span id="cb17-9"></span>
<span id="cb17-10">response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>usage</span>
<span id="cb17-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $prompt_tokens: 9</span></span>
<span id="cb17-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $completion_tokens: 10</span></span>
<span id="cb17-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># $total_tokens: 19</span></span></code></pre></div></div>
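<p>The rule of thumb and the <code>usage</code> fields can be turned into quick estimates. In this sketch both helpers are hypothetical, and the prices are placeholder arguments, not real rates; look up current prices on the provider's pricing page. The character-based estimate is only approximate and can differ from the tokenizer's real count.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Rough token estimate from the 4-characters-per-token rule of thumb
estimate_tokens &lt;- function(text) ceiling(nchar(text) / 4)

# Cost from a usage object; prices are per 1 million tokens
estimate_cost &lt;- function(usage, input_price_per_1m, output_price_per_1m) {
  (usage$prompt_tokens * input_price_per_1m +
     usage$completion_tokens * output_price_per_1m) / 1e6
}

# Same numbers as the usage output above
usage &lt;- list(prompt_tokens = 9, completion_tokens = 10, total_tokens = 19)

estimate_tokens("Hello, how are you?")
# Placeholder prices (1 and 2 per million tokens), for illustration only
estimate_cost(usage, input_price_per_1m = 1, output_price_per_1m = 2)</code></pre></div>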
</section>
<section id="limit-costs" class="level3">
<h3 class="anchored" data-anchor-id="limit-costs">Limit costs</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use cheaper models for simple tasks</span></span>
<span id="cb18-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb18-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Mini tier, typically cheaper than the full-size model</span></span>
<span id="cb18-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> prompt)),</span>
<span id="cb18-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">max_tokens =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Limit response length</span></span>
<span id="cb18-6">)</span></code></pre></div></div>
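<p>Another way to limit costs is to avoid paying twice for the same prompt. A minimal in-memory cache sketch, assuming identical prompts should give identical answers (most sensible with <code>temperature = 0</code>). <code>cached_call()</code> is a hypothetical helper and <code>fake_api()</code> is a stand-in for the paid call, so the example runs offline:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cache &lt;- new.env()

# Return the stored answer for a prompt, calling fn() only on the first request
cached_call &lt;- function(prompt, fn) {
  if (!exists(prompt, envir = cache, inherits = FALSE)) {
    assign(prompt, fn(prompt), envir = cache)
  }
  get(prompt, envir = cache, inherits = FALSE)
}

# Stand-in for a paid API call; counts how often it is really invoked
n_calls &lt;- 0
fake_api &lt;- function(prompt) {
  n_calls &lt;&lt;- n_calls + 1
  toupper(prompt)
}

cached_call("hello", fake_api)
cached_call("hello", fake_api)  # served from the cache
n_calls
# 1</code></pre></div>
<p>The cache lives only for the current R session.</p>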
</section>
</section>
<section id="batch-processing" class="level2">
<h2 class="anchored" data-anchor-id="batch-processing">Batch Processing</h2>
<p>Process multiple items in sequence, pausing between calls to stay under rate limits:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(purrr)</span>
<span id="cb19-2"></span>
<span id="cb19-3">texts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Text 3"</span>)</span>
<span id="cb19-4"></span>
<span id="cb19-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add delay to avoid rate limits</span></span>
<span id="cb19-6">results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(texts, \(text) {</span>
<span id="cb19-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># 500ms delay</span></span>
<span id="cb19-8"></span>
<span id="cb19-9">  response <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(</span>
<span id="cb19-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>,</span>
<span id="cb19-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">messages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb19-12">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">role =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Summarize:"</span>, text))</span>
<span id="cb19-13">    )</span>
<span id="cb19-14">  )</span>
<span id="cb19-15">  response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span>
<span id="cb19-16">})</span></code></pre></div></div>
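<p>A fixed delay reduces the chance of hitting a rate limit but does not recover from one. The sketch below shows one common complement, retrying with exponential backoff. <code>with_retry()</code> and <code>flaky()</code> are hypothetical helpers written for this illustration, not part of any package, and <code>flaky()</code> only simulates a failing API call.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: call f(), retrying on error with growing delays
with_retry &lt;- function(f, max_tries = 3, base_delay = 0.5) {
  for (attempt in seq_len(max_tries)) {
    result &lt;- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt == max_tries) stop(result)      # give up and re-signal the error
    Sys.sleep(base_delay * 2^(attempt - 1))     # base_delay, then 2x, then 4x, ...
  }
}

# Simulated call that fails twice before succeeding (stands in for an API request)
calls &lt;- 0
flaky &lt;- function() {
  calls &lt;&lt;- calls + 1
  if (calls &lt; 3) stop("rate limit hit")
  "ok"
}

result &lt;- with_retry(flaky, base_delay = 0.01)
result</code></pre></div></div>
<p>In practice you would wrap the API call itself, for example <code>with_retry(\() create_chat_completion(...))</code>, and tune <code>max_tries</code> and <code>base_delay</code> to your account's limits.</p>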
</section>
<section id="alternative-ellmer-package" class="level2">
<h2 class="anchored" data-anchor-id="alternative-ellmer-package">Alternative: ellmer Package</h2>
<p>The ellmer package offers a tidyverse-friendly alternative:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">install.packages</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ellmer"</span>)</span>
<span id="cb20-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ellmer)</span>
<span id="cb20-3"></span>
<span id="cb20-4">chat <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat_openai</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">model =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gpt-5.4-mini"</span>)</span>
<span id="cb20-5">chat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chat</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"What is R programming?"</span>)</span></code></pre></div></div>
<p>See our <a href="../llm/how-to-use-ellmer-in-r">ellmer tutorial</a> for more details.</p>
</section>
<section id="common-mistakes" class="level2">
<h2 class="anchored" data-anchor-id="common-mistakes">Common Mistakes</h2>
<p><strong>1. Forgetting to set the API key</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Check if key is set</span></span>
<span id="cb21-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"OPENAI_API_KEY"</span>)</span>
<span id="cb21-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Should not be empty</span></span></code></pre></div></div>
<p><strong>2. Not handling rate limits</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Add delays in loops</span></span>
<span id="cb22-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> (item <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> items) {</span>
<span id="cb22-3">  result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">create_chat_completion</span>(...)</span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.sleep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Wait 1 second</span></span>
<span id="cb22-5">}</span></code></pre></div></div>
<p><strong>3. Using expensive models for simple tasks</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># gpt-5.4-mini is often sufficient and much cheaper</span></span>
<span id="cb23-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Only use gpt-5.4 or o3 for complex reasoning tasks</span></span></code></pre></div></div>
<p><strong>4. Not checking response structure</strong></p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Always verify the response has expected content</span></span>
<span id="cb24-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.null</span>(response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content)) {</span>
<span id="cb24-3">  result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> response<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>choices[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>message<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>content</span>
<span id="cb24-4">}</span></code></pre></div></div>
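<p>Indexing into <code>choices[[1]]</code> raises an error when the list is empty, so the <code>is.null()</code> check alone may not be enough. One way to handle this, sketched below, is to wrap the extraction in <code>tryCatch()</code>. <code>extract_content()</code> is a hypothetical helper, and the <code>ok</code> and <code>bad</code> objects are mock lists that only mimic the response structure used above.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: return the reply text, or NA if the structure is unexpected
extract_content &lt;- function(response) {
  content &lt;- tryCatch(
    response$choices[[1]]$message$content,
    error = function(e) NULL
  )
  if (is.null(content) || !nzchar(content)) NA_character_ else content
}

# Mock responses (not real API output)
ok  &lt;- list(choices = list(list(message = list(content = "Hello"))))
bad &lt;- list(choices = list())

extract_content(ok)
extract_content(bad)</code></pre></div></div>
<p>Returning <code>NA</code> rather than stopping keeps a long batch running, and you can filter the failed items afterwards.</p>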
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Code</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Basic completion</td>
<td><code>create_chat_completion(model, messages)</code></td>
</tr>
<tr class="even">
<td>Set API key</td>
<td><code>Sys.setenv(OPENAI_API_KEY = "key")</code></td>
</tr>
<tr class="odd">
<td>Control creativity</td>
<td><code>temperature = 0.2</code> (low) to <code>0.9</code> (high)</td>
</tr>
<tr class="even">
<td>Limit response</td>
<td><code>max_tokens = 100</code></td>
</tr>
<tr class="odd">
<td>Check usage</td>
<td><code>response$usage$total_tokens</code></td>
</tr>
</tbody>
</table>
<ul>
<li>Start with <code>gpt-5.4-mini</code> for cost efficiency</li>
<li>Use <code>o3</code> or <code>o3-mini</code> for complex reasoning tasks</li>
<li>Use a low temperature for more deterministic outputs</li>
<li>Add delays when processing multiple items</li>
<li>Always handle errors gracefully</li>
</ul>
</section>
<section id="related-posts" class="level2">
<h2 class="anchored" data-anchor-id="related-posts">Related Posts</h2>
<ul>
<li><a href="../llm/how-to-use-claude-api-in-r">How to Use Claude API in R</a></li>
<li><a href="../llm/how-to-use-gemini-api-in-r">How to Use Gemini API in R</a></li>
<li><a href="../llm/how-to-use-ellmer-in-r">How to Use ellmer in R</a></li>
<li><a href="../llm/how-to-run-local-llms-in-r">How to Run Local LLMs in R</a></li>
<li><a href="../llm/how-to-extract-data-with-llms-in-r">How to Extract Data with LLMs in R</a></li>
</ul>
</section>
<section id="sources" class="level2">
<h2 class="anchored" data-anchor-id="sources">Sources</h2>
<ul>
<li><a href="https://tilburg.ai/2024/03/tutorial-openai-api-in-r/">Tilburg.ai OpenAI API Tutorial</a></li>
<li><a href="https://irudnyts.github.io/openai/">openai R Package Documentation</a></li>
<li><a href="https://www.listendata.com/2023/05/chatgpt-in-r.html">ListenData ChatGPT in R Guide</a></li>
</ul>


<!-- -->

</section>

 ]]></description>
  <category>llm</category>
  <category>openai</category>
  <guid>https://rstats101.com/llm/how-to-use-openai-api-in-r.html</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/llm/how-to-use-openai-api-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to create all combinations with expand_grid() in R</title>
  <link>https://rstats101.com/tidyr/how-to-create-all-combinations-with-expandgrid-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Creating combinations of data is a common task in data analysis, especially when you need to explore all possible pairings between variables or expand datasets for modeling. R provides powerful functions like <code>expand_grid()</code> and <code>expand.grid()</code> that generate all combinations of input values, making it easy to create comprehensive datasets for analysis and visualization.</p>
</section>
<section id="loading-required-packages" class="level2">
<h2 class="anchored" data-anchor-id="loading-required-packages">Loading Required Packages</h2>
<p>First, let’s load the tidyverse. It includes tidyr, the package that provides the <code>expand_grid()</code> function we’ll be using throughout this tutorial.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<p>The tidyverse provides modern data manipulation tools that work seamlessly together for data analysis tasks.</p>
</section>
<section id="basic-combinations-with-expand_grid" class="level2">
<h2 class="anchored" data-anchor-id="basic-combinations-with-expand_grid">Basic Combinations with expand_grid()</h2>
<p>Let’s start by creating simple vectors and then generating all possible combinations between them.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">var1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> letters[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span>
<span id="cb2-2">var1</span></code></pre></div></div>
<p>This creates a vector with the first 5 lowercase letters. The <code>letters</code> constant in R contains all 26 lowercase letters of the alphabet.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">var2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> LETTERS[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span>
<span id="cb3-2">var2</span></code></pre></div></div>
<p>Similarly, this creates a vector with the first 5 uppercase letters using R’s built-in <code>LETTERS</code> constant.</p>
<p>Now let’s create all possible combinations of these two vectors:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">combination_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand_grid</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var1 =</span> letters[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>],</span>
<span id="cb4-2">                             <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var2 =</span> LETTERS[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>])</span>
<span id="cb4-3">combination_df</span></code></pre></div></div>
<p>The <code>expand_grid()</code> function creates a tibble with 25 rows (5 × 5), showing every possible pairing between the lowercase and uppercase letters.</p>
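<p>The row count is always the product of the input lengths. A small sketch with made-up vectors (<code>sizes</code> and <code>colors</code> are illustrative names) makes this easy to check:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyr)

# Illustrative inputs of length 3 and 2
sizes  &lt;- c("S", "M", "L")
colors &lt;- c("red", "blue")

grid &lt;- expand_grid(size = sizes, color = colors)

# Rows equal the product of the input lengths
nrow(grid) == length(sizes) * length(colors)

# The first variable changes slowest, so each size appears in a contiguous run
grid$size</code></pre></div></div>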
</section>
<section id="comparing-expand_grid-vs-expand.grid" class="level2">
<h2 class="anchored" data-anchor-id="comparing-expand_grid-vs-expand.grid">Comparing expand_grid() vs expand.grid()</h2>
<p>Base R has a similar function, <code>expand.grid()</code>, which generates the same combinations but returns a data frame instead of a tibble.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">combination_df_base <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand.grid</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var1 =</span> letters[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>],</span>
<span id="cb5-2">                                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var2 =</span> LETTERS[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>])</span>
<span id="cb5-3">combination_df_base</span></code></pre></div></div>
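<p>Both functions produce the same 25 combinations, but the output differs. <code>expand.grid()</code> returns a data frame, converts character vectors to factors by default, and varies the first variable fastest. <code>expand_grid()</code> returns a tibble, keeps character vectors as characters, and varies the first variable slowest, so it generally integrates better with tidyverse workflows.</p>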
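<p>A quick side-by-side on a smaller 2 × 2 grid shows how the two outputs differ in column type and row order. The object names <code>tidy_grid</code> and <code>base_grid</code> are chosen here for illustration:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyr)

tidy_grid &lt;- expand_grid(var1 = letters[1:2], var2 = LETTERS[1:2])
base_grid &lt;- expand.grid(var1 = letters[1:2], var2 = LETTERS[1:2])

# Column types: character in the tibble, factor in the base data frame
class(tidy_grid$var1)
class(base_grid$var1)

# Row order: expand_grid() varies var1 slowest, expand.grid() varies it fastest
tidy_grid$var1
as.character(base_grid$var1)</code></pre></div></div>
<p>If you need character columns from the base function, pass <code>stringsAsFactors = FALSE</code> to <code>expand.grid()</code>.</p>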
</section>
<section id="working-with-data-frames" class="level2">
<h2 class="anchored" data-anchor-id="working-with-data-frames">Working with Data Frames</h2>
<p>You can also use <code>expand_grid()</code> to expand existing data frames with additional variables. Let’s create a simple data frame first:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2021</span>, </span>
<span id="cb6-2">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">quarter =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Q"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb6-3">df</span></code></pre></div></div>
<p>This creates a tibble with one year and four quarters. The <code>paste0()</code> function concatenates “Q” with numbers 1-4.</p>
<p>Now let’s expand this data frame to include multiple companies:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">expanded_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">expand_grid</span>(df, </span>
<span id="cb7-2">                          <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">companies =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GOOG"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MSFT"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NVDA"</span>))</span>
<span id="cb7-3">expanded_df</span></code></pre></div></div>
<p>This operation creates 12 rows (4 quarters × 3 companies), duplicating the year and quarter information for each company. This is particularly useful when preparing data for financial analysis or creating templates for data collection.</p>
</section>
<section id="practical-applications" class="level2">
<h2 class="anchored" data-anchor-id="practical-applications">Practical Applications</h2>
<p>The combination functions are especially valuable for:</p>
<ul>
<li>Creating factorial experimental designs</li>
<li>Generating parameter grids for model tuning</li>
<li>Building comprehensive datasets for simulation studies</li>
<li>Preparing data templates for reporting across multiple categories</li>
</ul>
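<p>As one illustration of these uses, here is a minimal sketch of a parameter grid for model tuning. The parameter names and values are hypothetical and not tied to any particular modeling package:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyr)

# Hypothetical tuning values, chosen only for illustration
param_grid &lt;- expand_grid(
  learning_rate = c(0.01, 0.1),
  max_depth     = c(3, 5, 7)
)

param_grid        # one row per candidate configuration
nrow(param_grid)  # 2 * 3 = 6 configurations</code></pre></div></div>
<p>Each row can then be passed to a model-fitting function, which keeps the tuning loop free of nested <code>for</code> statements.</p>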
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>The <code>expand_grid()</code> function provides an efficient way to create all possible combinations of variables in R. Whether you’re working with simple vectors or complex data frames, this function helps you generate comprehensive datasets needed for analysis, modeling, and reporting. Remember to use <code>expand_grid()</code> for tidyverse workflows and <code>expand.grid()</code> when working primarily with base R functions.</p>


<!-- -->

</section>

 ]]></description>
  <category>tidyr</category>
  <category>tidyr expand_grid()</category>
  <guid>https://rstats101.com/tidyr/how-to-create-all-combinations-with-expandgrid-in-r.html</guid>
  <pubDate>Thu, 02 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/tidyr/expand-grid-in-r-hero-ggplot.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Effect of centering and scaling data in R</title>
  <link>https://rstats101.com/statistics/effect-of-centering-and-scaling-data-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Data scaling and centering are essential preprocessing steps in data analysis and machine learning. These transformations help normalize variables with different units and scales, making them comparable and improving the performance of many algorithms. We’ll explore how to visualize and apply these transformations using R and the Palmer penguins dataset.</p>
</section>
<section id="loading-required-libraries" class="level2">
<h2 class="anchored" data-anchor-id="loading-required-libraries">Loading Required Libraries</h2>
<p>Let’s start by loading the necessary packages for our analysis.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggridges)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>))</span></code></pre></div></div>
</section>
<section id="preparing-the-data" class="level2">
<h2 class="anchored" data-anchor-id="preparing-the-data">Preparing the Data</h2>
<p>First, we’ll prepare our dataset by removing rows with missing values, dropping the <code>year</code> column, and keeping only the numeric variables.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric))</span></code></pre></div></div>
<p>Let’s examine the first few rows to understand our data structure.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<p>Our dataset now contains four numeric variables: bill length, bill depth, flipper length, and body mass, which span very different numeric ranges.</p>
</section>
<section id="visualizing-raw-data-distributions" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-raw-data-distributions">Visualizing Raw Data Distributions</h2>
<p>Before applying any transformations, let’s visualize how our variables are distributed using boxplots.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outlier.shape =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Raw Penguin Measurements on Different Scales"</span>,</span>
<span id="cb4-8">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/statistics/centering-scaling-in-r-raw-boxplot-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Boxplot with jittered points of raw penguin bill length, bill depth, flipper length, and body mass in R showing how the body mass variable dominates the shared y-axis because its numeric range is orders of magnitude larger than the other features before centering and scaling.</figcaption>
</figure>
</div>
<p>Notice how the variables have very different ranges: body mass is in the thousands, while bill measurements are in the tens. This makes direct comparison difficult.</p>
</section>
<section id="using-ridge-plots-for-better-comparison" class="level2">
<h2 class="anchored" data-anchor-id="using-ridge-plots-for-better-comparison">Using Ridge Plots for Better Comparison</h2>
<p>Ridge plots provide a clearer view of each variable’s distribution shape.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density_ridges2</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<p>The different scales make it challenging to compare distribution shapes across variables.</p>
</section>
<section id="scaling-and-centering-data" class="level2">
<h2 class="anchored" data-anchor-id="scaling-and-centering-data">Scaling and Centering Data</h2>
<p>Now let’s apply both centering (subtracting the mean) and scaling (dividing by standard deviation) to standardize our variables.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">df_scaled <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<p>Let’s examine the transformed data.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">df_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<p>The scaled data now has a mean of 0 and standard deviation of 1 for each variable, making them directly comparable.</p>
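<p>As a quick check of that claim, the sketch below standardizes a small hand-typed data frame and inspects the result. The <code>toy</code> data frame and its values are hypothetical and stand in for <code>df</code>; only base R functions are used.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- data.frame(
  length_mm = c(38, 41, 45, 47, 50, 52),
  mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)

toy_scaled &lt;- scale(toy, center = TRUE, scale = TRUE)

# Column means after standardizing (zero up to floating-point error)
col_means &lt;- colMeans(toy_scaled)

# Column standard deviations after standardizing
col_sds &lt;- apply(toy_scaled, 2, sd)

# scale() returns a matrix and records what it subtracted and divided by
is.matrix(toy_scaled)
attr(toy_scaled, "scaled:center")
attr(toy_scaled, "scaled:scale")</code></pre></div>
<p>Because <code>scale()</code> returns a matrix rather than a data frame, the result needs <code>as.data.frame()</code> before it can be passed to tidyverse verbs such as <code>mutate()</code>.</p>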
</section>
<section id="visualizing-scaled-data" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-scaled-data">Visualizing Scaled Data</h2>
<p>Let’s see how the scaling transformation affects our distributions.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">df_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density_ridges2</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"After Centering and Scaling (mean 0, sd 1)"</span>,</span>
<span id="cb8-8">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Standardized value"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Feature"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/statistics/centering-scaling-in-r-scaled-ridge-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Ridge density plot of penguin bill length, bill depth, flipper length, and body mass after centering and scaling in R with scale(center = TRUE, scale = TRUE), showing all four distributions aligned on a common z-score axis with mean zero and unit standard deviation.</figcaption>
</figure>
</div>
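<p>Now all variables are on the same scale, making it easy to compare their distribution shapes, such as skewness or multiple peaks. Because every variable now has a standard deviation of 1, this plot no longer shows which variables had the most variability in their original units.</p>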
</section>
<section id="centering-only-without-scaling" class="level2">
<h2 class="anchored" data-anchor-id="centering-only-without-scaling">Centering Only (Without Scaling)</h2>
<p>Sometimes you might want to center data without scaling. This shifts distributions to have a mean of 0 but preserves the original variance.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">df_centered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div></div>
<p>Let’s examine the centered-only data.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">df_centered <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
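<p>To confirm that centering alone leaves the spread untouched, a minimal sketch can compare standard deviations before and after. The <code>toy</code> data frame below is hypothetical and stands in for <code>df</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- data.frame(
  length_mm = c(38, 41, 45, 47, 50, 52),
  mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)

toy_centered &lt;- scale(toy, center = TRUE, scale = FALSE)

# Means move to zero (up to floating-point error) ...
colMeans(toy_centered)

# ... while each column keeps its original standard deviation
sd_before &lt;- sapply(toy, sd)
sd_after  &lt;- apply(toy_centered, 2, sd)
all.equal(sd_before, sd_after)</code></pre></div>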
</section>
<section id="visualizing-centered-data" class="level2">
<h2 class="anchored" data-anchor-id="visualizing-centered-data">Visualizing Centered Data</h2>
<p>Here’s how centering without scaling affects our distributions.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">df_centered <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_density_ridges2</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"After Centering Only (mean 0, original spreads)"</span>,</span>
<span id="cb11-8">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Centered value"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Feature"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb11-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/statistics/centering-scaling-in-r-centered-only-ridge-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Ridge density plot of centered-only penguin measurements in R using scale(center = TRUE, scale = FALSE), showing body mass retaining its much wider variance compared to the bill and flipper measurements while every distribution is now centered around zero.</figcaption>
</figure>
</div>
<p>The distributions maintain their original spreads but are now centered around zero, which can be useful for certain analyses while preserving the relative scale differences.</p>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>Data scaling and centering are powerful preprocessing techniques that make variables comparable and improve analysis quality. Use full scaling (<code>center = TRUE, scale = TRUE</code>) when you want all variables on the same scale, such as for machine learning algorithms. Use centering only (<code>center = TRUE, scale = FALSE</code>) when you want to preserve relative scale differences but center distributions around zero. Visual exploration with ridge plots helps you understand the impact of these transformations on your data.</p>


<!-- -->

</section>

 ]]></description>
  <category>statistics</category>
  <category>data preprocessing</category>
  <guid>https://rstats101.com/statistics/effect-of-centering-and-scaling-data-in-r.html</guid>
  <pubDate>Thu, 02 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/statistics/centering-scaling-in-r-raw-boxplot-ggplot.png" medium="image" type="image/png" height="99" width="144"/>
</item>
<item>
  <title>How to calculate covariance in R</title>
  <link>https://rstats101.com/statistics/how-to-calculate-covariance-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Covariance and correlation matrices are fundamental tools in data analysis that help us understand relationships between multiple variables. Covariance measures how variables change together, while correlation standardizes these relationships to a -1 to 1 scale. This tutorial demonstrates how to calculate and interpret these matrices using R’s built-in functions and matrix operations.</p>
</section>
<section id="loading-required-packages" class="level2">
<h2 class="anchored" data-anchor-id="loading-required-packages">Loading Required Packages</h2>
<p>First, let’s load the necessary packages for our analysis:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span></code></pre></div></div>
</section>
<section id="preparing-the-data" class="level2">
<h2 class="anchored" data-anchor-id="preparing-the-data">Preparing the Data</h2>
<p>We’ll use the Palmer penguins dataset, removing missing values and the year column to focus on the main measurements:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">penguins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year)</span></code></pre></div></div>
<p>The cleaned dataset has no missing values and contains species, island, sex, and four numeric measurements for each penguin.</p>
</section>
<section id="basic-covariance-matrix" class="level2">
<h2 class="anchored" data-anchor-id="basic-covariance-matrix">Basic Covariance Matrix</h2>
<p>To calculate covariance between all numeric variables, we first select only the numeric columns:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">numeric_vars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
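font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric))</span></code></pre></div></div>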
<p>Now we can compute the covariance matrix using the <code>cov()</code> function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(numeric_vars)</span></code></pre></div></div>
<p>The covariance matrix shows how each pair of variables varies together. The sign gives the direction of the relationship, but the magnitude depends on the units of measurement, so a larger absolute value does not by itself indicate a stronger relationship.</p>
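<p>The unit dependence is easy to see with a small example. The sketch below uses hypothetical hand-typed values (not the penguins data) and expresses the same masses in grams and in kilograms.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements
length_mm &lt;- c(38, 41, 45, 47, 50, 52)
mass_g    &lt;- c(3400, 3700, 4100, 4300, 5000, 5400)
mass_kg   &lt;- mass_g / 1000

cov_g  &lt;- cov(length_mm, mass_g)
cov_kg &lt;- cov(length_mm, mass_kg)

# Covariance changes by the same factor as the unit conversion
cov_g / cov_kg

# Correlation is unaffected by the change of units
cor_g  &lt;- cor(length_mm, mass_g)
cor_kg &lt;- cor(length_mm, mass_kg)
all.equal(cor_g, cor_kg)</code></pre></div>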
</section>
<section id="standardizing-the-data" class="level2">
<h2 class="anchored" data-anchor-id="standardizing-the-data">Standardizing the Data</h2>
<p>To better compare relationships, let’s standardize our numeric variables to have mean 0 and standard deviation 1:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">scaled_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> numeric_vars <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>()</span></code></pre></div></div>
<p>Scaling puts all variables on the same scale, making covariances more comparable across different measurements.</p>
</section>
<section id="covariance-of-scaled-data" class="level2">
<h2 class="anchored" data-anchor-id="covariance-of-scaled-data">Covariance of Scaled Data</h2>
<p>With scaled data, the covariance matrix becomes more interpretable:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(scaled_data)</span></code></pre></div></div>
<p>For standardized variables, every variance is 1, so the covariance between two variables equals their Pearson correlation coefficient and lies between -1 and 1.</p>
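<p>This follows from the definition of Pearson correlation as the covariance divided by the product of the two standard deviations: once both standard deviations are 1, the division changes nothing. A minimal check on a hypothetical toy matrix (values invented for illustration):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- cbind(
  length_mm = c(38, 41, 45, 47, 50, 52),
  depth_mm  = c(18.5, 17.9, 17.0, 15.2, 14.8, 15.5),
  mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)

cov_scaled &lt;- cov(scale(toy))  # covariance of standardized columns
cor_raw    &lt;- cor(toy)         # correlation of the raw columns

all.equal(cov_scaled, cor_raw)

# The diagonal holds the variances of the standardized columns
diag(cov_scaled)</code></pre></div>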
</section>
<section id="correlation-matrix" class="level2">
<h2 class="anchored" data-anchor-id="correlation-matrix">Correlation Matrix</h2>
<p>We can also calculate correlations directly using the <code>cor()</code> function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(scaled_data)</span></code></pre></div></div>
<p>The correlation matrix is identical to the covariance matrix of scaled data. Values closer to 1 or -1 indicate stronger linear relationships.</p>
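<p>Note that <code>cor()</code> does not need pre-scaled input, since it standardizes internally. As a sketch on hypothetical toy values, the same matrix can also be obtained from a raw covariance matrix with <code>cov2cor()</code> from base R's stats package:</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- cbind(
  length_mm = c(38, 41, 45, 47, 50, 52),
  depth_mm  = c(18.5, 17.9, 17.0, 15.2, 14.8, 15.5),
  mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)

# Scaling first makes no difference to cor()
all.equal(cor(scale(toy)), cor(toy))

# Convert a raw covariance matrix into a correlation matrix
cor_from_cov &lt;- cov2cor(cov(toy))
all.equal(cor_from_cov, cor(toy))</code></pre></div>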
</section>
<section id="calculating-covariance-between-variable-groups" class="level2">
<h2 class="anchored" data-anchor-id="calculating-covariance-between-variable-groups">Calculating Covariance Between Variable Groups</h2>
<p>Sometimes we want to examine relationships between specific groups of variables. Here we compare bill measurements with body measurements:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">bill_vars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scaled_data[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># bill_length_mm, bill_depth_mm</span></span>
<span id="cb8-2">body_vars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scaled_data[, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>]  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># flipper_length_mm, body_mass_g</span></span>
<span id="cb8-3"></span>
<span id="cb8-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(bill_vars, body_vars)</span></code></pre></div></div>
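<p>This cross-correlation matrix shows how bill dimensions relate to body size measurements: rows correspond to the bill variables and columns to the body variables. Because <code>scaled_data</code> is standardized, <code>cov(bill_vars, body_vars)</code> returns the same values.</p>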
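<p>Selecting columns by position is fragile if the column order changes. The sketch below selects the two groups by name instead and checks that the result is simply a block of the full correlation matrix. The matrix <code>toy</code>, its column names, and its values are hypothetical.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- cbind(
  bill_length = c(38, 41, 45, 47, 50, 52),
  bill_depth  = c(18.5, 17.9, 17.0, 15.2, 14.8, 15.5),
  flipper     = c(181, 186, 195, 193, 210, 217),
  mass        = c(3400, 3700, 4100, 4300, 5000, 5400)
)

bill_cols &lt;- c("bill_length", "bill_depth")
body_cols &lt;- c("flipper", "mass")

# Rows come from the first argument, columns from the second
cross &lt;- cor(toy[, bill_cols], toy[, body_cols])
dim(cross)

# The same numbers sit in the corresponding block of the full matrix
all.equal(cross, cor(toy)[bill_cols, body_cols])</code></pre></div>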
</section>
<section id="manual-covariance-calculation" class="level2">
<h2 class="anchored" data-anchor-id="manual-covariance-calculation">Manual Covariance Calculation</h2>
<p>To see the mathematical foundation, we can calculate the covariance manually using matrix multiplication:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(scaled_data)</span>
<span id="cb9-2">manual_cov <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(scaled_data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> scaled_data) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
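<p>This matrix multiplication approach gives the same result as <code>cov(scaled_data)</code>. It works because <code>scaled_data</code> is already centered: <code>t(scaled_data) %*% scaled_data</code> is then the matrix of summed cross-products of deviations from the column means, and dividing by <code>n - 1</code> gives the sample covariance.</p>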
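<p>For data that has not been centered, the subtraction of column means has to be done explicitly. The following sketch applies the same formula to a hypothetical raw toy matrix, using <code>sweep()</code> for centering and <code>crossprod()</code> as a shorthand for <code>t(x) %*% x</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- cbind(
  length_mm = c(38, 41, 45, 47, 50, 52),
  depth_mm  = c(18.5, 17.9, 17.0, 15.2, 14.8, 15.5),
  mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)
n &lt;- nrow(toy)

# Subtract each column's mean (the step scale() performs when center = TRUE)
centered &lt;- sweep(toy, 2, colMeans(toy))

# Summed cross-products of deviations, divided by n - 1
manual_cov &lt;- crossprod(centered) / (n - 1)
all.equal(manual_cov, cov(toy))

# Skipping the centering step gives a different matrix
uncentered &lt;- crossprod(toy) / (n - 1)
isTRUE(all.equal(uncentered, cov(toy)))</code></pre></div>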
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>Covariance and correlation matrices provide powerful insights into variable relationships in multivariate data. While <code>cov()</code> and <code>cor()</code> functions handle the calculations efficiently, understanding the underlying matrix operations helps deepen your statistical intuition. Standardizing variables before analysis often makes results more interpretable, especially when variables have different units or scales.</p>


<!-- -->

</section>

 ]]></description>
  <category>statistics</category>
  <category>covariance</category>
  <guid>https://rstats101.com/statistics/how-to-calculate-covariance-in-r.html</guid>
  <pubDate>Thu, 02 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/statistics/calculate-covariance-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to calculate Pearson correlation in R</title>
  <link>https://rstats101.com/statistics/how-to-calculate-pearson-correlation-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Data scaling and centering are fundamental preprocessing steps in data analysis and machine learning. Scaling transforms variables to have similar ranges, while centering adjusts variables to have a mean of zero. These transformations are essential when working with variables of different units or magnitudes, ensuring that no single variable dominates analyses due to its scale.</p>
</section>
<section id="setup-and-data-preparation" class="level2">
<h2 class="anchored" data-anchor-id="setup-and-data-preparation">Setup and Data Preparation</h2>
<p>Let’s start by loading the necessary libraries and preparing our dataset.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_bw</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>))</span></code></pre></div></div>
<p>We’ll use the Palmer Penguins dataset, selecting only the numeric variables for our scaling demonstration.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>year) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric))</span></code></pre></div></div>
<p>Let’s examine our cleaned dataset structure:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<p>This gives us four numeric variables: bill length, bill depth, and flipper length in millimeters, and body mass in grams. Their typical values differ by orders of magnitude.</p>
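<p>One way to put numbers on such differences is to tabulate the mean and standard deviation of each column. The sketch below does this for a hypothetical two-column data frame; the column names echo the penguins data, but the values are invented for illustration.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical toy measurements, not the penguins data
toy &lt;- data.frame(
  bill_length_mm = c(38, 41, 45, 47, 50, 52),
  body_mass_g    = c(3400, 3700, 4100, 4300, 5000, 5400)
)

# One column per variable, one row per statistic
stats &lt;- sapply(toy, function(x) c(mean = mean(x), sd = sd(x)))
stats

# Ratio of the largest to the smallest standard deviation
sd_ratio &lt;- max(stats["sd", ]) / min(stats["sd", ])
sd_ratio</code></pre></div>
<p>A large ratio is a sign that the variable with the biggest spread would dominate any analysis that works on raw variances or distances.</p>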
</section>
<section id="understanding-variable-scales" class="level2">
<h2 class="anchored" data-anchor-id="understanding-variable-scales">Understanding Variable Scales</h2>
<p>Before scaling, let’s visualize how our variables differ in their ranges and distributions.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outlier.shape =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_log10</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/statistics/calculate-pearson-correlation-in-r-raw-feature-boxplot-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Boxplot in R showing Palmer Penguins numeric features on a log scale before Pearson correlation calculation, highlighting the different scales of bill length, bill depth, flipper length, and body mass in ggplot2</figcaption>
</figure>
</div>
<p>Notice how body mass ranges in the thousands while bill measurements are in the tens. This dramatic difference in scale can be problematic for many analyses.</p>
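<p>A numeric check can complement the plot. The sketch below is a minimal illustration, assuming the built-in <code>mtcars</code> data as a stand-in for the tutorial’s data frame; the object names <code>m</code>, <code>ranges</code>, and <code>sds</code> are hypothetical.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in data: three numeric columns from the built-in mtcars data set
# (an assumption for illustration; swap in your own numeric data frame)
m &lt;- mtcars[, c("mpg", "disp", "wt")]

# Minimum and maximum of each column, one column per variable
ranges &lt;- sapply(m, range)
rownames(ranges) &lt;- c("min", "max")
ranges

# Ratio of the largest to the smallest standard deviation across columns
sds &lt;- sapply(m, sd)
max(sds) / min(sds)</code></pre></div></div>
<p>A large ratio in the last line is a quick signal that the variables live on very different scales.</p>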
</section>
<section id="standardization-z-score-scaling" class="level2">
<h2 class="anchored" data-anchor-id="standardization-z-score-scaling">Standardization (Z-score Scaling)</h2>
<p>Standardization transforms variables to have a mean of 0 and standard deviation of 1. This is the default behavior of the <code>scale()</code> function.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">df_scaled <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>()</span>
<span id="cb5-2">df_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<p>Let’s visualize the scaled data to see how the distributions now compare:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">df_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.data.frame</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">row_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>row_id, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"feature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> feature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> feature)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">outlier.shape =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div></div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://rstats101.com/images/statistics/calculate-pearson-correlation-in-r-scaled-feature-boxplot-ggplot.png" class="img-fluid figure-img"></p>
<figcaption>Boxplot in R of standardized penguin features (mean 0, standard deviation 1) used to calculate Pearson correlation, showing comparable ranges after scale() transformation in ggplot2</figcaption>
</figure>
</div>
<p>Now all variables have similar scales, making them directly comparable. The standardized variables are centered around 0 with similar spreads.</p>
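<p>To see what <code>scale()</code> does under the hood, the z-score can be computed by hand. This is a minimal sketch, assuming the built-in <code>mtcars</code> data as a stand-in with no missing values; <code>m</code> and <code>z_manual</code> are hypothetical names.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in numeric matrix from the built-in mtcars data (an assumption)
m &lt;- as.matrix(mtcars[, c("mpg", "disp", "wt")])

# Z-score by hand: subtract each column mean, then divide by each column sd
col_means &lt;- colMeans(m)
col_sds &lt;- apply(m, 2, sd)
z_manual &lt;- sweep(sweep(m, 2, col_means, "-"), 2, col_sds, "/")

# scale() does the same thing; check.attributes = FALSE ignores the
# "scaled:center" and "scaled:scale" attributes that scale() attaches
all.equal(z_manual, scale(m), check.attributes = FALSE)</code></pre></div></div>
<p>The final comparison should return <code>TRUE</code>, since <code>scale()</code> uses the sample standard deviation, the same quantity <code>sd()</code> computes.</p>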
</section>
<section id="centering-only" class="level2">
<h2 class="anchored" data-anchor-id="centering-only">Centering Only</h2>
<p>Sometimes you may want to center variables (mean = 0) without scaling them. Use the <code>center</code> and <code>scale</code> parameters explicitly:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">df_centered <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">center =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scale =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb7-3">df_centered <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>()</span></code></pre></div></div>
<p>Centering alone shifts each variable so that its mean is 0 but leaves its spread untouched, so the differences in scale between variables are preserved.</p>
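<p>The effect of centering can be checked directly. The following sketch assumes the built-in <code>mtcars</code> data as a stand-in; <code>m</code> and <code>centered</code> are hypothetical names.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in numeric matrix (an assumption for illustration)
m &lt;- as.matrix(mtcars[, c("mpg", "disp", "wt")])
centered &lt;- scale(m, center = TRUE, scale = FALSE)

# Column means are now zero up to floating-point error
colMeans(centered)

# Standard deviations are unchanged by centering
all.equal(apply(centered, 2, sd), apply(m, 2, sd))

# The subtracted means are stored as an attribute on the result
attr(centered, "scaled:center")</code></pre></div></div>
<p>Keeping the <code>"scaled:center"</code> attribute around is handy if you later need to apply the same shift to new data.</p>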
</section>
<section id="verifying-transformations" class="level2">
<h2 class="anchored" data-anchor-id="verifying-transformations">Verifying Transformations</h2>
<p>Let’s confirm our scaling worked by checking the means of our standardized data:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colMeans</span>(df_scaled)</span></code></pre></div></div>
<p>All means should be essentially zero (within floating-point precision). Now let’s check the covariance matrix:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cov</span>(df_scaled)</span></code></pre></div></div>
<p>For standardized data, the covariance matrix equals the correlation matrix, since standardization makes the variance of each variable equal to 1.</p>
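<p>This equality can be tested rather than read off the printed output. A minimal sketch, assuming the built-in <code>mtcars</code> data as a stand-in; <code>m</code> and <code>z</code> are hypothetical names.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in numeric matrix (an assumption for illustration)
m &lt;- as.matrix(mtcars[, c("mpg", "disp", "wt")])
z &lt;- scale(m)

# Each standardized column has variance 1
diag(cov(z))

# So the covariance of the scaled data matches the correlation of the raw data
all.equal(cov(z), cor(m))</code></pre></div></div>
<p>Using <code>all.equal()</code> instead of <code>==</code> allows for tiny floating-point differences between the two computations.</p>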
</section>
<section id="working-with-correlation-matrices" class="level2">
<h2 class="anchored" data-anchor-id="working-with-correlation-matrices">Working with Correlation Matrices</h2>
<p>The correlation matrix shows the linear relationships between variables:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(df_scaled)</span></code></pre></div></div>
<p>You can also calculate correlations between subsets of variables:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(df_scaled[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>], df_scaled[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>])</span></code></pre></div></div>
<p>This shows correlations between the first two variables (bill measurements) and the last two (flipper length and body mass).</p>
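<p>The two-argument form of <code>cor()</code> returns one block of the full correlation matrix. The sketch below illustrates this, assuming four columns of the built-in <code>mtcars</code> data as a stand-in; <code>block</code> and <code>full</code> are hypothetical names.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in numeric matrix with four columns (an assumption for illustration)
m &lt;- as.matrix(mtcars[, c("mpg", "disp", "wt", "hp")])

# Correlations of columns 1-2 (rows) against columns 3-4 (columns)
block &lt;- cor(m[, 1:2], m[, 3:4])

# The same values sit in the upper-right block of the full matrix
full &lt;- cor(m)
all.equal(block, full[1:2, 3:4])</code></pre></div></div>
<p>Computing only the block you need can be convenient when the full matrix would be large.</p>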
</section>
<section id="matrix-multiplication-approach" class="level2">
<h2 class="anchored" data-anchor-id="matrix-multiplication-approach">Matrix Multiplication Approach</h2>
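<p>Because every column of the standardized matrix has mean 0 and standard deviation 1, the correlation matrix is simply the transpose of the scaled matrix multiplied by the scaled matrix, divided by n - 1. Working through this once helps build intuition for the mathematical foundation:</p>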
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(df_scaled)</span>
<span id="cb12-2">(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(df_scaled) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%*%</span> df_scaled) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
<p>This manual calculation produces the same result as the <code>cor()</code> function, demonstrating how correlation matrices are computed mathematically.</p>
<p>Let’s verify this matches the built-in function:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(df_scaled)</span></code></pre></div></div>
<p>Both approaches yield the same matrix, up to floating-point rounding, confirming our understanding of the underlying mathematics.</p>
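<p>Rather than comparing two printed matrices by eye, the check can be made programmatically. This is a minimal sketch, assuming the built-in <code>mtcars</code> data as a stand-in with no missing values; it uses base R’s <code>crossprod()</code>, which computes the same product as <code>t(z) %*% z</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Stand-in numeric matrix (an assumption for illustration)
m &lt;- as.matrix(mtcars[, c("mpg", "disp", "wt")])
z &lt;- scale(m)
n &lt;- nrow(z)

# crossprod(z) is equivalent to t(z) %*% z
r_manual &lt;- crossprod(z) / (n - 1)

# Compare against the built-in function, tolerating floating-point error
all.equal(r_manual, cor(m))</code></pre></div></div>
<p>The divisor is n - 1 rather than n because <code>scale()</code> standardizes with the sample standard deviation.</p>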
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>Data scaling and centering are crucial preprocessing steps that ensure variables contribute equally to analyses. Standardization (z-score scaling) creates variables with mean 0 and standard deviation 1, while centering only adjusts the mean. The <code>scale()</code> function in R provides flexible options for these transformations, and understanding the relationship between scaled data and correlation matrices helps build intuition for multivariate analyses. Always visualize your data before and after scaling to ensure the transformations achieve your analytical goals.</p>


<!-- -->

</section>

 ]]></description>
  <category>statistics</category>
  <category>correlation</category>
  <guid>https://rstats101.com/statistics/how-to-calculate-pearson-correlation-in-r.html</guid>
  <pubDate>Thu, 02 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/statistics/calculate-pearson-correlation-in-r-raw-feature-boxplot-ggplot.png" medium="image" type="image/png" height="99" width="144"/>
</item>
<item>
  <title>How to format currency in gt tables in R</title>
  <link>https://rstats101.com/how-to/how-to-format-currency-in-gt-tables-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The <code>gt</code> package provides powerful tools for creating beautiful, publication-ready tables in R. One of its most useful features is <code>fmt_currency()</code>, which automatically formats numeric values as currency with proper symbols, decimal places, and locale-specific formatting. This function is essential when creating financial reports, dashboards, or any data presentation involving monetary values.</p>
</section>
<section id="loading-required-libraries" class="level2">
<h2 class="anchored" data-anchor-id="loading-required-libraries">Loading Required Libraries</h2>
<p>First, let’s load the necessary packages for this tutorial:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
</section>
<section id="creating-a-basic-data-table" class="level2">
<h2 class="anchored" data-anchor-id="creating-a-basic-data-table">Creating a Basic Data Table</h2>
<p>Let’s start with a simple dataset containing stock symbols and their prices:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">stock_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">symbol =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GOOG"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"META"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MSFT"</span>), </span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">price =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">168</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">465</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">425</span>)</span>
<span id="cb2-4">)</span>
<span id="cb2-5">stock_data</span></code></pre></div></div>
<p>This creates a basic tibble with three tech stocks and example prices in dollars.</p>
</section>
<section id="converting-to-a-gt-table" class="level2">
<h2 class="anchored" data-anchor-id="converting-to-a-gt-table">Converting to a gt Table</h2>
<p>Now let’s convert our data frame into a <code>gt</code> table for better presentation:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">stock_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>()</span></code></pre></div></div>
<p>The <code>gt()</code> function transforms our tibble into a formatted HTML table, but the prices still appear as plain numbers without currency formatting.</p>
</section>
<section id="basic-currency-formatting" class="level2">
<h2 class="anchored" data-anchor-id="basic-currency-formatting">Basic Currency Formatting</h2>
<p>To format all numeric columns as currency, we can use <code>fmt_currency()</code> without specifying columns:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">stock_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_currency</span>()</span></code></pre></div></div>
<p>This applies dollar formatting to all numeric columns, adding the $ symbol and two decimal places.</p>
</section>
<section id="targeting-specific-columns" class="level2">
<h2 class="anchored" data-anchor-id="targeting-specific-columns">Targeting Specific Columns</h2>
<p>For more control, specify which columns to format using the <code>columns</code> parameter:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">stock_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_currency</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> price)</span></code></pre></div></div>
<p>This approach is safer when you have multiple numeric columns but only want to format certain ones as currency.</p>
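<p>As an illustration of why targeting matters, the sketch below adds a hypothetical <code>shares</code> column that should stay a plain number. It also uses the <code>decimals</code> argument of <code>fmt_currency()</code> to drop the cents; the data values are made up for the example.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(gt)

# Hypothetical table: one price column and one non-currency numeric column
stocks &lt;- data.frame(
  symbol = c("GOOG", "META", "MSFT"),
  price = c(168, 465, 425),
  shares = c(10, 5, 8)
)

tbl &lt;- stocks |&gt;
  gt() |&gt;
  # Only price is formatted as currency; decimals = 0 drops the cents
  fmt_currency(columns = price, decimals = 0)

tbl</code></pre></div></div>
<p>Here <code>shares</code> is left untouched because it is not named in <code>columns</code>.</p>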
</section>
<section id="working-with-multiple-currencies" class="level2">
<h2 class="anchored" data-anchor-id="working-with-multiple-currencies">Working with Multiple Currencies</h2>
<p>Let’s create a more complex example with multiple currency columns:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">stock_data_multi <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb6-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">symbol =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GOOG"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"META"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MSFT"</span>), </span>
<span id="cb6-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">price_USD =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">168</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">465</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">425</span>)</span>
<span id="cb6-4">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">price_INR =</span> price_USD <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">80</span>)</span>
<span id="cb6-6"></span>
<span id="cb6-7">stock_data_multi</span></code></pre></div></div>
<p>This creates a dataset with prices in both US dollars and Indian rupees, using a fixed rate of 80 rupees per dollar purely for illustration.</p>
</section>
<section id="exploring-available-currency-options" class="level2">
<h2 class="anchored" data-anchor-id="exploring-available-currency-options">Exploring Available Currency Options</h2>
<p>Before formatting different currencies, let’s see what currency symbols are available:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">info_currencies</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"symbol"</span>)</span></code></pre></div></div>
<p>This function shows all supported currency symbols and their corresponding names in the <code>gt</code> package.</p>
</section>
<section id="formatting-multiple-currencies" class="level2">
<h2 class="anchored" data-anchor-id="formatting-multiple-currencies">Formatting Multiple Currencies</h2>
<p>Now let’s format each currency column appropriately:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">stock_data_multi <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_currency</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> price_USD) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_currency</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> price_INR, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">currency =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rupee"</span>)</span></code></pre></div></div>
<p>Each <code>fmt_currency()</code> call can target different columns and apply different currency symbols, making it easy to create multi-currency tables.</p>
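<p>The <code>currency</code> argument also accepts three-letter currency codes, which can be less ambiguous than symbol names. The following is a minimal sketch of that variant; the data frame <code>prices</code> is a hypothetical rebuild of the example data in base R.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(gt)

# Hypothetical rebuild of the example data
prices &lt;- data.frame(
  symbol = c("GOOG", "META", "MSFT"),
  price_USD = c(168, 465, 425)
)
# Illustrative fixed conversion rate, as in the example above
prices$price_INR &lt;- prices$price_USD * 80

tbl &lt;- prices |&gt;
  gt() |&gt;
  # Three-letter codes instead of symbol names
  fmt_currency(columns = price_USD, currency = "USD") |&gt;
  fmt_currency(columns = price_INR, currency = "INR")

tbl</code></pre></div></div>
<p>Calling <code>info_currencies(type = "code")</code> lists the codes that <code>gt</code> recognizes.</p>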
</section>
<section id="dynamic-currency-formatting" class="level2">
<h2 class="anchored" data-anchor-id="dynamic-currency-formatting">Dynamic Currency Formatting</h2>
<p>For more advanced use cases, you can format currencies dynamically based on data in your table:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">multi_locale_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb9-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">amount =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">50.84</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb9-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">currency =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"JPY"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"USD"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GHS"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"KRW"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"CNY"</span>),</span>
<span id="cb9-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locale =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ja"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"en"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ee"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ko"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"zh"</span>)</span>
<span id="cb9-5">)</span>
<span id="cb9-6"></span>
<span id="cb9-7">multi_locale_data</span></code></pre></div></div>
<p>This creates a dataset with the same amount in different currencies and their corresponding locales.</p>
</section>
<section id="using-locale-based-formatting" class="level2">
<h2 class="anchored" data-anchor-id="using-locale-based-formatting">Using Locale-Based Formatting</h2>
<p>Finally, let’s format the amounts using their appropriate locales and hide the helper <code>locale</code> column:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">multi_locale_data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fmt_currency</span>(</span>
<span id="cb10-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> amount,</span>
<span id="cb10-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locale =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">from_column</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">column =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"locale"</span>)</span>
<span id="cb10-6">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cols_hide</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">columns =</span> locale)</span></code></pre></div></div>
<p>The <code>from_column()</code> function tells <code>gt</code> to use values from the locale column to determine formatting, while <code>cols_hide()</code> removes the helper column from the final display.</p>
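<p>The same helper can supply the currency itself. The sketch below is one possible variant, assuming a version of <code>gt</code> that provides <code>from_column()</code>: it reads both the currency code and the locale from the data and hides both helper columns. The data frame <code>amounts</code> is hypothetical.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(gt)

# Hypothetical data: one amount, with per-row currency codes and locales
amounts &lt;- data.frame(
  amount = rep(50.84, 3),
  currency = c("JPY", "USD", "KRW"),
  locale = c("ja", "en", "ko")
)

tbl &lt;- amounts |&gt;
  gt() |&gt;
  fmt_currency(
    columns = amount,
    # Take the currency and the locale from the matching row of each column
    currency = from_column(column = "currency"),
    locale = from_column(column = "locale")
  ) |&gt;
  # Hide both helper columns from the rendered table
  cols_hide(columns = c(currency, locale))

tbl</code></pre></div></div>
<p>Stating the currency explicitly makes the table’s intent clear even when a locale is shared by regions that use different currencies.</p>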
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>The <code>fmt_currency()</code> function in <code>gt</code> provides flexible currency formatting options, from basic dollar formatting to complex multi-locale presentations. Key features include targeting specific columns, using different currency symbols, and dynamic formatting based on data values. These tools make it easy to create professional financial reports and data presentations with properly formatted monetary values.</p>



</section>

 ]]></description>
  <category>how-to</category>
  <category>gt tables</category>
  <guid>https://rstats101.com/how-to/how-to-format-currency-in-gt-tables-in-r.html</guid>
  <pubDate>Thu, 02 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/how-to/format-currency-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to count unique values with n_distinct() in R</title>
  <link>https://rstats101.com/dplyr/how-to-count-unique-values-with-ndistinct-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The <code>n_distinct()</code> function in dplyr counts the number of unique values in a vector, or the number of unique rows in a data frame. It is handy for data exploration and quality checks: examining categorical variables, checking for duplicates, or summarizing data by groups.</p>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>Let’s start by loading the tidyverse package and creating some sample data to work with:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create sample data with some duplicate IDs and a missing value</span></span>
<span id="cb2-2">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NA</span>),</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">amount =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">150</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">120</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span>
<span id="cb2-5">)</span>
<span id="cb2-6"></span>
<span id="cb2-7">df</span></code></pre></div></div>
<p>Our dataset contains 7 rows, but notice that some ID values are repeated and that one ID is missing (NA).</p>
</section>
<section id="basic-usage-of-n_distinct" class="level2">
<h2 class="anchored" data-anchor-id="basic-usage-of-n_distinct">Basic Usage of n_distinct()</h2>
<p>The most common way to count distinct values is to use <code>n_distinct()</code> with <code>pull()</code> to extract a column:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>()</span></code></pre></div></div>
<p>This counts all unique values including NA, so we get 5 distinct values (1, 2, 3, 4, and NA).</p>
</section>
<section id="handling-missing-values" class="level2">
<h2 class="anchored" data-anchor-id="handling-missing-values">Handling Missing Values</h2>
<p>To exclude missing values from the count, use the <code>na.rm</code> parameter:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<p>Now we get 4 distinct values, excluding the NA.</p>
</section>
<section id="alternative-approaches" class="level2">
<h2 class="anchored" data-anchor-id="alternative-approaches">Alternative Approaches</h2>
<p>You can achieve the same result using <code>unique()</code> and <code>length()</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(id) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>()</span></code></pre></div></div>
<p>This approach first gets the unique values, then counts them. Like <code>n_distinct()</code> with its default settings, it counts NA as a value. <code>n_distinct()</code> is more concise and lets you drop missing values with <code>na.rm = TRUE</code>.</p>
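<p>To make the comparison concrete, here is a minimal sketch that runs both approaches on a plain vector. The vector <code>ids</code> simply copies the <code>id</code> column from the setup, and filtering with <code>!is.na()</code> is one assumed way to mimic <code>na.rm = TRUE</code> in base R.</p>

```r
library(dplyr)

# Same values as the id column in the sample data
ids <- c(2, 4, 1, 2, 3, 4, NA)

# Base R: unique() keeps NA as a value, so it is counted
base_with_na <- length(unique(ids))

# Base R: drop missing values first, then count
base_without_na <- length(unique(ids[!is.na(ids)]))

# dplyr equivalents
dplyr_with_na <- n_distinct(ids)
dplyr_without_na <- n_distinct(ids, na.rm = TRUE)

c(base_with_na, dplyr_with_na, base_without_na, dplyr_without_na)
```

<p>Both routes agree with and without missing values. The dplyr version just needs less code.</p>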
</section>
<section id="direct-column-access" class="level2">
<h2 class="anchored" data-anchor-id="direct-column-access">Direct Column Access</h2>
<p>You can also use <code>n_distinct()</code> directly on a column without pipes:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>id)</span></code></pre></div></div>
<p>Here the column is extracted with base R’s <code>$</code> operator. This is shorter for simple cases but fits less naturally into a piped dplyr workflow.</p>
</section>
<section id="counting-distinct-rows" class="level2">
<h2 class="anchored" data-anchor-id="counting-distinct-rows">Counting Distinct Rows</h2>
<p>When applied to an entire dataframe, <code>n_distinct()</code> counts unique combinations of all columns:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>()</span></code></pre></div></div>
<p>This tells us how many completely unique rows exist in our dataset.</p>
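<p>One way to sanity-check the row count is to compare it with <code>distinct()</code> followed by <code>nrow()</code>. The sketch below rebuilds the sample data so it runs on its own. The variable names <code>rows_via_n_distinct</code> and <code>rows_via_distinct</code> are only illustrative.</p>

```r
library(dplyr)

# Rebuild the sample data from the setup section
df <- tibble(
  id = c(2, 4, 1, 2, 3, 4, NA),
  amount = c(250, 200, 150, 250, 300, 120, 200)
)

# Count unique rows directly
rows_via_n_distinct <- n_distinct(df)

# Same count by dropping duplicate rows and counting what is left
rows_via_distinct <- df |>
  distinct() |>
  nrow()

c(rows_via_n_distinct, rows_via_distinct)
```

<p>Only the first and fourth rows are identical (id 2, amount 250), so both approaches report one row fewer than the 7 rows in the data.</p>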
</section>
<section id="using-n_distinct-with-group-operations" class="level2">
<h2 class="anchored" data-anchor-id="using-n_distinct-with-group-operations">Using n_distinct() with Group Operations</h2>
<p>One of the most powerful applications is counting distinct values within groups:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Example with grouped data</span></span>
<span id="cb8-2">df_grouped <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb8-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">category =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"B"</span>),</span>
<span id="cb8-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb8-5">)</span>
<span id="cb8-6"></span>
<span id="cb8-7">df_grouped <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(category) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distinct_values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n_distinct</span>(value))</span></code></pre></div></div>
<p>This shows how many distinct values exist within each category, which is invaluable for understanding data distribution across groups.</p>
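<p>Missing values matter inside groups as well. The following sketch uses a hypothetical variant of the grouped data, <code>df_grouped_na</code>, with one NA added, and computes the count with and without <code>na.rm = TRUE</code> side by side.</p>

```r
library(dplyr)

# Hypothetical data: category B contains a missing value
df_grouped_na <- tibble(
  category = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 10, NA, 10)
)

result <- df_grouped_na |>
  group_by(category) |>
  summarize(
    distinct_values = n_distinct(value),                     # NA counts as a value
    distinct_non_missing = n_distinct(value, na.rm = TRUE)   # NA is ignored
  )

result
```

<p>For category B the two columns differ, because the NA is counted as its own value unless it is removed.</p>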
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>The <code>n_distinct()</code> function is an essential tool for exploratory data analysis in R. Use it to quickly count unique values in columns, check data quality, and summarize categorical variables. Remember to use <code>na.rm = TRUE</code> when you want to exclude missing values from your counts. Combined with <code>group_by()</code>, it becomes even more powerful for understanding patterns within different subsets of your data.</p>



</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr n_distinct()</category>
  <guid>https://rstats101.com/dplyr/how-to-count-unique-values-with-ndistinct-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/n-distinct-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to get the first row of each group in R</title>
  <link>https://rstats101.com/dplyr/how-to-get-the-first-row-of-each-group-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Calculating proportions and frequencies is a fundamental task in data analysis that helps us understand the relative distribution of categorical variables. This tutorial demonstrates how to use <code>dplyr</code> functions like <code>count()</code> and <code>mutate()</code> to calculate proportions from grouped data. These techniques are essential when you need to convert raw counts into percentages or when comparing the relative sizes of different groups in your dataset.</p>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>First, let’s load the required packages and prepare our data for analysis.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span></code></pre></div></div>
<p>We’ll use the Palmer penguins dataset, but first we need to remove any missing values to ensure accurate calculations.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">penguins_clean <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>()</span></code></pre></div></div>
<p>This gives us a clean dataset with complete observations for all variables.</p>
</section>
<section id="basic-counting" class="level2">
<h2 class="anchored" data-anchor-id="basic-counting">Basic Counting</h2>
<p>Let’s start by counting the number of penguins by species to understand our data structure.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species)</span></code></pre></div></div>
<p>The <code>count()</code> function returns the frequency of each species in our dataset. This shows us the absolute numbers, but often we want to know the relative proportions.</p>
</section>
<section id="calculating-simple-proportions" class="level2">
<h2 class="anchored" data-anchor-id="calculating-simple-proportions">Calculating Simple Proportions</h2>
<p>To convert counts into proportions, we can use <code>mutate()</code> to create a new column that divides each count by the total.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n))</span></code></pre></div></div>
<p>This creates a proportion column where all values sum to 1.0, showing us what fraction of the total each species represents.</p>
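<p>The same pattern can be checked on a tiny dataset where the answer is easy to verify by hand. The tibble <code>animals</code> and the extra <code>pct</code> column below are made up for illustration and are not part of the penguins example.</p>

```r
library(dplyr)

# Hypothetical data: 1 bird, 2 cats, 3 dogs
animals <- tibble(
  species = c("cat", "cat", "dog", "dog", "dog", "bird")
)

props <- animals |>
  count(species) |>
  mutate(
    prop = n / sum(n),           # fraction of all rows
    pct = round(100 * prop, 1)   # same information as a percentage
  )

props

# The proportions should add up to 1
sum(props$prop)
```

<p>Multiplying by 100 and rounding is a common final step when the result is meant for a report rather than further calculation.</p>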
</section>
<section id="alternative-approach-with-prop.table" class="level2">
<h2 class="anchored" data-anchor-id="alternative-approach-with-prop.table">Alternative Approach with prop.table()</h2>
<p>R provides the <code>prop.table()</code> function as an alternative way to calculate proportions from counts.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">freq =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prop.table</span>(n))</span></code></pre></div></div>
<p>Both approaches yield the same results, but <code>prop.table()</code> can be more explicit about your intention to calculate proportions.</p>
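<p>For a plain numeric vector, <code>prop.table()</code> amounts to dividing by the total. A quick base R check, using a made-up vector of counts called <code>n</code>:</p>

```r
# Hypothetical counts
n <- c(10, 30, 60)

manual <- n / sum(n)         # explicit division
via_prop_table <- prop.table(n)  # base R helper

all.equal(manual, via_prop_table)
```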
</section>
<section id="proportions-with-multiple-groups" class="level2">
<h2 class="anchored" data-anchor-id="proportions-with-multiple-groups">Proportions with Multiple Groups</h2>
<p>When working with multiple categorical variables, calculating proportions becomes more complex. Let’s count penguins by both species and sex.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species, sex)</span></code></pre></div></div>
<p>This gives us the count for each combination of species and sex.</p>
</section>
<section id="overall-proportions-across-groups" class="level2">
<h2 class="anchored" data-anchor-id="overall-proportions-across-groups">Overall Proportions Across Groups</h2>
<p>To calculate what proportion each species-sex combination represents of the total dataset:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species, sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n))</span></code></pre></div></div>
<p>Each proportion represents that group’s share of the entire dataset, and all proportions will sum to 1.0.</p>
</section>
<section id="proportions-within-groups" class="level2">
<h2 class="anchored" data-anchor-id="proportions-within-groups">Proportions Within Groups</h2>
<p>Often you’ll want to calculate proportions within each species rather than across the entire dataset.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">penguins_clean <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species, sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop_within_species =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by =</span> species)</span></code></pre></div></div>
<p>The <code>.by</code> argument tells <code>mutate()</code> to calculate proportions separately for each species. Now the proportions within each species sum to 1.0, showing the sex distribution within each species.</p>
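<p>The <code>.by</code> argument requires dplyr 1.1.0 or later. On older versions, <code>group_by()</code> followed by <code>ungroup()</code> should give the same result. The sketch below compares the two on a small hypothetical table of counts named <code>counts</code>.</p>

```r
library(dplyr)

# Hypothetical counts per species and sex
counts <- tibble(
  species = c("A", "A", "B", "B"),
  sex = c("female", "male", "female", "male"),
  n = c(3, 1, 2, 2)
)

# Per-operation grouping with .by (dplyr >= 1.1.0)
with_by <- counts |>
  mutate(prop_within_species = n / sum(n), .by = species)

# Equivalent using group_by() and ungroup()
with_group_by <- counts |>
  group_by(species) |>
  mutate(prop_within_species = n / sum(n)) |>
  ungroup()

all.equal(with_by$prop_within_species, with_group_by$prop_within_species)
```

<p>A practical difference is that <code>.by</code> never leaves the result grouped, so there is no <code>ungroup()</code> to forget.</p>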
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>This tutorial covered several approaches for calculating proportions from categorical data using <code>dplyr</code>. The key techniques include using <code>count()</code> to get frequencies, <code>mutate()</code> with division to calculate simple proportions, and the <code>.by</code> argument to calculate proportions within specific groups. These methods are essential for exploratory data analysis and creating meaningful summaries of categorical variables in your datasets.</p>



</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr first</category>
  <guid>https://rstats101.com/dplyr/how-to-get-the-first-row-of-each-group-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/first-row-of-each-group-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to get the last row of each group in R</title>
  <link>https://rstats101.com/dplyr/how-to-get-the-last-row-of-each-group-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The dplyr package provides powerful functions for extracting specific rows and performing grouped operations on data frames. These functions help you access the first, last, or nth observations in your data, calculate proportions within groups, and handle consecutive runs of data. These techniques are essential for data exploration, summary statistics, and data cleaning tasks.</p>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>Let’s start by loading the necessary packages and preparing our data:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span></code></pre></div></div>
<p>We’ll clean the penguins dataset by removing any rows with missing values:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">penguins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>()</span></code></pre></div></div>
<p>This gives us a complete dataset with 333 penguin observations to work with.</p>
</section>
<section id="extracting-specific-rows" class="level2">
<h2 class="anchored" data-anchor-id="extracting-specific-rows">Extracting Specific Rows</h2>
<section id="getting-the-last-row" class="level3">
<h3 class="anchored" data-anchor-id="getting-the-last-row">Getting the Last Row</h3>
<p>When called on a data frame, the <code>last()</code> function returns the final row (this row-wise behavior applies to dplyr 1.1.0 and later):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last</span>()</span></code></pre></div></div>
<p>This returns the complete last observation, showing all variables for the final penguin in the dataset.</p>
</section>
<section id="getting-the-first-row" class="level3">
<h3 class="anchored" data-anchor-id="getting-the-first-row">Getting the First Row</h3>
<p>Similarly, <code>first()</code> extracts the initial row:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>()</span></code></pre></div></div>
<p>You’ll see the first penguin observation with all its measurements and characteristics.</p>
</section>
<section id="getting-the-nth-row" class="level3">
<h3 class="anchored" data-anchor-id="getting-the-nth-row">Getting the Nth Row</h3>
<p>Use <code>nth()</code> to extract any specific row by position:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<p>This retrieves the 10th row in the dataset, useful when you need a specific observation by its position.</p>
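<p>These helpers also work on plain vectors, which is how they are used inside <code>summarize()</code>. A minimal sketch with a made-up vector <code>x</code> (not taken from the penguins data):</p>

```r
library(dplyr)

# Hypothetical vector used only for illustration
x <- c(10, 20, 30, 40)

first_value <- first(x)      # first element
last_value <- last(x)        # last element
second_value <- nth(x, 2)    # element at position 2
second_last <- nth(x, -2)    # negative positions count from the end
out_of_range <- nth(x, 10)   # a position past the end gives a missing value

c(first_value, last_value, second_value, second_last, out_of_range)
```

<p>Negative positions are convenient when you want something like the second-to-last value without computing the length yourself.</p>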
</section>
</section>
<section id="using-position-functions-in-summaries" class="level2">
<h2 class="anchored" data-anchor-id="using-position-functions-in-summaries">Using Position Functions in Summaries</h2>
<section id="summarizing-with-last-values" class="level3">
<h3 class="anchored" data-anchor-id="summarizing-with-last-values">Summarizing with Last Values</h3>
<p>You can use position functions within <code>summarize()</code> to extract specific values:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">last_species =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">last</span>(species))</span></code></pre></div></div>
<p>This creates a summary showing only the species of the last penguin in the dataset.</p>
</section>
<section id="summarizing-with-first-values" class="level3">
<h3 class="anchored" data-anchor-id="summarizing-with-first-values">Summarizing with First Values</h3>
<p>The same approach works for first values:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">first_species =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>(species))</span></code></pre></div></div>
<p>This tells you which species appears first in the dataset’s current row order.</p>
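<p>Position functions become more useful once the data is grouped, because they are then evaluated once per group. The sketch below uses a small hypothetical tibble, <code>obs</code>, so the expected values are easy to read off.</p>

```r
library(dplyr)

# Hypothetical observations in two groups
obs <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

per_group <- obs |>
  group_by(group) |>
  summarize(
    first_value = first(value),  # first value within each group
    last_value = last(value)     # last value within each group
  )

per_group
```

<p>The result depends on row order within each group, so sort with <code>arrange()</code> first if "first" and "last" should mean something specific, such as earliest and latest.</p>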
</section>
</section>
<section id="slicing-rows-by-groups" class="level2">
<h2 class="anchored" data-anchor-id="slicing-rows-by-groups">Slicing Rows by Groups</h2>
<section id="getting-first-row-per-group" class="level3">
<h3 class="anchored" data-anchor-id="getting-first-row-per-group">Getting First Row per Group</h3>
<p>Combine <code>group_by()</code> with <code>slice_head()</code> to get the first observation from each group:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_head</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
<p>This returns one penguin per species: the first row that appears for that species in the data.</p>
</section>
<section id="getting-single-first-row" class="level3">
<h3 class="anchored" data-anchor-id="getting-single-first-row">Getting Single First Row</h3>
<p>Without grouping, <code>slice_head()</code> returns the first row(s) from the entire dataset:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_head</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
<p>This gives you just the very first penguin observation.</p>
</section>
<section id="getting-last-row-per-group" class="level3">
<h3 class="anchored" data-anchor-id="getting-last-row-per-group">Getting Last Row per Group</h3>
<p>Use <code>slice_tail()</code> with grouping to get the final observation from each group:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_tail</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
<p>This returns the last penguin observation for each of the three species.</p>
</section>
</section>
<section id="calculating-proportions" class="level2">
<h2 class="anchored" data-anchor-id="calculating-proportions">Calculating Proportions</h2>
<section id="basic-proportions" class="level3">
<h3 class="anchored" data-anchor-id="basic-proportions">Basic Proportions</h3>
<p>Calculate what proportion each group represents of the total:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n))</span></code></pre></div></div>
<p>This shows both the count and proportion of each penguin species in the dataset.</p>
</section>
<section id="alternative-proportion-calculation" class="level3">
<h3 class="anchored" data-anchor-id="alternative-proportion-calculation">Alternative Proportion Calculation</h3>
<p>You can also use <code>prop.table()</code> for the same result:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">freq =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prop.table</span>(n))</span></code></pre></div></div>
<p>Both approaches give you the relative frequency of each species as decimals that sum to 1.</p>
</section>
<section id="proportions-with-multiple-variables" class="level3">
<h3 class="anchored" data-anchor-id="proportions-with-multiple-variables">Proportions with Multiple Variables</h3>
<p>Calculate proportions across combinations of variables:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(species, sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prop =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n))</span></code></pre></div></div>
<p>This shows what proportion each species-sex combination represents of all penguins.</p>
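<p>The proportions above are relative to all rows. To get the share of each sex <em>within</em> a species instead, one option is to group the counts by species before dividing. The sketch below uses a small made-up tibble (<code>toy</code> and its values are assumptions for illustration, not the penguins data) so the arithmetic is easy to check by hand:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical data: two species with different sex splits
toy &lt;- tibble(
  species = c("A", "A", "A", "A", "B", "B"),
  sex     = c("f", "f", "f", "m", "f", "m")
)

within_props &lt;- toy |&gt;
  count(species, sex) |&gt;
  group_by(species) |&gt;
  mutate(prop_within = n / sum(n)) |&gt;  # divides by the species total, not the grand total
  ungroup()

within_props</code></pre></div>
<p>Here species A splits 0.75 / 0.25 and species B splits 0.5 / 0.5, so the proportions sum to 1 within each species rather than across the whole table.</p>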
</section>
</section>
<section id="working-with-consecutive-values" class="level2">
<h2 class="anchored" data-anchor-id="working-with-consecutive-values">Working with Consecutive Values</h2>
<section id="identifying-consecutive-groups" class="level3">
<h3 class="anchored" data-anchor-id="identifying-consecutive-groups">Identifying Consecutive Groups</h3>
<p>The <code>consecutive_id()</code> function helps identify runs of consecutive identical values:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data.frame</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb14-2">df</span></code></pre></div></div>
<p>Let’s create groups based on consecutive runs of the same x and y values:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">consecutive_id</span>(x, y), x, y)</span></code></pre></div></div>
<p>This assigns a unique ID to each consecutive run of identical x,y combinations.</p>
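<p>To see the IDs themselves, <code>consecutive_id()</code> can also be called directly on vectors. A minimal sketch, assuming dplyr 1.1.0 or later (the release that introduced this function) and reusing the same <code>x</code> and <code>y</code> values as above:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

x &lt;- c(0, 0, 1, 0)
y &lt;- c(2, 2, 2, 2)

# The ID increases by one whenever the (x, y) pair differs from the previous position
run_ids &lt;- consecutive_id(x, y)
run_ids
#&gt; [1] 1 1 2 3</code></pre></div>
<p>The final <code>0</code> gets a new ID (3) instead of rejoining the first run. That is the difference between grouping by runs and grouping by value.</p>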
</section>
<section id="summarizing-consecutive-groups" class="level3">
<h3 class="anchored" data-anchor-id="summarizing-consecutive-groups">Summarizing Consecutive Groups</h3>
<p>Count how many observations are in each consecutive group:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">consecutive_id</span>(x, y), x, y) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>())</span></code></pre></div></div>
<p>This reveals that we have three distinct consecutive groups: two single observations and one pair of consecutive identical values.</p>
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>These dplyr functions provide flexible ways to extract specific rows, work with grouped data, and calculate proportions. The <code>first()</code>, <code>last()</code>, and <code>nth()</code> functions return a single value by position, while <code>slice_head()</code> and <code>slice_tail()</code> return whole rows and respect grouping. Proportion calculations show the relative frequency of each category, and <code>consecutive_id()</code> labels runs of identical consecutive values in sequential data.</p>


<!-- -->

</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr last</category>
  <guid>https://rstats101.com/dplyr/how-to-get-the-last-row-of-each-group-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/last-row-of-each-group-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to get top and bottom rows of each group in R</title>
  <link>https://rstats101.com/dplyr/how-to-get-top-and-bottom-rows-of-each-group-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The <code>slice_max()</code> function in dplyr selects the n rows with the highest values of a variable. Rather than sorting the whole table and then trimming it yourself, you ask directly for the rows you need, which suits tasks like finding top performers, highest scores, or maximum values within groups. It is especially useful with grouped data, where it returns the top entries for each category.</p>
</section>
<section id="getting-started" class="level2">
<h2 class="anchored" data-anchor-id="getting-started">Getting Started</h2>
<p>First, let’s load the tidyverse package and create some sample data to work with:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<p>We’ll create a dataset with symbols and values to demonstrate different uses of <code>slice_max()</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>)</span>
<span id="cb2-2">df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">symbol =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(letters, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">value =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb2-5">)</span>
<span id="cb2-6">df</span></code></pre></div></div>
<p>This gives us a dataset with 10 random symbols and their corresponding numeric values, including both positive and negative numbers.</p>
</section>
<section id="basic-sorting-vs-slice_max" class="level2">
<h2 class="anchored" data-anchor-id="basic-sorting-vs-slice_max">Basic Sorting vs slice_max()</h2>
<p>Let’s first see what our data looks like when sorted by value:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(value)</span></code></pre></div></div>
<p>The <code>arrange()</code> function sorts all rows, but what if we only want the top 3 highest values? This is where <code>slice_max()</code> becomes useful.</p>
</section>
<section id="finding-top-values" class="level2">
<h2 class="anchored" data-anchor-id="finding-top-values">Finding Top Values</h2>
<p>To get the 3 rows with the highest values, we use <code>slice_max()</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<p>This returns only the 3 rows with the highest values, ordered from highest to lowest, which is more concise than sorting the entire dataset and then trimming it when you only need the top entries.</p>
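<p>For comparison, the same rows can be obtained by sorting in descending order and keeping the first few. The sketch below uses a made-up tibble (<code>scores</code> is an assumption for illustration) and checks that both routes pick the same rows when there are no ties:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical data with no tied values
scores &lt;- tibble(
  id    = c("a", "b", "c", "d", "e"),
  value = c(3, 9, 1, 7, 5)
)

top_slice   &lt;- scores |&gt; slice_max(value, n = 3)
top_arrange &lt;- scores |&gt; arrange(desc(value)) |&gt; head(3)

# Both approaches select rows b, d, e in that order
identical(top_slice$id, top_arrange$id)</code></pre></div>
<p>The two can differ when values tie at the cutoff, because <code>slice_max()</code> keeps ties by default while <code>head()</code> always stops at n rows.</p>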
</section>
<section id="working-with-grouped-data" class="level2">
<h2 class="anchored" data-anchor-id="working-with-grouped-data">Working with Grouped Data</h2>
<p><code>slice_max()</code> becomes even more powerful when combined with <code>group_by()</code>. Let’s first create a grouping variable:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">df_grouped <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">direction =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"positive"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"negative"</span>))</span>
<span id="cb5-3">df_grouped</span></code></pre></div></div>
<p>Now we can find the top values within each group:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">df_grouped <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(direction) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<p>This gives us the 2 highest values within the positive group and the 2 highest within the negative group.</p>
</section>
<section id="handling-ties" class="level2">
<h2 class="anchored" data-anchor-id="handling-ties">Handling Ties</h2>
<p>When there are tied values, <code>slice_max()</code> includes all tied observations by default, so it can return more than n rows. You can control this behavior:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Include all ties (default)</span></span>
<span id="cb7-2">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb7-3"></span>
<span id="cb7-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Keep exactly n rows; ties are ignored and the first n rows are kept</span></span>
<span id="cb7-5">df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">with_ties =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div></div>
<p>The <code>with_ties</code> parameter determines whether to include all rows that tie for the nth position.</p>
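<p>The random data above is unlikely to contain exact ties, so the two calls will usually return the same rows. To make the difference visible, here is a small sketch with a deliberate tie (the <code>tied</code> tibble is made up for illustration):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical data: "b" and "c" share the second-highest value
tied &lt;- tibble(
  id    = c("a", "b", "c", "d"),
  value = c(10, 8, 8, 5)
)

# Default (with_ties = TRUE): both rows tied for 2nd place are kept, so 3 rows come back
keep_ties &lt;- tied |&gt; slice_max(value, n = 2)

# with_ties = FALSE: exactly 2 rows are returned
drop_ties &lt;- tied |&gt; slice_max(value, n = 2, with_ties = FALSE)

nrow(keep_ties)
nrow(drop_ties)</code></pre></div>
<p>If downstream code assumes exactly n rows per group, setting <code>with_ties = FALSE</code> is the safer choice.</p>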
</section>
<section id="practical-example-with-real-data" class="level2">
<h2 class="anchored" data-anchor-id="practical-example-with-real-data">Practical Example with Real Data</h2>
<p>Let’s use the penguins data to see a more realistic example:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span>
<span id="cb8-2"></span>
<span id="cb8-3">penguins <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(body_mass_g)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(species) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<p>This finds the 2 heaviest penguins of each species, which is useful for understanding the size distribution across different penguin types.</p>
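<p>For the bottom rows of each group, dplyr provides <code>slice_min()</code>, which takes the same arguments as <code>slice_max()</code> but selects the lowest values. A minimal sketch on a made-up tibble (<code>weights</code> and its columns are assumptions for illustration, not the penguins data):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical data: three animals in each of two species
weights &lt;- tibble(
  species = c("A", "A", "A", "B", "B", "B"),
  mass_g  = c(3200, 4100, 3650, 5000, 4700, 5400)
)

# Lightest animal in each species
lightest &lt;- weights |&gt;
  group_by(species) |&gt;
  slice_min(mass_g, n = 1)

lightest</code></pre></div>
<p>This returns the 3200 g row for species A and the 4700 g row for species B. The <code>with_ties</code> behavior described earlier applies to <code>slice_min()</code> as well.</p>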
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p><code>slice_max()</code> is a concise way to extract the top n rows based on a specific variable, without writing a separate sort-and-trim step. It works particularly well with grouped data, where it returns the top values within each category. Remember to handle missing values appropriately and to set the <code>with_ties</code> parameter when exact row counts matter. It is a cleaner alternative to combining <code>arrange()</code> (in descending order) with <code>head()</code> for common data analysis tasks.</p>


<!-- -->

</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr slice</category>
  <guid>https://rstats101.com/dplyr/how-to-get-top-and-bottom-rows-of-each-group-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/top-and-bottom-rows-of-each-group-in-r-hero-ggplot.png" medium="image" type="image/png" height="88" width="144"/>
</item>
<item>
  <title>How to update rows in a dataframe with rows_update() in R</title>
  <link>https://rstats101.com/dplyr/how-to-update-rows-in-a-dataframe-with-rowsupdate-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The <code>rows_update()</code> function in dplyr provides a powerful way to update existing rows in a data frame using values from another data frame. Similar to SQL’s UPDATE statement, it modifies rows where key columns match between two tables, making it ideal for applying corrections, updates, or changes to specific records without altering the overall structure of your data.</p>
</section>
<section id="setting-up" class="level2">
<h2 class="anchored" data-anchor-id="setting-up">Setting Up</h2>
<p>Let’s start by loading the required packages and checking our dplyr version:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">packageVersion</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>)</span></code></pre></div></div>
</section>
<section id="basic-syntax-and-parameters" class="level2">
<h2 class="anchored" data-anchor-id="basic-syntax-and-parameters">Basic Syntax and Parameters</h2>
<p>The <code>rows_update()</code> function follows this general syntax:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rows_update</span>(</span>
<span id="cb2-2">  x,                    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># target data frame to update</span></span>
<span id="cb2-3">  y,                    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># data frame with new values</span></span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># key columns for matching</span></span>
<span id="cb2-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">unmatched =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"error"</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># how to handle unmatched keys</span></span>
<span id="cb2-6">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">copy =</span> <span class="cn" style="color: #8f5902;
background-color: null;
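font-style: inherit;">FALSE</span>,         <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># copy y into x's data source if they differ</span></span>
<span id="cb2-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">in_place =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># modify x in place (mutable backends only)</span></span>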
<span id="cb2-8">)</span></code></pre></div></div>
<p>The function requires that key values in the update data frame (<code>y</code>) are unique and, by default, must exist in the target data frame (<code>x</code>).</p>
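<p>The "must exist" rule can be relaxed with <code>unmatched = "ignore"</code>. The sketch below assumes dplyr 1.1.0 or later and uses two made-up tibbles (<code>inventory</code> and <code>changes</code> are hypothetical names for illustration) in which one key in the update table has no match:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical target table
inventory &lt;- tibble(item_id = 1:3, stock = c(10, 20, 30))

# Hypothetical updates: item_id 4 does not exist in inventory
changes &lt;- tibble(item_id = c(2L, 4L), stock = c(25, 99))

# With the default unmatched = "error", this call would stop because of item_id 4.
# unmatched = "ignore" skips that row and applies the remaining update.
updated &lt;- rows_update(inventory, changes, by = "item_id", unmatched = "ignore")

updated</code></pre></div>
<p>Only item 2 changes (its stock becomes 25), and no new row is added for item 4, because <code>rows_update()</code> never inserts rows.</p>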
</section>
<section id="basic-example-updating-student-scores" class="level2">
<h2 class="anchored" data-anchor-id="basic-example-updating-student-scores">Basic Example: Updating Student Scores</h2>
<p>Let’s create a practical example with student data where we need to update some test scores:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Original student data</span></span>
<span id="cb3-2">students <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb3-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">student_id =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>,</span>
<span id="cb3-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Alice"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Bob"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Charlie"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Liz"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sam"</span>),</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">85</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">90</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">88</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">92</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">89</span>)</span>
<span id="cb3-6">)</span></code></pre></div></div>
<p>Now let’s create a data frame with updated scores for specific students:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Updated scores for students 1 and 5</span></span>
<span id="cb4-2">score_updates <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb4-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">student_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb4-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">98</span>)</span>
<span id="cb4-5">)</span></code></pre></div></div>
<p>Let’s examine our original data first:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">students</span></code></pre></div></div>
<p>And our update data:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">score_updates</span></code></pre></div></div>
</section>
<section id="performing-the-update" class="level2">
<h2 class="anchored" data-anchor-id="performing-the-update">Performing the Update</h2>
<p>Now we can update the original data frame with the new scores:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">updated_students <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> students <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rows_update</span>(score_updates)</span></code></pre></div></div>
<p>Because we did not supply <code>by</code>, <code>rows_update()</code> uses the first column of the update data frame, <code>student_id</code>, as the key and prints a message saying so. Only Alice’s and Sam’s scores change, to 100 and 98 respectively; all other rows remain unchanged.</p>
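<p>One way to confirm which rows changed is to compare the <code>score</code> column before and after the update. The sketch below rebuilds the tutorial’s data so that it runs on its own (<code>student_id = 1:5</code> is assumed from the setup step). The name <code>changed</code> is only an illustrative choice, and the comparison relies on <code>rows_update()</code> keeping rows in their original order.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Same data as in the tutorial, rebuilt so the example is self-contained
students &lt;- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)
score_updates &lt;- tibble(
  student_id = c(1, 5),
  score = c(100, 98)
)

updated_students &lt;- students |&gt;
  rows_update(score_updates, by = "student_id")

# Row order is preserved, so the two score columns line up row by row
changed &lt;- updated_students$score != students$score

# Names of the students whose score was modified
updated_students$name[changed]</code></pre></div></div>
<p>The last line should list only the students whose IDs appear in <code>score_updates</code>.</p>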
</section>
<section id="specifying-key-columns-explicitly" class="level2">
<h2 class="anchored" data-anchor-id="specifying-key-columns-explicitly">Specifying Key Columns Explicitly</h2>
<p>You can explicitly specify which column(s) to use for matching using the <code>by</code> parameter:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">students <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rows_update</span>(score_updates, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"student_id"</span>)</span></code></pre></div></div>
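<p>This produces the same result, but it states the key column explicitly and suppresses the message about the guessed key. Being explicit is especially helpful when the key is not the first column of the update data frame or when several columns could plausibly serve as the key.</p>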
</section>
<section id="handling-extra-columns" class="level2">
<h2 class="anchored" data-anchor-id="handling-extra-columns">Handling Extra Columns</h2>
<p>What happens when the update data frame contains additional columns not present in the original data? Let’s see:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Update data with an extra column</span></span>
<span id="cb9-2">updates_with_grade <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb9-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">student_id =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb9-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">98</span>),</span>
<span id="cb9-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">grade =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"A"</span>)</span>
<span id="cb9-6">)</span></code></pre></div></div>
<p>If we try to update with this data frame containing the extra <code>grade</code> column:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># This will cause an error</span></span>
<span id="cb10-2">students <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rows_update</span>(updates_with_grade, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"student_id"</span>)</span></code></pre></div></div>
<p>This call fails because every column in the update data frame must already exist in the target: <code>rows_update()</code> modifies existing columns and never adds new ones. The original <code>students</code> data frame is left unchanged.</p>
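<p>If you still want to apply the score changes, one possible workaround is to drop the extra columns from the update data frame before calling <code>rows_update()</code>. The sketch below is one way to do that, not the only one. It rebuilds the tutorial’s data so that it runs on its own (<code>student_id = 1:5</code> is assumed from the setup step), and <code>result</code> is an illustrative name.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Same data as in the tutorial, rebuilt so the example is self-contained
students &lt;- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)
updates_with_grade &lt;- tibble(
  student_id = c(1, 5),
  score = c(100, 98),
  grade = c("A", "A")
)

# Keep only the update columns that also exist in `students`
# (this drops `grade`), then update as usual
result &lt;- students |&gt;
  rows_update(
    updates_with_grade |&gt; select(any_of(names(students))),
    by = "student_id"
  )
result</code></pre></div></div>
<p>Because <code>grade</code> is removed before the update, the result has the same columns as <code>students</code>. If you actually need the new column, a join such as <code>left_join()</code> is the more appropriate tool.</p>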
</section>
<section id="key-points-and-best-practices" class="level2">
<h2 class="anchored" data-anchor-id="key-points-and-best-practices">Key Points and Best Practices</h2>
<p>When using <code>rows_update()</code>, remember these important points:</p>
<ul>
<li>Key values in the update data frame must be unique</li>
<li>Every column in the update data frame must already exist in the target data frame; extra columns cause an error</li>
<li>The function preserves the original structure and row order</li>
<li>Unmatched keys in the update data frame cause an error by default</li>
</ul>
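<p>The last point can be relaxed. The sketch below assumes dplyr 1.1.0 or later, where <code>rows_update()</code> accepts an <code>unmatched</code> argument. The data frame <code>updates_with_new_id</code> and the other variable names are hypothetical and exist only for this illustration.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

students &lt;- tibble(
  student_id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Liz", "Sam"),
  score = c(85, 90, 88, 92, 89)
)

# Hypothetical updates: student 6 does not exist in `students`
updates_with_new_id &lt;- tibble(
  student_id = c(1, 6),
  score = c(100, 75)
)

# Default behaviour: the unmatched key raises an error, caught here
default_failed &lt;- tryCatch(
  {
    rows_update(students, updates_with_new_id, by = "student_id")
    FALSE
  },
  error = function(e) TRUE
)

# unmatched = "ignore" skips keys with no match in the target
ignored &lt;- rows_update(
  students, updates_with_new_id,
  by = "student_id", unmatched = "ignore"
)
ignored</code></pre></div></div>
<p>With <code>unmatched = "ignore"</code>, only rows whose keys exist in the target are updated and no new rows are added. Use <code>rows_upsert()</code> if unmatched rows should be inserted instead.</p>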
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>The <code>rows_update()</code> function is an essential tool for maintaining and updating data frames in a controlled manner. It provides a safe way to apply updates to specific rows while preserving data integrity and structure. Use it when you need to modify existing records without adding new rows or columns, making it perfect for data correction workflows and incremental updates.</p>


<!-- -->

</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr rows_update()</category>
  <guid>https://rstats101.com/dplyr/how-to-update-rows-in-a-dataframe-with-rowsupdate-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/rows-update-in-r-hero-ggplot.png" medium="image" type="image/png" height="72" width="144"/>
</item>
<item>
  <title>How to use anti_join() to find non-matching rows in R</title>
  <link>https://rstats101.com/dplyr/how-to-use-antijoin-to-find-non-matching-rows-in-r.html</link>
  <description><![CDATA[ 




<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The <code>anti_join()</code> function in R’s dplyr package helps you find rows that exist in one dataset but not in another. This filtering join is particularly useful for identifying missing records, finding unique observations, or cleaning data by removing unwanted matches. Use <code>anti_join()</code> when you want to exclude rows based on matching keys between two datasets.</p>
</section>
<section id="setting-up-the-data" class="level2">
<h2 class="anchored" data-anchor-id="setting-up-the-data">Setting Up the Data</h2>
<p>Let’s start by loading the tidyverse package and creating two sample datasets to demonstrate <code>anti_join()</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<p>We’ll create our first dataset containing R packages with their corresponding IDs:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb2-2">id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb2-3">df1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> id,</span>
<span id="cb2-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">packages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tidyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tibble"</span>)</span>
<span id="cb2-6">)</span>
<span id="cb2-7">df1</span></code></pre></div></div>
<p>This creates a tibble with three R packages, each assigned a random ID. The <code>set.seed()</code> ensures reproducible results.</p>
<p>Now let’s create a second dataset with some overlapping packages but different IDs:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">123</span>)</span>
<span id="cb3-2">id <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb3-3">df2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb3-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">id =</span> id,</span>
<span id="cb3-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">packages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dplyr"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ggplot2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tidyr"</span>)</span>
<span id="cb3-6">)</span>
<span id="cb3-7">df2</span></code></pre></div></div>
<p>Notice that df2 contains “dplyr” and “tidyr” (also in df1) plus “ggplot2” (unique to df2).</p>
</section>
<section id="finding-rows-in-df1-but-not-in-df2" class="level2">
<h2 class="anchored" data-anchor-id="finding-rows-in-df1-but-not-in-df2">Finding Rows in df1 but Not in df2</h2>
<p>To find packages that exist in df1 but not in df2, we use <code>anti_join()</code> with df1 as the first argument:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">anti_join</span>(df1, df2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"packages"</span>)</span></code></pre></div></div>
<p>This returns rows from df1 where the package name doesn’t have a match in df2. Since “dplyr” and “tidyr” appear in both datasets, only “tibble” (unique to df1) will be returned.</p>
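<p>For a single key column, the same filtering can be written with <code>filter()</code> and <code>%in%</code>, which may help build intuition for what <code>anti_join()</code> does. The sketch below uses hand-picked IDs rather than the sampled ones (an assumption made so the example is deterministic), and <code>via_join</code> and <code>via_filter</code> are illustrative names.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hand-picked IDs (illustrative, not the sampled values from above)
df1 &lt;- tibble(
  id = c(1, 2, 3),
  packages = c("dplyr", "tidyr", "tibble")
)
df2 &lt;- tibble(
  id = c(1, 9, 4),
  packages = c("dplyr", "ggplot2", "tidyr")
)

# Filtering join: keep rows of df1 with no matching package in df2
via_join &lt;- anti_join(df1, df2, by = "packages")

# The same idea written as an explicit filter
via_filter &lt;- df1 |&gt;
  filter(!packages %in% df2$packages)

# Both approaches keep the same package names
identical(via_join$packages, via_filter$packages)</code></pre></div></div>
<p>The <code>anti_join()</code> form scales better to several key columns, where the <code>%in%</code> version becomes awkward.</p>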
</section>
<section id="finding-rows-in-df2-but-not-in-df1" class="level2">
<h2 class="anchored" data-anchor-id="finding-rows-in-df2-but-not-in-df1">Finding Rows in df2 but Not in df1</h2>
<p>We can reverse the operation to find packages in df2 that aren’t in df1:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">anti_join</span>(df2, df1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"packages"</span>)</span></code></pre></div></div>
<p>This returns “ggplot2” since it’s the only package in df2 that doesn’t appear in df1.</p>
</section>
<section id="understanding-the-join-key" class="level2">
<h2 class="anchored" data-anchor-id="understanding-the-join-key">Understanding the Join Key</h2>
<p>When you omit <code>by</code>, <code>anti_join()</code> matches on <em>every</em> column the two datasets have in common and prints a message listing them. Here both <code>id</code> and <code>packages</code> are shared, so the two calls below are not guaranteed to return the same rows:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Match on package name only</span></span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">anti_join</span>(df1, df2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"packages"</span>)</span>
<span id="cb6-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">anti_join</span>(df1, df2)  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># no by: matches on all shared columns (id and packages)</span></span></code></pre></div></div>
<p>Without <code>by</code>, a row of df1 is dropped only if its <code>id</code> and its package name appear together in a row of df2. Specify <code>by</code> explicitly whenever you want to match on a subset of the shared columns.</p>
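<p>To make the effect of the join key concrete, here is a small sketch with hand-picked IDs (an assumption made so the result does not depend on random sampling). Both data frames share the <code>id</code> and <code>packages</code> columns, and “tidyr” deliberately has a different ID in each. The names <code>by_name</code> and <code>by_all</code> are illustrative.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hand-picked IDs: "dplyr" has the same id in both, "tidyr" does not
df1 &lt;- tibble(
  id = c(1, 2, 3),
  packages = c("dplyr", "tidyr", "tibble")
)
df2 &lt;- tibble(
  id = c(1, 9, 4),
  packages = c("dplyr", "ggplot2", "tidyr")
)

# Match on package name only: "tidyr" counts as a match
by_name &lt;- anti_join(df1, df2, by = "packages")

# No `by`: match on id AND packages, so "tidyr" (id 2 vs. 4) is unmatched
by_all &lt;- anti_join(df1, df2)

by_name$packages
by_all$packages</code></pre></div></div>
<p>The second call keeps an extra row because the pair of <code>id</code> and package name must match together. This is why stating <code>by</code> explicitly is the safer habit.</p>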
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>The <code>anti_join()</code> function is a powerful tool for identifying non-matching records between datasets. Remember that the order matters: <code>anti_join(A, B)</code> returns rows from A that don’t match B, while <code>anti_join(B, A)</code> returns rows from B that don’t match A. This makes it invaluable for data cleaning, validation, and finding unique observations across related datasets.</p>


<!-- -->

</section>

 ]]></description>
  <category>dplyr</category>
  <category>dplyr anti_join()</category>
  <guid>https://rstats101.com/dplyr/how-to-use-antijoin-to-find-non-matching-rows-in-r.html</guid>
  <pubDate>Wed, 25 Mar 2026 18:30:00 GMT</pubDate>
  <media:content url="https://rstats101.com/images/dplyr/antijoin-in-r-hero-ggplot.png" medium="image" type="image/png" height="64" width="144"/>
</item>
</channel>
</rss>
