Enrich your data at scale with LLM APIs
The problem
You've got 500 donor records that need categorising, or 200 feedback responses to analyse, or 300 organisation names to standardise. Copy-pasting each one to Claude.ai would take days. You need to process them all programmatically, but you're not sure how to call AI from code. This is the fundamental pattern that unlocks AI at scale.
The solution
Write a simple script that loops through your CSV file, calls an LLM API (Claude, GPT) for each row with a consistent prompt, and saves the enriched results back to a spreadsheet. Start with 5 rows to test, then run the full batch. This is the 'boiling water' of programmatic AI - the most basic pattern you'll use constantly.
What you get
An enriched CSV file with new columns containing AI-generated insights. For example: original 'donation_description' column plus new 'inferred_category', 'donor_motivation', 'suggested_follow_up' columns. Typically processes 100-1000 records in 10-30 minutes, depending on complexity and rate limits.
Before you start
- CSV file with data to enrich (works best with 100-1000 rows)
- API key from OpenAI or Anthropic (£5-10 credit to start)
- Clear prompt: what do you want the AI to do for each row?
- Basic willingness to run Python or Node code (we'll provide working examples)
- Budget for API costs: roughly £0.01-0.05 per record depending on complexity
When to use this
- You have 100+ records that need the same AI processing applied
- The task is repetitive and well-defined (categorise, extract, summarise, standardise)
- Manual processing would take hours or days
- You can tolerate 90-95% accuracy (with human spot-checking)
- The data doesn't contain highly sensitive information (or you've anonymised it)
When not to use this
- Fewer than 100 records (manual or copy-paste is faster)
- Each record needs different handling (not a repetitive task)
- You need 100% accuracy (LLMs make mistakes - budget for human review)
- Highly sensitive data that can't be sent to external APIs
- You haven't tested the prompt on a few examples first (always test before bulk running)
- No budget for API costs (even cheap models cost money at scale)
Steps
1. Get an API key and test it
Sign up for OpenAI (platform.openai.com) or Anthropic (console.anthropic.com). Add £5-10 credit to your account. Generate an API key. Test it works: run the example code below with just one test row. If you get a response, you're ready. If errors, check your API key is correct and you have credit.
2. Prepare your CSV and write your prompt
Export your data to CSV. Identify which column(s) the AI should read, and what new column(s) you want to create. Write a clear prompt: 'Given this donation description: [description], infer the category (general donation, emergency appeal, specific programme) and the donor motivation (regular supporter, first-time giver, event attendee). Return JSON: {category: string, motivation: string}'. Test this prompt manually with 3-5 examples in Claude.ai/ChatGPT to make sure it works.
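When you test the prompt, also check that the model's reply actually parses as JSON: models sometimes wrap their answer in markdown code fences. A small tolerant parser is worth having from the start (a sketch; the `parse_json_reply` helper here is our own, not part of any SDK):

```python
import json

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating ```json ... ``` fences."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else cleaned
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

# Both a plain and a fenced reply parse to the same dict
plain = '{"category": "emergency_appeal", "motivation": "event attendee"}'
fenced = "```json\n" + plain + "\n```"
print(parse_json_reply(fenced)["category"])  # emergency_appeal
```

If parsing fails on your test examples, tighten the prompt ("Return ONLY valid JSON, no other text") before moving on.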
3. Set up your code environment
Easiest: Open Google Colab (free, runs in browser, no installation). Alternative: Install Python locally. Upload your CSV to Colab. Copy the example code below and update it with: (1) Your API key, (2) Your CSV filename, (3) Your prompt, (4) Column names. The code reads CSV, loops through rows, calls API, saves results.
4. Test on first 5 rows only
Modify the code to process only the first 5 rows (there's a limit variable in the example). Run it. Check the results carefully: Is the AI doing what you expect? Are the outputs in the right format? Any errors? If results look good, proceed. If not, refine your prompt and test again. Never run 1000 rows without testing on 5 first.
5. Run the full batch with monitoring
Remove the row limit and run on your full dataset. The code will show progress (e.g., 'Processing row 50/500'). It handles rate limits by adding small delays between calls. For 500 rows this typically takes 10-30 minutes. Don't close your browser/laptop. When complete, download the enriched CSV.
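The example code uses a fixed delay between calls. If you still hit rate-limit errors on a large batch, a retry-with-exponential-backoff wrapper is more robust. A generic sketch (the `call_with_retries` helper is our own, not part of either SDK):

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure wait 1s, 2s, 4s... and retry, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the row be logged as an error
            time.sleep(base_delay * (2 ** attempt))

# Usage: wrap whichever API call you are making, e.g.
# response = call_with_retries(lambda: client.chat.completions.create(...))
```

Exponential backoff matters because rate limits usually clear after a short pause; retrying immediately just burns more attempts.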
6. Review results and spot-check accuracy
Open the enriched CSV. Spot-check 20-30 random rows: are the AI outputs accurate? Common issues: AI hallucinates when data is ambiguous, formatting inconsistencies, occasional nonsense. Calculate rough accuracy: if 25/30 spot-checks are correct, you're at ~83% accuracy. Decide if that's acceptable or if you need to refine the prompt and re-run.
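To pick spot-check rows without bias, sample them randomly rather than reading from the top. A sketch using pandas (the inline DataFrame is illustrative; in practice you'd load your enriched CSV with `pd.read_csv`):

```python
import pandas as pd

# Illustrative stand-in for: df = pd.read_csv("donations_enriched.csv")
df = pd.DataFrame({"description": [f"gift {i}" for i in range(100)],
                   "ai_category": ["general_donation"] * 100})

# Draw 30 random rows; a fixed seed makes the sample reproducible
sample = df.sample(n=min(30, len(df)), random_state=42)

# After manually marking each sampled row as correct or not:
correct = 25  # <- replace with your own count
accuracy = correct / len(sample)
print(f"Estimated accuracy: {accuracy:.0%}")  # ~83%
```

A fixed `random_state` means you can re-open exactly the same sample later, for example after refining the prompt and re-running.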
7. Handle errors and re-run failed rows (optional)
Some rows might have failed (API timeouts, rate limits, malformed data). The code saves an error log. For failed rows: check why they failed, fix if possible (e.g., remove special characters), re-run just those rows. Don't discard them - failures often indicate interesting edge cases worth investigating.
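Re-running only the failed rows is a small filtering step on the error column. A sketch following the column names from the OpenAI example below (`enrich_row` is a stand-in for the real API call, and the inline DataFrame is illustrative):

```python
import pandas as pd

def enrich_row(description):
    """Stand-in for the API call in the main example."""
    return {"category": "general_donation", "donor_motivation": "supporter"}

# Illustrative stand-in for reloading your enriched CSV
df = pd.DataFrame({
    "description": ["monthly gift", "flood appeal", "gala ticket"],
    "ai_category": ["general_donation", None, "other"],
    "ai_error":    [None, "Rate limit", None],
})

# Select only rows that previously failed and try them again
failed = df["ai_error"].notna()
for idx in df.index[failed]:
    result = enrich_row(df.at[idx, "description"])
    df.at[idx, "ai_category"] = result["category"]
    df.at[idx, "ai_error"] = None

print(f"Remaining errors: {df['ai_error'].notna().sum()}")  # 0
```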
Example code
Enrich CSV data using OpenAI API (Python)
Basic pattern for enriching CSV data with OpenAI. Install: pip install openai pandas tqdm
import json
import time

import pandas as pd
from openai import OpenAI
from tqdm import tqdm

# Configuration
API_KEY = 'your-api-key-here'  # Replace with your actual API key
CSV_INPUT = 'donations.csv'  # Your input CSV
CSV_OUTPUT = 'donations_enriched.csv'  # Output with new columns
COLUMN_TO_PROCESS = 'description'  # Which column to send to AI

# Your prompt template
def create_prompt(description):
    return f"""Given this donation description: "{description}"
Extract the following information and return as JSON:
- category: one of [general_donation, emergency_appeal, specific_programme, legacy, other]
- donor_motivation: inferred reason for giving (1-2 words)
- suggested_follow_up: recommended next action (1 sentence)
Return ONLY valid JSON, no other text."""

# Initialise the OpenAI client (openai package v1+)
client = OpenAI(api_key=API_KEY)

# Load data
df = pd.read_csv(CSV_INPUT)

# For testing: limit to first 5 rows
# Remove this line when ready to run full batch
df = df.head(5)

# Add columns for results
df['ai_category'] = None
df['ai_motivation'] = None
df['ai_follow_up'] = None
df['ai_error'] = None

# Process each row
for idx, row in tqdm(df.iterrows(), total=len(df), desc="Enriching data"):
    try:
        description = row[COLUMN_TO_PROCESS]

        # Skip if description is empty
        if pd.isna(description) or str(description).strip() == '':
            df.at[idx, 'ai_error'] = 'Empty description'
            continue

        # Call OpenAI API
        response = client.chat.completions.create(
            model='gpt-4o-mini',  # Cheap and fast
            messages=[
                {'role': 'user', 'content': create_prompt(description)}
            ],
            temperature=0.3,  # Lower = more consistent
            max_tokens=200
        )

        # Parse response as JSON
        result_text = response.choices[0].message.content
        result = json.loads(result_text)

        # Extract fields
        df.at[idx, 'ai_category'] = result.get('category', '')
        df.at[idx, 'ai_motivation'] = result.get('donor_motivation', '')
        df.at[idx, 'ai_follow_up'] = result.get('suggested_follow_up', '')

        # Rate limiting: be nice to the API
        time.sleep(0.5)  # 500ms between calls

    except Exception as e:
        df.at[idx, 'ai_error'] = str(e)
        print(f"Error on row {idx}: {e}")
        continue

# Save enriched data
df.to_csv(CSV_OUTPUT, index=False)
print(f"\nDone! Saved to {CSV_OUTPUT}")
print(f"Successfully processed: {df['ai_error'].isna().sum()} rows")
print(f"Errors: {df['ai_error'].notna().sum()} rows")

# Show summary
print("\nSample results:")
print(df[['description', 'ai_category', 'ai_motivation']].head(10))

Enrich CSV data using Anthropic Claude API
Same pattern using Anthropic Claude. Install: pip install anthropic pandas tqdm
import json
import time

import anthropic
import pandas as pd
from tqdm import tqdm

# Configuration
API_KEY = 'your-anthropic-api-key-here'
CSV_INPUT = 'feedback.csv'
CSV_OUTPUT = 'feedback_analyzed.csv'
COLUMN_TO_PROCESS = 'feedback_text'

def create_prompt(text):
    return f"""Analyze this feedback: "{text}"
Extract:
1. Sentiment: positive, negative, or neutral
2. Main theme: one of [service_quality, staff_attitude, accessibility, outcomes, other]
3. Key concern: the main issue or praise in one sentence
Return as JSON: {{"sentiment": "...", "theme": "...", "concern": "..."}}"""

# Initialise the Claude client
client = anthropic.Anthropic(api_key=API_KEY)

# Load data
df = pd.read_csv(CSV_INPUT)

# Test mode: process first 5 only
df = df.head(5)

# Add result columns
df['sentiment'] = None
df['theme'] = None
df['concern'] = None
df['error'] = None

# Process rows
for idx, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row[COLUMN_TO_PROCESS]
        if pd.isna(text) or str(text).strip() == '':
            df.at[idx, 'error'] = 'Empty text'
            continue

        # Call Claude
        message = client.messages.create(
            model="claude-3-5-haiku-20241022",  # Fast and cheap
            max_tokens=200,
            temperature=0.3,
            messages=[
                {"role": "user", "content": create_prompt(text)}
            ]
        )

        # Parse response as JSON
        result = json.loads(message.content[0].text)
        df.at[idx, 'sentiment'] = result.get('sentiment', '')
        df.at[idx, 'theme'] = result.get('theme', '')
        df.at[idx, 'concern'] = result.get('concern', '')

        time.sleep(0.5)

    except Exception as e:
        df.at[idx, 'error'] = str(e)
        print(f"Error on row {idx}: {e}")

# Save
df.to_csv(CSV_OUTPUT, index=False)
print(f"\nProcessed {df['error'].isna().sum()} rows successfully")
print(f"Errors: {df['error'].notna().sum()} rows")

Tools
Resources
- OpenAI API documentation (documentation): Official guide to calling GPT models programmatically.
- Anthropic Claude API documentation (documentation): Official guide to calling Claude models programmatically.
- Google Colab for beginners (tutorial): Free hosted Python environment; no installation needed.
- API rate limits and costs (documentation): Understanding rate limits and optimising API costs.
At a glance
- Time to implement
- hours
- Setup cost
- low
- Ongoing cost
- low
- Cost trend
- decreasing
- Organisation size
- micro, small, medium, large
- Target audience
- operations-manager, data-analyst, fundraising, program-delivery
API costs are the primary expense. GPT-4o-mini: ~£0.01 per record. Claude Haiku: ~£0.01 per record. GPT-4: ~£0.10 per record (overkill for most tasks). Start with cheaper models and upgrade only if quality is insufficient. 500 records typically costs £5-25 depending on model and prompt complexity.
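A quick back-of-the-envelope check before committing to a full run, using the rough per-record figures above (the numbers are illustrative placeholders; after the 5-row test, swap in your own measured cost per record):

```python
# Back-of-the-envelope batch cost estimate.
# cost_per_record is the rough ~£0.01 figure for a small model quoted above;
# replace it with your measured per-record cost after a small test run.
records = 500
cost_per_record = 0.01  # GBP, illustrative

estimated = records * cost_per_record
print(f"{records} records x £{cost_per_record} = about £{estimated:.2f}")
```

This keeps you from being surprised: 500 records at ~£0.01 each lands at the bottom of the £5-25 range quoted above, and scaling up is just multiplication.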