Enrich your data at scale with LLM APIs
The problem
You've got 500 donor records that need categorising, or 200 feedback responses to analyse, or 300 organisation names to standardise. Copy-pasting each one to Claude.ai would take days. You need to process them all programmatically, but you're not sure how to call AI from code. This is the fundamental pattern that unlocks AI at scale.
The solution
Write a simple script that loops through your CSV file, calls an LLM API (Claude, GPT) for each row with a consistent prompt, and saves the enriched results back to a spreadsheet. Start with 5 rows to test, then run the full batch. This is the 'boiling water' of programmatic AI - the most basic pattern you'll use constantly.
What you get
An enriched CSV file with new columns containing AI-generated insights. For example: original 'donation_description' column plus new 'inferred_category', 'donor_motivation', 'suggested_follow_up' columns. Typically processes 100-1000 records in 10-30 minutes, depending on complexity and rate limits.
Before you start
- CSV file with data to enrich (works best with 100-1000 rows)
- API key from OpenAI or Anthropic (£5-10 credit to start)
- Clear prompt: what do you want the AI to do for each row?
- Basic willingness to run Python or Node code (we'll provide working examples)
- Budget for API costs: roughly £0.01-0.05 per record depending on complexity
When to use this
- You have 100+ records that need the same AI processing applied
- The task is repetitive and well-defined (categorise, extract, summarise, standardise)
- Manual processing would take hours or days
- You can tolerate 90-95% accuracy (with human spot-checking)
- The data doesn't contain highly sensitive information (or you've anonymised it)
When not to use this
- Fewer than 100 records (manual or copy-paste is faster)
- Each record needs different handling (not a repetitive task)
- You need 100% accuracy (LLMs make mistakes - budget for human review)
- Highly sensitive data that can't be sent to external APIs
- You haven't tested the prompt on a few examples first (always test before bulk running)
- No budget for API costs (even cheap models cost money at scale)
Steps
1. Get an API key and test it
Sign up for OpenAI (platform.openai.com) or Anthropic (console.anthropic.com). Add £5-10 credit to your account. Generate an API key. Test it works: run the example code below with just one test row. If you get a response, you're ready. If errors, check your API key is correct and you have credit.
2. Prepare your CSV and write your prompt
Export your data to CSV. Identify which column(s) the AI should read, and what new column(s) you want to create. Write a clear prompt: 'Given this donation description: [description], infer the category (general donation, emergency appeal, specific programme) and the donor motivation (regular supporter, first-time giver, event attendee). Return JSON: {category: string, motivation: string}'. Test this prompt manually with 3-5 examples in Claude.ai/ChatGPT to make sure it works.
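When you test the prompt, also check that the model's reply actually parses as JSON: models sometimes wrap their answer in markdown code fences. A small tolerant parser is worth having from the start (a sketch; the `parse_json_reply` helper here is our own, not part of any SDK):

```python
import json

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating ```json ... ``` fences."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else cleaned
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

# Both a plain and a fenced reply parse to the same dict
plain = '{"category": "emergency_appeal", "motivation": "event attendee"}'
fenced = "```json\n" + plain + "\n```"
print(parse_json_reply(fenced)["category"])  # emergency_appeal
```

If parsing fails on your test examples, tighten the prompt ("Return ONLY valid JSON, no other text") before moving on.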
3. Set up your code environment
Easiest: Open Google Colab (free, runs in browser, no installation). Alternative: Install Python locally. Upload your CSV to Colab. Copy the example code below and update it with: (1) Your API key, (2) Your CSV filename, (3) Your prompt, (4) Column names. The code reads CSV, loops through rows, calls API, saves results.
4. Test on first 5 rows only
Modify the code to process only the first 5 rows (there's a limit variable in the example). Run it. Check the results carefully: Is the AI doing what you expect? Are the outputs in the right format? Any errors? If results look good, proceed. If not, refine your prompt and test again. Never run 1000 rows without testing on 5 first.
5. Run the full batch with monitoring
Remove the row limit and run on your full dataset. The code will show progress (e.g., 'Processing row 50/500'). It handles rate limits by adding small delays between calls. For 500 rows this typically takes 10-30 minutes. Don't close your browser/laptop. When complete, download the enriched CSV.
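The example code uses a fixed delay between calls. If you still hit rate-limit errors on a large batch, a retry-with-exponential-backoff wrapper is more robust. A generic sketch (the `call_with_retries` helper is our own, not part of either SDK):

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure wait 1s, 2s, 4s... and retry, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the row be logged as an error
            time.sleep(base_delay * (2 ** attempt))

# Usage: wrap whichever API call you are making, e.g.
# response = call_with_retries(lambda: client.chat.completions.create(...))
```

Exponential backoff matters because rate limits usually clear after a short pause; retrying immediately just burns more attempts.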
6. Review results and spot-check accuracy
Open the enriched CSV. Spot-check 20-30 random rows: are the AI outputs accurate? Common issues: AI hallucinates when data is ambiguous, formatting inconsistencies, occasional nonsense. Calculate rough accuracy: if 25/30 spot-checks are correct, you're at ~83% accuracy. Decide if that's acceptable or if you need to refine the prompt and re-run.
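To pick spot-check rows without bias, sample them randomly rather than reading from the top. A sketch using pandas (the inline DataFrame is illustrative; in practice you'd load your enriched CSV with `pd.read_csv`):

```python
import pandas as pd

# Illustrative stand-in for: df = pd.read_csv("donations_enriched.csv")
df = pd.DataFrame({"description": [f"gift {i}" for i in range(100)],
                   "ai_category": ["general_donation"] * 100})

# Draw 30 random rows; a fixed seed makes the sample reproducible
sample = df.sample(n=min(30, len(df)), random_state=42)

# After manually marking each sampled row as correct or not:
correct = 25  # <- replace with your own count
accuracy = correct / len(sample)
print(f"Estimated accuracy: {accuracy:.0%}")  # ~83%
```

A fixed `random_state` means you can re-open exactly the same sample later, for example after refining the prompt and re-running.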
7. Handle errors and re-run failed rows (optional)
Some rows might have failed (API timeouts, rate limits, malformed data). The code saves an error log. For failed rows: check why they failed, fix if possible (e.g., remove special characters), re-run just those rows. Don't discard them - failures often indicate interesting edge cases worth investigating.
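Re-running only the failed rows is a small filtering step on the error column. A sketch following the column names from the OpenAI example below (`enrich_row` is a stand-in for the real API call, and the inline DataFrame is illustrative):

```python
import pandas as pd

def enrich_row(description):
    """Stand-in for the API call in the main example."""
    return {"category": "general_donation", "donor_motivation": "supporter"}

# Illustrative stand-in for reloading your enriched CSV
df = pd.DataFrame({
    "description": ["monthly gift", "flood appeal", "gala ticket"],
    "ai_category": ["general_donation", None, "other"],
    "ai_error":    [None, "Rate limit", None],
})

# Select only rows that previously failed and try them again
failed = df["ai_error"].notna()
for idx in df.index[failed]:
    result = enrich_row(df.at[idx, "description"])
    df.at[idx, "ai_category"] = result["category"]
    df.at[idx, "ai_error"] = None

print(f"Remaining errors: {df['ai_error'].notna().sum()}")  # 0
```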
Example code
Enrich CSV data using OpenAI API (Python)
Basic pattern for enriching CSV data with OpenAI. Install: pip install openai pandas tqdm
import json
import time

import pandas as pd
from openai import OpenAI
from tqdm import tqdm

# Configuration
API_KEY = 'your-api-key-here'  # Replace with your actual API key
CSV_INPUT = 'donations.csv'  # Your input CSV
CSV_OUTPUT = 'donations_enriched.csv'  # Output with new columns
COLUMN_TO_PROCESS = 'description'  # Which column to send to AI

# Your prompt template
def create_prompt(description):
    return f"""Given this donation description: "{description}"
Extract the following information and return as JSON:
- category: one of [general_donation, emergency_appeal, specific_programme, legacy, other]
- donor_motivation: inferred reason for giving (1-2 words)
- suggested_follow_up: recommended next action (1 sentence)
Return ONLY valid JSON, no other text."""

# Initialise the OpenAI client (openai package v1+)
client = OpenAI(api_key=API_KEY)

# Load data
df = pd.read_csv(CSV_INPUT)

# For testing: limit to first 5 rows
# Remove this line when ready to run full batch
df = df.head(5)

# Add columns for results
df['ai_category'] = None
df['ai_motivation'] = None
df['ai_follow_up'] = None
df['ai_error'] = None

# Process each row
for idx, row in tqdm(df.iterrows(), total=len(df), desc="Enriching data"):
    try:
        description = row[COLUMN_TO_PROCESS]

        # Skip if description is empty
        if pd.isna(description) or str(description).strip() == '':
            df.at[idx, 'ai_error'] = 'Empty description'
            continue

        # Call OpenAI API
        response = client.chat.completions.create(
            model='gpt-4o-mini',  # Cheap and fast
            messages=[
                {'role': 'user', 'content': create_prompt(description)}
            ],
            temperature=0.3,  # Lower = more consistent
            max_tokens=200
        )

        # Parse response as JSON
        result_text = response.choices[0].message.content
        result = json.loads(result_text)

        # Extract fields
        df.at[idx, 'ai_category'] = result.get('category', '')
        df.at[idx, 'ai_motivation'] = result.get('donor_motivation', '')
        df.at[idx, 'ai_follow_up'] = result.get('suggested_follow_up', '')

        # Rate limiting: be nice to the API
        time.sleep(0.5)  # 500ms between calls

    except Exception as e:
        df.at[idx, 'ai_error'] = str(e)
        print(f"Error on row {idx}: {e}")
        continue

# Save enriched data
df.to_csv(CSV_OUTPUT, index=False)
print(f"\nDone! Saved to {CSV_OUTPUT}")
print(f"Successfully processed: {df['ai_error'].isna().sum()} rows")
print(f"Errors: {df['ai_error'].notna().sum()} rows")

# Show summary
print("\nSample results:")
print(df[['description', 'ai_category', 'ai_motivation']].head(10))

Enrich CSV data using Anthropic Claude API
Same pattern using Anthropic Claude. Install: pip install anthropic pandas tqdm
import json
import time

import anthropic
import pandas as pd
from tqdm import tqdm

# Configuration
API_KEY = 'your-anthropic-api-key-here'
CSV_INPUT = 'feedback.csv'
CSV_OUTPUT = 'feedback_analyzed.csv'
COLUMN_TO_PROCESS = 'feedback_text'

def create_prompt(text):
    return f"""Analyze this feedback: "{text}"
Extract:
1. Sentiment: positive, negative, or neutral
2. Main theme: one of [service_quality, staff_attitude, accessibility, outcomes, other]
3. Key concern: the main issue or praise in one sentence
Return as JSON: {{"sentiment": "...", "theme": "...", "concern": "..."}}"""

# Initialise the Claude client
client = anthropic.Anthropic(api_key=API_KEY)

# Load data
df = pd.read_csv(CSV_INPUT)

# Test mode: process first 5 only
df = df.head(5)

# Add result columns
df['sentiment'] = None
df['theme'] = None
df['concern'] = None
df['error'] = None

# Process rows
for idx, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row[COLUMN_TO_PROCESS]
        if pd.isna(text) or str(text).strip() == '':
            df.at[idx, 'error'] = 'Empty text'
            continue

        # Call Claude
        message = client.messages.create(
            model="claude-3-5-haiku-20241022",  # Fast and cheap
            max_tokens=200,
            temperature=0.3,
            messages=[
                {"role": "user", "content": create_prompt(text)}
            ]
        )

        # Parse response as JSON
        result = json.loads(message.content[0].text)
        df.at[idx, 'sentiment'] = result.get('sentiment', '')
        df.at[idx, 'theme'] = result.get('theme', '')
        df.at[idx, 'concern'] = result.get('concern', '')

        time.sleep(0.5)

    except Exception as e:
        df.at[idx, 'error'] = str(e)
        print(f"Error on row {idx}: {e}")

# Save
df.to_csv(CSV_OUTPUT, index=False)
print(f"\nProcessed {df['error'].isna().sum()} rows successfully")
print(f"Errors: {df['error'].notna().sum()} rows")

Tools
Resources
- OpenAI API documentation (documentation): Official guide to calling GPT models programmatically.
- Anthropic Claude API documentation (documentation): Official guide to calling Claude models programmatically.
- Google Colab for beginners (tutorial): Free hosted Python environment; no installation needed.
- API rate limits and costs (documentation): Understanding rate limits and optimising API costs.
At a glance
- Time to implement
- hours
- Setup cost
- low
- Ongoing cost
- low
- Cost trend
- decreasing
- Organisation size
- micro, small, medium, large
- Target audience
- operations-manager, data-analyst, fundraising, program-delivery
API costs are the primary expense. GPT-4o-mini: ~£0.01 per record. Claude Haiku: ~£0.01 per record. GPT-4: ~£0.10 per record (overkill for most tasks). Start with cheaper models and upgrade only if quality is insufficient. 500 records typically costs £5-25 depending on model and prompt complexity.
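A quick back-of-the-envelope check before committing to a full run, using the rough per-record figures above (the numbers are illustrative placeholders; after the 5-row test, swap in your own measured cost per record):

```python
# Back-of-the-envelope batch cost estimate.
# cost_per_record is the rough ~£0.01 figure for a small model quoted above;
# replace it with your measured per-record cost after a small test run.
records = 500
cost_per_record = 0.01  # GBP, illustrative

estimated = records * cost_per_record
print(f"{records} records x £{cost_per_record} = about £{estimated:.2f}")
```

This keeps you from being surprised: 500 records at ~£0.01 each lands at the bottom of the £5-25 range quoted above, and scaling up is just multiplication.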