
Process spreadsheet data with Claude Code

data-analysis · intermediate · emerging

The problem

You have spreadsheets of data that need processing - categorising text responses, cleaning inconsistent entries, extracting information from free-text fields. You've tried copying data into ChatGPT but hit paste limits. You've seen Python-based recipes but the setup feels daunting.

The solution

Use Claude Code to write and run data processing scripts for you. Describe what you need in plain English, and Claude will generate the code, run it, debug any errors, and produce your output file. You don't need to understand the code - Claude handles that. Upload your CSV, describe the task, download the results.

What you get

A processed CSV or Excel file with your data categorised, cleaned, or transformed as specified. Plus a reusable script you can run again on future data. Claude explains what it did so you can verify the approach makes sense.

Before you start

  • Claude Code set up (see setup recipes) - requires comfort with command line/terminal
  • Your data exported as CSV (one row per record)
  • Basic understanding of what you want to do with the data
  • Anthropic API credit (typically £0.50-£2 per 1000 rows processed - check your billing dashboard after a test batch to verify costs)
  • DATA PROTECTION: If your data contains PII (names, addresses, beneficiary details), anonymise before processing or ensure you have appropriate legal basis. Never upload sensitive beneficiary data without a DPIA. Check your charity privacy policy covers AI processing.

When to use this

  • You have more than 100 rows but fewer than 50,000 (larger datasets need chunking)
  • The processing involves understanding text (categorisation, extraction, summarisation)
  • You need to apply consistent logic that's hard to express as a formula
  • You want to repeat this process on future data
  • You've tried the paste-into-ChatGPT approach and hit limits

When not to use this

  • Your data is under 50 rows (just paste into Claude.ai directly)
  • The processing is purely numerical (use Excel formulas)
  • The data is highly sensitive and can't go through any API
  • You need real-time processing (this is for batch jobs)
  • A simple find-and-replace would do the job

Steps

  1. Prepare your data file

    Export your data as a CSV file. Make sure the first row contains column headers that describe what's in each column, and remove any rows that shouldn't be processed (totals, notes, etc.). Save it in the working directory where you'll run Claude Code (a CLI tool you run from your terminal; if using GitHub Codespaces, upload it to your Codespace folder). PRIVACY CHECK: Review the data - if it contains real names, addresses, or beneficiary details, consider whether you should anonymise it before processing.

  2. Describe your processing task clearly

    Before starting, write down what you want in plain English. Be specific: 'Categorise each feedback response into one of these themes: service quality, staff behaviour, facilities, waiting times, other. Also flag any that seem urgent.' The clearer you are, the better Claude's first attempt will be.

  3. Show Claude a sample first

    Start by asking Claude to read the first 10-20 rows and describe what it sees. Say: 'Read my_data.csv and show me the first 10 rows. Tell me what you think each column contains.' This catches any data format issues early and helps Claude understand your data.
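    A minimal sketch of the kind of preview script Claude might write at this step, using only the standard library. The filename `my_data.csv` comes from the prompt above; the function name is illustrative.

    ```python
    import csv
    import itertools

    def preview_csv(path, n_rows=10):
        """Print the column headers and the first n_rows of a CSV file."""
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            print("Columns:", ", ".join(header))
            for row in itertools.islice(reader, n_rows):
                print(row)
            return header

    # Usage: preview_csv("my_data.csv")
    ```

    In practice you don't write this yourself - Claude generates something like it when you ask for a preview - but seeing the shape helps you verify its approach.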

  4. Request the processing with a small test

    Ask Claude to process a small batch first: 'Process the first 50 rows of this data. For each row, [your task description]. Save the results to a new file called processed_sample.csv with all original columns plus new columns for [your outputs].' Review this sample before running the full dataset. After running, check your Anthropic Console billing dashboard to verify costs before scaling up.
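    The script Claude produces for this step will look roughly like the sketch below. The `categorise` function is a stand-in: a real generated script would put the model call or classification logic there. Column and file names follow the feedback example used in this recipe.

    ```python
    import csv

    def categorise(text):
        # Stand-in for the logic Claude generates; a real script might
        # call the model here or apply rules it has written.
        return "other"

    def process_sample(in_path, out_path, limit=50):
        """Process the first `limit` rows, adding a 'theme' column,
        and write the result to a new CSV alongside the original columns."""
        with open(in_path, newline="", encoding="utf-8") as f_in:
            reader = csv.DictReader(f_in)
            rows = []
            for i, row in enumerate(reader):
                if i >= limit:
                    break
                row["theme"] = categorise(row["response_text"])
                rows.append(row)
            fieldnames = reader.fieldnames + ["theme"]
        with open(out_path, "w", newline="", encoding="utf-8") as f_out:
            writer = csv.DictWriter(f_out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    # Usage: process_sample("feedback_responses.csv", "processed_sample.csv")
    ```

    The key property to check in the output: all original columns survive, and the new columns are appended rather than replacing anything.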

  5. Review and refine

    Open the sample output and check several rows. Is the categorisation accurate? Are there edge cases Claude handled poorly? If something's wrong, tell Claude specifically: 'The third row was categorised as X but should be Y because Z. Please update the logic and reprocess.' Iterate until it's working.

  6. Run on the full dataset

    Once satisfied with the sample, ask Claude to process everything: 'Now process the full dataset using the same approach. Show me progress every 100 rows.' For large files, Claude will handle batching automatically. It may take several minutes for thousands of rows.
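    The progress reporting you ask for here typically ends up as a loop like this sketch (function names are illustrative, not from any generated script):

    ```python
    def process_all(rows, process_row, report_every=100):
        """Apply process_row to every row, printing progress at intervals."""
        results = []
        for i, row in enumerate(rows, start=1):
            results.append(process_row(row))
            if i % report_every == 0:
                print(f"Processed {i} rows...")
        print(f"Done: {len(results)} rows processed.")
        return results
    ```

    Watching the progress messages also tells you early if the run has stalled, so you can interrupt it rather than waiting out a failed batch.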

  7. Save the script for reuse

    Ask Claude: 'Save the processing script to a file called process_feedback.py so I can run it again on future data.' Claude will extract the code into a standalone file. Next time you have similar data, you can run the script directly or ask Claude to adapt it.
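    A reusable script usually gains a small command-line interface so future runs can point at different files. This skeleton shows the shape; the argument names and `--limit` option are assumptions, not part of any specific generated script.

    ```python
    import argparse

    def parse_args(argv):
        """Parse command-line arguments for the processing script."""
        parser = argparse.ArgumentParser(
            description="Categorise feedback responses in a CSV file.")
        parser.add_argument("input_csv", help="path to the input CSV")
        parser.add_argument("output_csv", help="path for the processed CSV")
        parser.add_argument("--limit", type=int, default=None,
                            help="process only the first N rows")
        return parser.parse_args(argv)

    # Invoked later as, e.g.:
    #   python process_feedback.py feedback_responses.csv feedback_categorised.csv
    ```

    With this in place, colleagues can rerun the script on next quarter's export without touching the code.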

  8. Document what you did

    Ask Claude: 'Write a brief README explaining what this script does, what format the input data should be in, and how to run it.' This helps your colleagues (and future you) understand the process without having to dig through code.

Example code

Example prompt for categorising feedback

A sample prompt you might give Claude Code for processing feedback data.

Read feedback_responses.csv and process each row:

1. The 'response_text' column contains free-text feedback from service users
2. For each response, add these new columns:
   - theme: one of 'service_quality', 'staff', 'facilities', 'access', 'communication', 'other'
   - sentiment: 'positive', 'negative', or 'mixed'
   - urgent: true if the response mentions immediate safety concerns, complaints about named staff, or requests urgent callback
   - summary: a 10-word summary of the main point

3. Process the first 20 rows as a sample and show me the results
4. Save to feedback_categorised.csv with all original columns plus the new ones
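A script generated from this prompt would normally ask the model to judge urgency, but it may also include a cheap keyword pre-filter as a first pass. This sketch is illustrative only; the keyword list is an assumption, not a reliable urgency test.

```python
# Crude first-pass check; a real script would have the model judge
# urgency and use keywords only to prioritise review, if at all.
URGENT_KEYWORDS = ["unsafe", "danger", "urgent", "emergency", "call me back"]

def is_urgent(response_text):
    """Return True if any urgency keyword appears in the response."""
    text = response_text.lower()
    return any(keyword in text for keyword in URGENT_KEYWORDS)
```

If Claude proposes something like this, check a few borderline rows yourself: keyword checks miss implicit safety concerns, which is exactly why the prompt asks the model to flag them.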

Example prompt for cleaning address data

A sample prompt for standardising messy address data.

Read contacts.csv and clean the address data:

1. The 'address' column contains inconsistently formatted UK addresses
2. For each row, create new columns:
   - address_line_1: first line (number and street)
   - address_line_2: second line if present (flat number, building name)
   - city: the city or town
   - postcode: the postcode, standardised to uppercase with correct spacing
   - address_valid: true if the address looks complete, false if obviously incomplete

3. Handle common issues:
   - Postcodes sometimes missing space (SW1A1AA should become SW1A 1AA)
   - City names sometimes abbreviated (Manc -> Manchester)
   - Some addresses have everything on one line separated by commas

4. Process first 30 rows as sample, then full dataset
5. Save to contacts_cleaned.csv
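For the postcode spacing issue in point 3, the script Claude writes will contain something like this sketch. It relies on the fact that the inward code of a full UK postcode is always the last three characters; validation of the overall format is left out.

```python
import re

def standardise_postcode(raw):
    """Uppercase a UK postcode and insert the space before the final
    three characters (the inward code), e.g. 'sw1a1aa' -> 'SW1A 1AA'."""
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:  # too short to be a full postcode; leave as-is
        return compact
    return compact[:-3] + " " + compact[-3:]
```

Abbreviation expansion (Manc -> Manchester) is a different kind of fix: there's no mechanical rule, so Claude will either build a lookup table or flag ambiguous cases for you to review.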

Tools

  • Claude Code (service · paid)
  • Anthropic API (service · paid)

At a glance

  • Time to implement: hours
  • Setup cost: free
  • Ongoing cost: low
  • Cost trend: decreasing
  • Organisation size: small, medium, large
  • Target audience: data-analyst, operations-manager, program-delivery

API costs for processing ~1000 rows are typically £0.50-£2, depending on complexity. The Claude for Nonprofits 75% discount applies. The script itself is reusable - you only pay to run it.