
Extract outcomes from narrative reports

impact-measurement · intermediate · emerging

The problem

You've got 50 project reports in Word documents, each describing impact in narrative form ('participants showed increased confidence', 'families accessed better housing'). You need to aggregate this for your annual report or funder, but extracting and counting outcomes manually would take weeks. The data exists but it's trapped in prose.

The solution

Use an LLM to read your narrative reports and extract structured outcome data. Tell it what outcomes you track (wellbeing improved, employment gained, skills learned), and it pulls out: what outcomes were achieved, for how many people, and the evidence cited. What was scattered across 50 documents becomes a dataset you can count and analyze.

What you get

A structured spreadsheet with columns: report name, outcome type, number of beneficiaries, evidence/method, confidence score. You can now answer questions like 'How many people improved wellbeing across all our projects?' or 'Which projects achieved employment outcomes?' The narrative becomes queryable data.

Before you start

  • Project reports or evaluations in digital format (PDF, Word, or text) - the example script processes .txt files, so convert Word/PDF to plain text first
  • A defined outcomes framework - what outcomes you're looking for
  • An OpenAI or Anthropic API key for batch processing
  • Basic Python skills or willingness to adapt example code
  • IMPORTANT: Anonymise or de-identify reports before processing - narrative reports often contain beneficiary PII or case studies. Check your data processing agreements with AI providers cover this use case
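As a quick safety net before uploading anything, a sketch like the following can flag reports that still contain obvious PII patterns. The patterns and helper name are illustrative, and this is no substitute for proper anonymisation - it only catches common slips:

```python
import re

# Illustrative pre-flight check: flag reports containing obvious PII
# patterns (emails, UK-style phone numbers, postcodes) before they are
# sent to an external API. Patterns here are examples, not exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b(?:0\d{4}\s?\d{6}|07\d{3}\s?\d{6})\b"),
    "postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}

def find_pii(text):
    """Return a dict of pattern name -> matches found in the text."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: matches for name, matches in hits.items() if matches}

# A report like this would be flagged before upload
flagged = find_pii("Contact Jane on jane@example.org, postcode SW1A 1AA.")
```

Run this over every file first and hold back anything flagged for manual redaction.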

When to use this

  • You have many narrative reports and need aggregate outcome data
  • You're writing annual reports or impact summaries
  • Funders want outcome numbers but you've only got stories
  • Manual extraction would take longer than you have

When not to use this

  • You only have a few reports - quicker to extract manually
  • Reports don't mention outcomes clearly - AI can't extract what isn't there
  • You need precise numbers for statutory reporting - validate carefully
  • Your outcomes framework is unclear or inconsistent across projects

Steps

  1. Define your outcomes framework

    List the outcomes you track across projects: 'increased wellbeing', 'gained employment', 'improved housing situation', 'reduced isolation', 'developed skills'. Be specific and use consistent language. This is what you'll ask the AI to extract.

  2. Test extraction on sample reports

    Take 3-5 reports and manually identify what outcomes they mention. Then ask Claude or ChatGPT to extract outcomes from the same reports. Compare: did it find what you found? Did it miss anything? Did it hallucinate outcomes not present?
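To make the comparison concrete, simple set operations show agreement, misses, and possible hallucinations. The outcome names here are illustrative:

```python
# Compare the outcome types you found manually against what the model
# returned for the same report (example values, not real results)
manual = {"wellbeing", "employment", "confidence"}
extracted = {"wellbeing", "employment", "housing"}

missed = manual - extracted        # outcomes the model failed to find
hallucinated = extracted - manual  # outcomes not actually in the report
agreed = manual & extracted

print(f"Agreed: {sorted(agreed)}")
print(f"Missed: {sorted(missed)}")
print(f"Possibly hallucinated: {sorted(hallucinated)}")
```

Anything in the "missed" or "hallucinated" sets tells you what to fix in the next step.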

  3. Refine your extraction prompt

    Based on the test, improve your prompt. Emphasize: only extract outcomes explicitly stated (don't infer), include the evidence/method if mentioned (survey, interview, observation), note confidence (was it 'all participants' or 'some participants'?), flag vague claims.

  4. Convert reports to text

    Get your reports into text format the AI can read. Word docs and plain PDFs work fine. Scanned PDFs need OCR first. Keep original formatting reasonable - tables and bullet points help the AI understand structure.

  5. Run batch extraction

    Use the API and example code to process all reports. The script reads each report, extracts structured outcome data, and builds a CSV. For 50 reports this might take 30-60 minutes. Monitor a few to check quality stays consistent.

  6. Validate the results

    Spot-check extractions against original reports. Did the AI accurately capture what was claimed? Are numbers correct? Is confidence scoring sensible? Check at least 20% of your reports. Look for patterns in errors - maybe the AI struggles with one outcome type.
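Drawing the spot-check sample with pandas, grouped by outcome type so rarer outcomes are represented, can look like this. The toy DataFrame stands in for extracted_outcomes.csv, and groupby sampling needs pandas 1.1 or later:

```python
import pandas as pd

# Toy stand-in for the CSV produced by the extraction script
df = pd.DataFrame({
    "report_name": [f"report_{i}" for i in range(20)],
    "outcome_type": ["wellbeing"] * 10 + ["employment"] * 10,
})

# 20% per outcome type; fix the seed so the sample is reproducible
sample = df.groupby("outcome_type").sample(frac=0.2, random_state=0)
print(sample)
```

Check each sampled row against the original report text, not just against your memory of it.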

  7. Clean and aggregate

    You'll have some inconsistencies - 'employment' vs 'gained work', 'wellbeing' vs 'mental health'. Standardise these in your spreadsheet. Then you can aggregate: total beneficiaries per outcome type, which projects achieved which outcomes, evidence methods used.
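A sketch of the clean-up step in pandas: map naming variations onto canonical labels, then total beneficiary numbers, counting "not specified" entries separately rather than guessing at them. The synonym map and toy data are illustrative:

```python
import pandas as pd

# Illustrative synonym map - extend with the variations you actually see
SYNONYMS = {"gained work": "employment", "mental health": "wellbeing"}

df = pd.DataFrame({
    "outcome_type": ["employment", "gained work", "wellbeing", "mental health"],
    "number_of_people": ["12", "8", "45", "not specified"],
})

df["outcome_type"] = df["outcome_type"].replace(SYNONYMS)
# Non-numeric entries ("not specified") become NaN and drop out of sums
df["n"] = pd.to_numeric(df["number_of_people"], errors="coerce")

totals = df.groupby("outcome_type")["n"].sum(min_count=1)
unspecified = df["n"].isna().groupby(df["outcome_type"]).sum()
print(totals)
print(unspecified)
```

Report the unspecified counts alongside the totals so readers know the numbers are a floor, not a census.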

  8. Use for impact reporting (optional)

    Now you can answer questions like: 'Across all projects, 450 people reported improved wellbeing (measured by survey), 78 gained employment, 120 improved housing situation.' Back this up with the narrative stories from your original reports for compelling impact reporting.
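The aggregated totals can be turned into report-ready sentences with a few lines; the labels and figures below mirror the example above and are illustrative:

```python
# Illustrative totals and phrasing for an annual-report summary sentence
totals = {"wellbeing": 450, "employment": 78, "housing": 120}
labels = {
    "wellbeing": "reported improved wellbeing",
    "employment": "gained employment",
    "housing": "improved their housing situation",
}

summary = "; ".join(
    f"{n} people {labels[outcome]}" for outcome, n in totals.items()
)
print(f"Across all projects, {summary}.")
```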

Example code

Extract outcomes from narrative reports

This processes narrative reports and extracts structured outcome data. Adapt the outcomes framework to match what you track.

from openai import OpenAI
import pandas as pd
import json
from pathlib import Path

client = OpenAI()

# Your outcomes framework - adapt this to your organisation
OUTCOMES_FRAMEWORK = {
    "wellbeing": "Improved wellbeing, mental health, or life satisfaction",
    "employment": "Gained employment, job, or paid work",
    "skills": "Developed new skills, qualifications, or capabilities",
    "housing": "Improved housing situation or accommodation",
    "social_connections": "Reduced isolation or increased social connections",
    "confidence": "Increased confidence, self-esteem, or agency",
    "health": "Improved physical health or access to healthcare"
}

def extract_outcomes_from_report(report_text, report_name):
    """Extract structured outcome data from a narrative report"""

    prompt = f"""You are analyzing a project report to extract outcome data.

OUTCOMES FRAMEWORK (only extract these):
{json.dumps(OUTCOMES_FRAMEWORK, indent=2)}

REPORT TEXT:
{report_text}

Extract all outcomes mentioned that match the framework above. For each outcome found, return:
- outcome_type: which outcome from the framework (use the key name)
- number_of_people: how many people achieved this outcome (extract the number if stated, or "not specified")
- evidence_method: how was this measured? (e.g., "survey", "interviews", "observation", "self-reported")
- quote_from_report: the exact text where this outcome was mentioned
- confidence: your confidence this outcome was achieved (high/medium/low based on evidence quality)

IMPORTANT RULES:
- Only extract outcomes explicitly stated in the report
- Don't infer outcomes not mentioned
- If a number isn't given, note "not specified" not a guess
- Distinguish between "all participants" vs "some" vs a specific number
- If evidence method isn't mentioned, note "not stated"

Return a JSON object with an "outcomes" key containing an array of outcomes found. If no outcomes match the framework, return {{"outcomes": []}}.

Example format:
{{
  "outcomes": [
    {{
      "outcome_type": "wellbeing",
      "number_of_people": "45",
      "evidence_method": "pre/post survey using WEMWBS",
      "quote_from_report": "45 participants (78%) showed improved wellbeing scores",
      "confidence": "high"
    }}
  ]
}}"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    result = json.loads(response.choices[0].message.content)

    # Add the report name to each outcome; default to an empty list
    # if the model returned no matching outcomes
    outcomes = result.get('outcomes', [])
    for outcome in outcomes:
        outcome['report_name'] = report_name
    return outcomes

# Process all reports in a folder
reports_folder = Path("project_reports")
all_outcomes = []

print("Extracting outcomes from reports...")

for report_file in reports_folder.glob("*.txt"):
    print(f"\nProcessing: {report_file.name}")

    with open(report_file, 'r', encoding='utf-8') as f:
        report_text = f.read()

    try:
        outcomes = extract_outcomes_from_report(report_text, report_file.stem)

        if outcomes:
            print(f"  Found {len(outcomes)} outcomes")
            all_outcomes.extend(outcomes)
        else:
            print(f"  No outcomes matching framework found")

    except Exception as e:
        print(f"  Error processing: {e}")

# Save to CSV
if all_outcomes:
    df = pd.DataFrame(all_outcomes)
    df.to_csv('extracted_outcomes.csv', index=False)

    print(f"\n{'='*60}")
    print(f"Extraction complete! Found {len(all_outcomes)} outcomes across {len(df['report_name'].unique())} reports")
    print(f"\nOutcome breakdown:")
    print(df['outcome_type'].value_counts())

    print(f"\nConfidence distribution:")
    print(df['confidence'].value_counts())

    print(f"\nSaved to extracted_outcomes.csv")
    print(f"\nNext steps:")
    print("1. Spot-check extractions against original reports for accuracy")
    print("2. Review low-confidence outcomes")
    print("3. Standardize any naming variations")
    print("4. Aggregate numbers (accounting for 'not specified' entries)")
    print("5. Cross-reference with project records for validation")

else:
    print("\nNo outcomes extracted. Check:")
    print("- Are your reports mentioning outcomes from the framework?")
    print("- Is the outcomes framework specific enough?")
    print("- Are report files in the correct folder and format?")

Tools

  • Claude (service, freemium)
  • OpenAI API (service, paid)
  • Google Colab (platform, freemium)

At a glance

  • Time to implement: days
  • Setup cost: low
  • Ongoing cost: low
  • Cost trend: decreasing
  • Organisation size: medium, large
  • Target audience: data-analyst, ceo-trustees, program-delivery

API costs are roughly £0.01-0.05 per report depending on length, so 50 reports cost around £0.50-2.50. Use paid API tiers when processing reports that contain sensitive beneficiary data - free tiers may use your inputs for model training. The main cost is staff time: defining your outcomes framework and validating the results.