Spot patterns you weren't looking for
The problem
Your trustees want to know what's interesting in your data, but you don't have a data analyst. You've got spreadsheets full of service usage, outcomes, demographics - but you don't know what questions to ask. Manually checking every possible correlation would take forever.
The solution
Use automated profiling tools to scan your data for patterns, outliers, and correlations. Then use an LLM to translate the statistical findings into plain English. The tools find things like 'age groups with unusually high outcomes', 'service usage that spikes at specific times', or 'strong correlations you didn't expect'. You get a readable summary for your board.
What you get
A plain English report highlighting: unexpected patterns, significant correlations, outliers worth investigating, trends over time, and differences between groups. Each finding includes: what the pattern is, how strong it is, and why it might matter. Perfect for trustee papers or strategic planning.
Before you start
- Your data as a CSV or Excel file
- A Google account for Colab
- A Claude or ChatGPT account to summarise findings
When to use this
- You're preparing for a board meeting and want to highlight interesting findings
- You suspect there are patterns in your data but don't know where to look
- You want to spot problems (outliers, anomalies) before they become issues
- Trustees or funders keep asking 'what does the data tell us?' and you're not sure
When not to use this
- You already know exactly what you're looking for - just query it directly
- Your dataset is tiny (under 50 rows) - you can just eyeball it
- The data is too messy - clean it up first or use the 'check data for problems' recipe
Steps
- 1
Export your data
Get your data into CSV format. This could be service usage, outcomes, survey responses, donor data - anything you want to understand better. Make sure column names are clear (not 'col_A'). Remove any personally identifiable information first.
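If you're comfortable with a couple of lines of code, the de-identification step can also be done in pandas before profiling. This is a minimal sketch: the column names ('name', 'email') and the tiny example data are made up, so swap in whatever identifying columns your own file contains.

```python
import pandas as pd

# Illustrative data - replace with pd.read_csv("your_export.csv")
df = pd.DataFrame({
    "name": ["A. Smith", "B. Jones"],
    "email": ["a@example.org", "b@example.org"],
    "age_group": ["25-34", "35-44"],
    "sessions_attended": [4, 7],
})

# Drop personally identifiable columns before profiling or sharing
pii_columns = ["name", "email"]
df_safe = df.drop(columns=pii_columns)

# Save the cleaned file for the profiling step
df_safe.to_csv("your_data.csv", index=False)
print(df_safe.columns.tolist())
```

Double-check the saved file by eye before uploading it anywhere: free-text columns (notes, feedback) can contain names even when the column heading looks harmless.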
- 2
Upload to Colab and run profiling
Open Google Colab and use the example code to generate a profile report. This takes seconds and creates an HTML report showing: distributions, missing values, correlations, outliers, and patterns. Save the report to review.
- 3
Review the automated findings
Open the HTML report and look through it. The profiling tool highlights: variables with high correlation (things that move together), potential duplicates, unusual values, missing data patterns, and statistical warnings. Note anything that surprises you or seems important.
- 4
Extract key statistics for summary
Note down the interesting findings: correlation coefficients, distribution shapes, outliers identified. You don't need to understand all the statistics - just copy the numbers and descriptions. These will go into the LLM for interpretation.
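Alongside the correlations in the HTML report, a short statistical summary is easy to generate yourself and paste straight into the LLM. This sketch uses invented numbers and an interquartile-range (IQR) rule to flag outliers; the column names are placeholders for your own data.

```python
import pandas as pd

# Illustrative data - replace with your own CSV
df = pd.DataFrame({
    "sessions": [2, 3, 3, 4, 20],        # 20 is an unusually high value
    "wellbeing_score": [5, 6, 6, 7, 8],
})

# Headline statistics per numeric column (count, mean, quartiles...)
print(df.describe().to_string())

# Flag values outside 1.5 * IQR of the quartiles - a common outlier rule
flagged = {}
for col in df.select_dtypes("number").columns:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df.loc[(df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr), col]
    if not outliers.empty:
        flagged[col] = outliers.tolist()

print("Possible outliers:", flagged)
```

Paste the `describe()` table and the flagged values into the LLM along with the correlations - you don't need to interpret them yourself first.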
- 5
Get plain English interpretation
Paste the key findings into Claude or ChatGPT with context about what the data represents. Ask: 'Explain these statistical findings in plain English for charity trustees. What's significant and why might it matter?' The AI translates stats into actionable insights. Note: Check your charity's data policy permits sharing even anonymised summaries with third-party AI tools - some patterns in aggregate data could still be sensitive.
- 6
Validate surprising findings
Any correlation or pattern that seems too strong or unexpected - go back to the raw data and check it's real, not a data quality issue. Sort by the relevant columns, spot-check some records. Make sure you're seeing a real pattern, not a data error.
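One quick check worth knowing: a suspiciously perfect correlation often means one column is derived from, or a duplicate of, another. This is a hedged sketch with made-up column names showing how to sort, eyeball, and test for that.

```python
import pandas as pd

# Illustrative data - a 'perfect' correlation that is really a data issue
df = pd.DataFrame({
    "sessions_booked":   [1, 2, 3, 4, 5],
    "sessions_attended": [1, 2, 3, 4, 5],  # identical to sessions_booked
})

# The correlation looks impressive...
corr = df["sessions_booked"].corr(df["sessions_attended"])
print(f"Correlation: {corr:.2f}")

# ...so sort by the relevant column and spot-check the top rows
print(df.sort_values("sessions_booked", ascending=False).head())

# If the columns are exact duplicates, it's a data artefact, not a finding
if df["sessions_booked"].equals(df["sessions_attended"]):
    print("Warning: columns are identical - likely duplicated data")
```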
- 7
Create your summary for stakeholders (optional)
Compile the 3-5 most interesting findings into a brief for your trustees or funders. For each: what the pattern is, how confident you are it's real, and what it might mean for your work. Include simple charts from the profile report where helpful.
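You can screenshot charts from the profile report, or make your own in a few lines in the same Colab notebook. This sketch uses matplotlib with invented figures - replace the series with real counts from your data.

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display window
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative counts - replace with e.g. df["age_group"].value_counts()
counts = pd.Series({"Under 25": 40, "25-44": 95, "45-64": 60, "65+": 30})

ax = counts.plot.bar(title="Service users by age group")
ax.set_ylabel("Number of users")
plt.tight_layout()
plt.savefig("age_groups.png")
print("Saved chart to age_groups.png")
```

A plain bar chart like this drops straight into a trustee paper; avoid anything that needs a legend to decode.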
Example code
Generate automated data profile
This creates a comprehensive profile report of your data, highlighting patterns and anomalies automatically.
# Install profiling library
!pip install ydata-profiling
import pandas as pd
from ydata_profiling import ProfileReport
# Load your data
df = pd.read_csv('your_data.csv')
print(f"Loaded {len(df)} rows and {len(df.columns)} columns")
print(f"Columns: {', '.join(df.columns)}")
# Generate profile report
# This automatically finds patterns, correlations, outliers
profile = ProfileReport(
    df,
    title="Data Profile Report",
    explorative=True,  # look for patterns beyond the basics
    correlations={
        "pearson": {"calculate": True},
        "spearman": {"calculate": True},
    },
)
# Save report
profile.to_file("data_profile.html")
print("\nProfile report generated: data_profile.html")
print("\nKey things to look for in the report:")
print("- Correlations tab: variables that move together")
print("- Alerts: data quality issues automatically detected")
print("- Missing values: patterns in what's not recorded")
print("- Interactions: relationships between variables")
# Quick correlation summary to paste into LLM
print("\n--- Copy this correlation summary to Claude/ChatGPT ---\n")
correlations = df.corr(numeric_only=True)
# Show only strong correlations (> 0.5 or < -0.5)
strong_correlations = []
for col1 in correlations.columns:
    for col2 in correlations.columns:
        if col1 < col2:  # avoid reporting each pair twice
            corr = correlations.loc[col1, col2]
            if abs(corr) > 0.5:
                strong_correlations.append({
                    'var1': col1,
                    'var2': col2,
                    'correlation': f"{corr:.2f}"
                })

if strong_correlations:
    print("Strong correlations found:")
    for item in strong_correlations:
        print(f"- {item['var1']} and {item['var2']}: {item['correlation']}")
else:
    print("No particularly strong correlations detected.")
print("\nPrompt for LLM:")
print(f"I have a dataset about [describe your charity's work]. It contains {len(df)} records with these columns: {', '.join(df.columns)}. The automated analysis found these patterns: [paste correlations and key findings from the HTML report]. Please explain what's significant and why it might matter for our work.")
Tools
Resources
At a glance
- Time to implement
- hours
- Setup cost
- free
- Ongoing cost
- free
- Cost trend
- stable
- Organisation size
- small, medium, large
- Target audience
- ceo-trustees, operations-manager, data-analyst
All tools are free. Profiling takes seconds on small files; very large datasets may need a few minutes.