Discover donor segments automatically

fundraising | intermediate | proven

The problem

You segment donors by demographics (age, location) but suspect there are better ways to group them. Some 'major donors' give sporadically. Some 'regular givers' are actually declining. You're designing campaigns for segments that don't reflect how people actually behave.

The solution

Use clustering algorithms to let your donor data reveal its own natural groupings. Instead of deciding segments upfront, the algorithm finds patterns in giving behavior: frequency, amounts, timing, campaign responses. You discover segments you didn't know existed, based on what donors actually do rather than who you think they are.

What you get

Data-driven donor segments with clear characteristics: 'loyal regulars', 'event-triggered givers', 'declining supporters', 'growing enthusiasts'. For each segment you get: typical behaviors, size, and actionable insights for targeting. You can assign every donor to their segment for campaign planning.

Before you start

Donor database with at least 1-2 years of giving history
Data on: donation amounts, dates, campaigns, and ideally engagement (opens, clicks)
At least 200-300 donors (clustering needs volume to find patterns)
A Google account for Colab
Basic comfort with Python or willingness to adapt example code
DATA PROTECTION: Anonymise donor data before uploading to cloud platforms. Replace names and email addresses with anonymous IDs. Check your charity's privacy policy permits this type of analysis. The algorithm only needs behavioral data, not identifying information.
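
One way to do the anonymisation step before anything leaves your own machine is sketched below. It is a minimal illustration, not a complete data protection process: the column names (name, email) and the helper anonymise_donors are hypothetical, and it uses simple sequential IDs with a lookup table that you keep locally.

```python
import pandas as pd

def anonymise_donors(raw, id_columns=("name", "email")):
    """Replace identifying columns with a sequential anonymous donor_id.

    Returns (anonymised_frame, lookup_table). Keep the lookup table on your
    own systems so segments can be matched back to real donors later.
    """
    id_columns = list(id_columns)
    # One row per distinct person, numbered in order of first appearance
    lookup = raw[id_columns].drop_duplicates().reset_index(drop=True)
    lookup["donor_id"] = ["D" + str(i + 1).zfill(6) for i in range(len(lookup))]
    # Attach the anonymous ID, then drop the identifying columns
    anonymised = raw.merge(lookup, on=id_columns, how="left").drop(columns=id_columns)
    return anonymised, lookup

# Toy export with hypothetical column names
raw = pd.DataFrame({
    "name": ["Ann Lee", "Bob Ray", "Ann Lee"],
    "email": ["ann@example.org", "bob@example.org", "ann@example.org"],
    "date": ["2024-01-05", "2024-02-10", "2024-03-15"],
    "amount": [20.0, 50.0, 25.0],
})
anonymised, lookup = anonymise_donors(raw)
print(anonymised)
```

Only the anonymised table needs to be uploaded. The lookup table stays behind and is used at the end to attach segments to real donor records.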

When to use this

Your current segments don't predict behavior well
You want to discover patterns rather than assume them
You've got enough donors and data history for patterns to emerge
You're planning targeted campaigns and want better targeting

When not to use this

You have fewer than 200 donors - too few for reliable patterns to emerge
Your donors have no variation in behavior (everyone gives once a year)
You have no behavioral data, only basic demographics
Your CRM already does sophisticated behavioral segmentation

Steps

1
Export your donor giving data
Get data on: donor ID, all donation dates and amounts, which campaigns they responded to, engagement metrics if you have them (email opens, event attendance). You want behavior data, not just demographics. Export as CSV.
2
Calculate behavioral features
Transform the raw data into features the algorithm can use: total given, average gift size, number of gifts, months since first/last gift, giving trend (up/down/stable), engagement level. UK charities: consider including Gift Aid status as a feature - it's a strong behavioral/demographic indicator. The example code shows how to calculate the giving-based features from your donation history; engagement and Gift Aid features have to be added from your own data.
3
Visualize the data first
Plot your donors on a scatter chart (average gift vs frequency, or recency vs total given). Do you see natural clusters forming? This gives you intuition before the algorithm runs. IMPORTANT: K-means is sensitive to outliers. If you have 'mega-donors' who give vastly more than others (10x+ the average), consider removing them from the main analysis or analysing them separately - otherwise they can pull entire clusters toward themselves.
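
A rough sketch of how mega-donors could be set aside before clustering is shown below. The toy table stands in for your per-donor features, and the threshold is an assumption: it uses 10x the median total rather than the average, because the average is itself inflated by the very donors you are trying to detect. The log transform at the end is an optional extra that is not part of the steps above.

```python
import numpy as np
import pandas as pd

# Toy per-donor table; in practice this is the feature table from step 2
features = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5, 6],
    "total_given": [120.0, 90.0, 150.0, 60.0, 110.0, 25000.0],
    "num_gifts": [6, 3, 10, 2, 5, 4],
})

# Flag donors whose total giving is 10x or more the median donor's total.
# The 10x multiplier and the use of the median are assumptions; tune both.
threshold = 10 * features["total_given"].median()
features["is_mega_donor"] = features["total_given"] >= threshold

main_group = features[~features["is_mega_donor"]].copy()
mega_donors = features[features["is_mega_donor"]]

# Optional: compress the long right tail before scaling and clustering
main_group["log_total_given"] = np.log1p(main_group["total_given"])

print(f"{len(mega_donors)} mega-donor(s) set aside, {len(main_group)} donors kept")
```

Cluster main_group as normal and review mega_donors by hand; there are usually few enough of them to treat individually.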
4
Run clustering to find segments
Use K-means clustering to group donors by behavioral similarity. Start with 4-6 segments - you can adjust later. The algorithm assigns each donor to the segment they're most similar to. It's fast and interpretable.
5
Analyse each segment
For each cluster, calculate the average characteristics: how much do they give, how often, how engaged are they, are they growing or declining? Look for the story each segment tells. Give them descriptive names based on their behaviour.
6
Validate the segments make sense
Do the segments feel meaningful? Pick a few donors from each segment and check their histories. Do segment members have similar patterns? If segments feel random or arbitrary, you might need different features or a different number of clusters.
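
The manual check can be paired with a numeric one. The sketch below uses synthetic data in place of your feature table, compares candidate cluster counts with scikit-learn's silhouette score (a technique not used in the example code further down, added here as one possible sanity check), and then draws a few donors from each segment for hand review.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the per-donor feature table (three obvious groups)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "donor_id": range(90),
    "avg_gift": np.concatenate([rng.normal(10, 2, 30), rng.normal(50, 5, 30), rng.normal(200, 20, 30)]),
    "num_gifts": np.concatenate([rng.normal(24, 3, 30), rng.normal(6, 1, 30), rng.normal(2, 0.5, 30)]),
})
scaled = StandardScaler().fit_transform(features[["avg_gift", "num_gifts"]])

# Silhouette score per candidate k: closer to 1 means tighter, better-separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)
    print(f"k={k}: silhouette={scores[k]:.2f}")

# Spot-check: pull a few donors from each segment to review by hand
features["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)
sample = features.groupby("segment").sample(n=3, random_state=0)
print(sample.sort_values("segment"))
```

A high score does not guarantee useful segments, and a modest one does not rule them out. Treat it as one input alongside the donor histories you check by hand.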
7
Export segment assignments
Add the segment ID to your donor database so you can filter and target. Now you can create campaigns for 'declining supporters who used to give regularly' rather than 'everyone over 50'.
8
Refresh periodically (optional)
Donors move between segments as their behavior changes. Re-run the analysis quarterly or annually to catch shifts. Someone who was a 'growing enthusiast' might have become a 'loyal regular'.
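
To see who has moved between runs, the two sets of assignments can be compared side by side. The sketch below uses made-up assignments and a hypothetical segment_name column. It compares names rather than raw cluster numbers because K-means numbering is arbitrary and can change from one run to the next.

```python
import pandas as pd

# Hypothetical segment assignments from two runs, keyed by the same donor_id
previous = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5],
    "segment_name": ["Growing enthusiast", "Loyal regular", "Loyal regular",
                     "Declining supporter", "Growing enthusiast"],
})
current = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5],
    "segment_name": ["Loyal regular", "Loyal regular", "Declining supporter",
                     "Declining supporter", "Growing enthusiast"],
})

moves = previous.merge(current, on="donor_id", suffixes=("_previous", "_current"))

# Rows: where donors were; columns: where they are now
transitions = pd.crosstab(moves["segment_name_previous"], moves["segment_name_current"])
print(transitions)

# Donors who changed segment, e.g. to prioritise for follow-up
changed = moves[moves["segment_name_previous"] != moves["segment_name_current"]]
print(changed)
```

This only works if segments are re-named consistently after each run, so keep a short written description of what defines each segment.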

Example code

Discover donor segments with K-means clustering

This calculates behavioral features from donation history and finds natural donor groupings. Adapt the feature calculations to your data. Note: The trend calculation loops through each donor individually - for datasets over 5,000 donors, you may want to rewrite it using pandas groupby for better performance.

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load donation history
donations = pd.read_csv('donations.csv')
donations['date'] = pd.to_datetime(donations['date'])

# Calculate behavioral features for each donor
features = donations.groupby('donor_id').agg({
    'amount': ['sum', 'mean', 'count'],  # Total, average, frequency
    'date': ['min', 'max']                # First and last gift
}).reset_index()

features.columns = ['donor_id', 'total_given', 'avg_gift', 'num_gifts', 'first_gift', 'last_gift']

# Calculate recency (months since last gift)
# Use a fixed reference date for reproducibility (change to your analysis date)
reference_date = pd.Timestamp('2024-12-31')  # Or use pd.Timestamp.now() if you want current date
features['months_since_last'] = (reference_date - features['last_gift']).dt.days / 30
features['months_active'] = (features['last_gift'] - features['first_gift']).dt.days / 30

# Calculate trend (are they giving more or less over time?)
# Simple version: compare first half vs second half of their giving
def calculate_trend(donor_id):
    donor_gifts = donations[donations['donor_id'] == donor_id].sort_values('date')
    if len(donor_gifts) < 4:
        return 0
    midpoint = len(donor_gifts) // 2
    first_half_avg = donor_gifts.iloc[:midpoint]['amount'].mean()
    second_half_avg = donor_gifts.iloc[midpoint:]['amount'].mean()
    if first_half_avg == 0:
        return 0  # Avoid division by zero if the earlier gifts average to nothing
    return (second_half_avg - first_half_avg) / first_half_avg

features['trend'] = features['donor_id'].apply(calculate_trend)

# Select features for clustering
cluster_features = features[['total_given', 'avg_gift', 'num_gifts', 'months_since_last', 'trend']]

# Standardize features (important for clustering)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cluster_features)

# Find optimal number of clusters (optional - try 4-6 to start)
# Elbow method: plot inertia vs number of clusters
inertias = []
for k in range(2, 10):
    # n_init is set explicitly because its default differs between scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(scaled_features)
    inertias.append(kmeans.inertia_)

plt.figure(figsize=(8, 4))
plt.plot(range(2, 10), inertias, marker='o')
plt.xlabel('Number of segments')
plt.ylabel('Inertia')
plt.title('Elbow plot: choosing number of segments')
plt.show()

# Run clustering with chosen number of segments
n_segments = 5  # Adjust based on elbow plot
kmeans = KMeans(n_clusters=n_segments, n_init=10, random_state=42)  # n_init explicit for consistent results across versions
features['segment'] = kmeans.fit_predict(scaled_features)

# Analyze each segment
segment_analysis = features.groupby('segment').agg({
    'donor_id': 'count',
    'total_given': ['mean', 'median'],
    'avg_gift': 'mean',
    'num_gifts': 'mean',
    'months_since_last': 'mean',
    'trend': 'mean'
}).round(2)

print("\nSegment characteristics:")
print(segment_analysis)

# Visualize segments
plt.figure(figsize=(10, 6))
scatter = plt.scatter(features['avg_gift'], features['num_gifts'],
                     c=features['segment'], cmap='viridis', alpha=0.6)
plt.xlabel('Average gift size (£)')
plt.ylabel('Number of gifts')
plt.title('Donor segments by behavior')
plt.colorbar(scatter, label='Segment')
plt.show()

# Name your segments based on characteristics
# Replace the placeholders after reviewing the segment analysis above.
# If you change n_segments, add or remove entries to match.
segment_names = {
    0: 'Review and name based on characteristics',
    1: 'Review and name based on characteristics',
    2: 'Review and name based on characteristics',
    3: 'Review and name based on characteristics',
    4: 'Review and name based on characteristics'
}
features['segment_name'] = features['segment'].map(segment_names)

# Export with segment assignments (IDs and names)
features.to_csv('donors_with_segments.csv', index=False)

print("\nNext step: Review the characteristics of each segment and give them meaningful names.")
print("For example: 'Loyal regulars', 'One-time large donors', 'Declining supporters', etc.")
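
For larger datasets, the per-donor loop in calculate_trend could be replaced with a groupby-based version along the lines below. This is a sketch under the same assumptions as the code above (columns donor_id, date, amount). The function name calculate_trends is hypothetical, and it is worth checking its results against the loop version on a sample of your own data.

```python
import numpy as np
import pandas as pd

def calculate_trends(donations):
    """Vectorised first-half vs second-half giving trend, one value per donor."""
    d = donations.sort_values(["donor_id", "date"]).copy()
    grouped = d.groupby("donor_id")
    d["gift_rank"] = grouped.cumcount()                      # 0, 1, 2, ... within each donor
    d["n_gifts"] = grouped["amount"].transform("count")      # gifts per donor, repeated on each row
    d["half"] = np.where(d["gift_rank"] < d["n_gifts"] // 2, "first", "second")

    # Average gift in each half, one row per donor
    halves = d.groupby(["donor_id", "half"])["amount"].mean().unstack("half")
    halves = halves.reindex(columns=["first", "second"])
    trend = (halves["second"] - halves["first"]) / halves["first"]

    # Match the loop version: donors with fewer than 4 gifts get a trend of 0
    n_gifts = d.groupby("donor_id")["amount"].count()
    trend = trend.where(n_gifts >= 4, 0.0)
    # Guard against a zero first-half average (division by zero)
    return trend.replace([np.inf, -np.inf], np.nan).fillna(0.0)

# Toy donation history to show the behavior
donations = pd.DataFrame({
    "donor_id": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "date": pd.to_datetime([
        "2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01",
        "2023-02-01", "2023-08-01",
        "2023-01-15", "2023-03-15", "2023-05-15", "2023-07-15", "2023-09-15",
    ]),
    "amount": [10.0, 10.0, 20.0, 20.0, 5.0, 5.0, 40.0, 40.0, 40.0, 20.0, 20.0],
})
trends = calculate_trends(donations)
print(trends)
```

To use it in the main script, the apply call could become features['trend'] = features['donor_id'].map(calculate_trends(donations)).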