
Predict which volunteers might leave

service-delivery · intermediate · proven

The problem

Volunteers drop out and you're blindsided. By the time they stop showing up, it's too late to intervene. You can't check in with everyone regularly - you've got 50+ volunteers. Some show warning signs (declining attendance, less engaged) but you don't spot them until they're gone. You need early warning of retention risks.

The solution

Build a prediction model that scores volunteers by likelihood to leave. It learns patterns from past departures: declining shift attendance, longer gaps between volunteering, reduced engagement, changes in role satisfaction. Current volunteers get a risk score (0-100% likely to leave in next 3 months). You prioritize check-ins with high-risk volunteers before they disengage completely.

What you get

A ranked list of volunteers by retention risk: 'Sarah: 85% likely to leave - attendance dropped 40% in last 2 months', 'John: 65% likely to leave - hasn't volunteered in 6 weeks'. For each at-risk volunteer you see: what signals triggered the score (declining attendance, gaps, role changes). You can act proactively rather than reactively.

Before you start

  • Volunteer data: attendance records, start dates, roles, engagement metrics
  • History of which volunteers left (at least 20-30 who departed) - if you don't have this data yet, start logging departures for 3-6 months before attempting to build a model
  • At least 6-12 months of historical data - ensure training data is relatively recent (within last 2 years) for the model to reflect current patterns
  • Lawful basis under GDPR to process volunteer data for retention purposes (legitimate interest or explicit consent)
  • A Google account for Colab
  • Basic Python skills or willingness to adapt example code

When to use this

  • You manage 30+ volunteers and can't check in with everyone regularly
  • Volunteers leave and you're not seeing it coming
  • You want to intervene early with at-risk volunteers
  • You've got historical data on who left and when

When not to use this

  • You have fewer than 30 volunteers - check in with everyone personally
  • You don't have data on past departures - model needs training data
  • Volunteer turnover is fine and retention isn't a priority
  • Your data quality is too poor (no attendance records, missing dates)

Steps

  1. Gather volunteer history

    Export data on: volunteer ID, start date, all shift dates, role(s), any engagement data (events attended, training completed), end date if they left. You need both current volunteers (active) and past volunteers who left (the training data for 'what leaving looks like').
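A minimal sketch of what that export might look like once loaded, using invented column names (`volunteer_id`, `shift_date`, `start_date`, `end_date`) - rename to match your own system's export:

```python
import pandas as pd

# Hypothetical export: one row per shift, plus a volunteers table.
# A blank end_date means the volunteer is still active.
shifts = pd.DataFrame({
    'volunteer_id': [1, 1, 2],
    'shift_date': ['2024-01-05', '2024-02-02', '2024-01-10'],
})
volunteers = pd.DataFrame({
    'volunteer_id': [1, 2],
    'start_date': ['2023-06-01', '2023-09-15'],
    'end_date': [None, '2024-03-01'],
})

# Parse dates up front; errors='coerce' turns blanks into NaT
shifts['shift_date'] = pd.to_datetime(shifts['shift_date'])
volunteers['start_date'] = pd.to_datetime(volunteers['start_date'])
volunteers['end_date'] = pd.to_datetime(volunteers['end_date'], errors='coerce')

# Sanity checks before any modelling
missing = {'volunteer_id', 'shift_date'} - set(shifts.columns)
print(f"Missing columns: {missing or 'none'}")
print(f"Active volunteers: {volunteers['end_date'].isna().sum()}")
```

Catching missing columns and unparsed dates here saves debugging later, when errors surface as confusing model behaviour rather than obvious data problems.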

  2. Calculate behavioral features

    Transform raw data into features the model can learn from: average shifts per month, trend (increasing/decreasing attendance), days since last shift, longest gap between shifts, total time as volunteer, number of role changes. These patterns predict departure risk.
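As a quick illustration, here is how a handful of those features fall out of one volunteer's shift dates (the dates are made up; the full recipe code below does this for every volunteer):

```python
import pandas as pd

# One volunteer's shift history, scored as of 1 May 2024
dates = pd.to_datetime(pd.Series(
    ['2024-01-05', '2024-01-19', '2024-02-16', '2024-04-12']))
as_of = pd.Timestamp('2024-05-01')

features = {
    'total_shifts': len(dates),
    # Recency: how long since they last showed up
    'days_since_last_shift': (as_of - dates.max()).days,
    # Longest gap between consecutive shifts
    'max_gap_days': dates.diff().dt.days.max(),
}
print(features)
# {'total_shifts': 4, 'days_since_last_shift': 19, 'max_gap_days': 56.0}
```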

  3. Label your training data

    For past volunteers, mark whether they left within 3 months: 'yes' if they departed, 'no' if they stayed active. The model learns: what patterns existed 3 months before departure? Did attendance decline? Were there long gaps? This trains it to spot the same patterns in current volunteers.
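The key trick is where you take the snapshot: for leavers, 90 days before departure, so the model learns what their behaviour looked like while there was still time to intervene. A sketch with invented data:

```python
import pandas as pd

# Leavers have an end_date; NaT means still active
volunteers = pd.DataFrame({
    'volunteer_id': [1, 2, 3],
    'end_date': pd.to_datetime(['2024-03-01', None, '2024-06-15']),
})

labels = []
for _, vol in volunteers.iterrows():
    left = pd.notna(vol['end_date'])
    labels.append({
        'volunteer_id': vol['volunteer_id'],
        # Snapshot taken 90 days pre-departure for leavers
        'snapshot': vol['end_date'] - pd.Timedelta(days=90) if left else None,
        'will_leave': int(left),
    })

label_df = pd.DataFrame(labels)
print(label_df)
```

Features are then calculated as of each snapshot date, never using data from after it - otherwise the model "cheats" by seeing the departure it is supposed to predict.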

  4. Train the prediction model

    Use Random Forest (the example code) to learn which features predict departure. It identifies patterns: 'volunteers with declining attendance + gaps over 4 weeks + less than 6 months tenure = 80% likelihood to leave'. The model scores how much each factor matters.

  5. Validate the model

    Test the model on volunteers it hasn't seen: does it correctly identify who left? If it says 'high risk' do those volunteers actually tend to leave? If accuracy is below 70%, you might need more data or different features. Check it makes intuitive sense.
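With small volunteer datasets, a single train/test split can be misleading - one lucky or unlucky split swings the number a lot. Cross-validation averages over several splits and gives a steadier estimate. A sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for your real feature matrix: 200 samples, 4 features,
# with the label driven by the first feature plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: train on 4/5 of the data, test on the rest, rotate
scores = cross_val_score(rf, X, y, cv=5)
print(f"Accuracy per fold: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

If the per-fold scores vary wildly, that itself is a warning that you don't have enough data for a stable model yet.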

  6. Score current volunteers

    Run all active volunteers through the model. Each gets a risk score (0-100% likely to leave). Sort by risk score. Who's flagged as high risk (70%+)? Do you recognise warning signs when you look at their recent activity?

  7. Understand what drives each score

    For high-risk volunteers, look at which factors contributed: Is it declining attendance? Long gap since last shift? Recent role change? This tells you what to address in your conversation. 'Sarah, I noticed you haven't been in for 6 weeks - everything okay?'
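One simple, assumption-light way to see what is driving a score is to compare the flagged volunteer's features against the cohort average. A sketch with invented numbers:

```python
import pandas as pd

# Invented feature values for three volunteers
features = pd.DataFrame({
    'volunteer_id': ['sarah', 'john', 'amira'],
    'days_since_last_shift': [42, 10, 7],
    'attendance_trend': [-0.6, 0.1, 0.2],
})

cohort_avg = features[['days_since_last_shift', 'attendance_trend']].mean()
flagged = features[features['volunteer_id'] == 'sarah'].iloc[0]

# Anything far from the cohort average is a talking point, not a verdict
for col in ['days_since_last_shift', 'attendance_trend']:
    print(f"{col}: {flagged[col]} (cohort avg {cohort_avg[col]:.1f})")
```

Here Sarah's 42 days since her last shift against a cohort average near 20 points you straight at the conversation opener in the text above.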

  8. Act on the predictions

    Reach out to high-risk volunteers proactively. Check what's changed, whether they're still enjoying it, if there are barriers. Important: use predictions to inform your outreach, not to make automated decisions. The model flags patterns for your attention - you still use human judgement about each individual situation. Critical: risk scores should never be shared with the volunteer or used as the sole basis for formal HR/volunteering actions - they are internal triage tools only. You're catching issues early when you can still help, not after they've mentally checked out. Prevention not firefighting.

  9. Retrain monthly (optional)

    Update the model monthly with fresh data. Which volunteers left? Which stuck around despite high scores (false alarms - what made them stay?)? The model improves as it learns from new patterns.
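To make monthly retraining painless, persist the trained model so you can reload it, score against fresh data, and compare before replacing it. A sketch using joblib (installed alongside scikit-learn); the data and filename here are placeholders:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for your trained model and feature matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save this month's model to disk
joblib.dump(rf, 'volunteer_model.joblib')

# Next month: reload it, compare against a freshly trained model on
# new data, and only replace it if the new one validates better
loaded = joblib.load('volunteer_model.joblib')
print(f"Reloaded model accuracy on its training data: {loaded.score(X, y):.2f}")
```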

Example code

Predict volunteer churn using Random Forest

This builds a model to predict which volunteers are likely to leave. Adapt the feature calculations to your data.

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from datetime import datetime, timedelta

# Load volunteer data
# Expected: volunteer_id, shift_date, role, start_date, end_date (if left)
shifts = pd.read_csv('volunteer_shifts.csv')
shifts['shift_date'] = pd.to_datetime(shifts['shift_date'])

volunteers = pd.read_csv('volunteers.csv')
volunteers['start_date'] = pd.to_datetime(volunteers['start_date'])
volunteers['end_date'] = pd.to_datetime(volunteers['end_date'], errors='coerce')

print(f"Loaded {len(volunteers)} volunteers, {len(shifts)} shift records")

# Calculate features for each volunteer
def calculate_features(volunteer_id, as_of_date):
    """Calculate behavioral features as of a specific date"""

    vol_shifts = shifts[
        (shifts['volunteer_id'] == volunteer_id) &
        (shifts['shift_date'] <= as_of_date)
    ].sort_values('shift_date')

    if len(vol_shifts) == 0:
        return None

    # Basic stats
    features = {
        'total_shifts': len(vol_shifts),
        'days_since_start': (as_of_date - vol_shifts['shift_date'].min()).days,
        'days_since_last_shift': (as_of_date - vol_shifts['shift_date'].max()).days,
    }

    # Attendance trend (last 3 months vs previous 3 months)
    recent_cutoff = as_of_date - timedelta(days=90)
    previous_cutoff = as_of_date - timedelta(days=180)

    recent_shifts = len(vol_shifts[vol_shifts['shift_date'] >= recent_cutoff])
    previous_shifts = len(vol_shifts[
        (vol_shifts['shift_date'] >= previous_cutoff) &
        (vol_shifts['shift_date'] < recent_cutoff)
    ])

    features['recent_shifts_3mo'] = recent_shifts
    features['previous_shifts_3mo'] = previous_shifts

    # Calculate trend (positive = increasing, negative = declining)
    if previous_shifts > 0:
        features['attendance_trend'] = (recent_shifts - previous_shifts) / previous_shifts
    else:
        features['attendance_trend'] = 0

    # Gap analysis
    if len(vol_shifts) > 1:
        gaps = vol_shifts['shift_date'].diff().dt.days.dropna()
        features['avg_gap_days'] = gaps.mean()
        features['max_gap_days'] = gaps.max()
    else:
        features['avg_gap_days'] = 0
        features['max_gap_days'] = 0

    return features

# Build training dataset from historical data
# For volunteers who left: predict if they would leave 3 months before actual departure
# For volunteers who stayed: sample random points to check

training_data = []

for _, vol in volunteers.iterrows():
    vol_id = vol['volunteer_id']

    if pd.notna(vol['end_date']):  # Volunteer left
        # Sample 3 months before they left
        prediction_date = vol['end_date'] - timedelta(days=90)

        if prediction_date > vol['start_date']:
            features = calculate_features(vol_id, prediction_date)
            if features:
                features['volunteer_id'] = vol_id
                features['will_leave'] = 1  # Left within 3 months
                training_data.append(features)

    else:  # Still active - sample from their history
        # Take multiple snapshots during their tenure
        start = vol['start_date']
        today = pd.Timestamp.now()

        # Sample every 3 months they've been active
        sample_date = start + timedelta(days=90)
        while sample_date < today:
            features = calculate_features(vol_id, sample_date)
            if features:
                features['volunteer_id'] = vol_id
                features['will_leave'] = 0  # Still active
                training_data.append(features)

            sample_date += timedelta(days=90)

# Create training dataframe
train_df = pd.DataFrame(training_data)
print(f"\nTraining samples: {len(train_df)}")
print(f"  Left: {len(train_df[train_df['will_leave'] == 1])}")
print(f"  Stayed: {len(train_df[train_df['will_leave'] == 0])}")

# Prepare features and target
feature_cols = [
    'total_shifts', 'days_since_start', 'days_since_last_shift',
    'recent_shifts_3mo', 'previous_shifts_3mo', 'attendance_trend',
    'avg_gap_days', 'max_gap_days'
]

X = train_df[feature_cols]
y = train_df['will_leave']

# Split into train/test
# Note: active volunteers contribute multiple snapshots, so the same person
# can land in both train and test sets, which inflates test accuracy. For a
# stricter check, split by volunteer_id (e.g. sklearn's GroupShuffleSplit).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=5)
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
print("\nModel Performance:")
print(classification_report(y_test, y_pred, target_names=['Will Stay', 'Will Leave']))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("\nMost important factors predicting departure:")
print(feature_importance.head())

# Score current volunteers
print("\nScoring current active volunteers...")
current_volunteers = volunteers[volunteers['end_date'].isna()]

risk_scores = []
today = pd.Timestamp.now()

for _, vol in current_volunteers.iterrows():
    features_dict = calculate_features(vol['volunteer_id'], today)

    if features_dict:
        # Pass a DataFrame with the training column names, in the same order,
        # so sklearn doesn't warn about missing feature names
        X_vol = pd.DataFrame([features_dict])[feature_cols]
        risk_prob = rf.predict_proba(X_vol)[0][1]  # Probability of leaving

        risk_scores.append({
            'volunteer_id': vol['volunteer_id'],
            'risk_score': round(risk_prob * 100, 1),
            'days_since_last': features_dict['days_since_last_shift'],
            'attendance_trend': round(features_dict['attendance_trend'], 2),
            'recent_shifts': features_dict['recent_shifts_3mo']
        })

# Sort by risk
risk_df = pd.DataFrame(risk_scores).sort_values('risk_score', ascending=False)

print(f"\nHigh risk volunteers (70%+ likely to leave):")
high_risk = risk_df[risk_df['risk_score'] >= 70]
print(high_risk.to_string(index=False))

print(f"\nMedium risk (50-70%):")
medium_risk = risk_df[(risk_df['risk_score'] >= 50) & (risk_df['risk_score'] < 70)]
print(medium_risk.head(10).to_string(index=False))

# Export for action
risk_df.to_csv('volunteer_retention_risk.csv', index=False)
print("\nFull risk scores saved to volunteer_retention_risk.csv")
print("\nNext steps:")
print("1. Reach out to high-risk volunteers")
print("2. Investigate what's changed for them")
print("3. Address barriers to continued engagement")
print("4. Retrain model monthly with new data")

Tools

  • Google Colab — platform · freemium
  • scikit-learn — library · free · open source
  • pandas — library · free · open source

Resources

At a glance

Time to implement
weeks
Setup cost
free
Ongoing cost
free
Cost trend
stable
Organisation size
medium, large
Target audience
operations-manager, volunteer-coordinator, data-analyst

All tools are free. Note that if you run the notebook in Google Colab, volunteer data is processed on Google's servers - run it locally (e.g. in Jupyter) if that conflicts with your data protection obligations. Consider data protection: this involves automated profiling of volunteers. Ensure volunteers are informed about how their data is used. Initial setup takes time (gathering data, building model). Once built, scoring volunteers takes minutes. Retrain monthly with fresh data.