
Predict which volunteers might leave

service-delivery · intermediate · proven

The problem

Volunteers drop out and you're blindsided. By the time they stop showing up, it's too late to intervene. You can't check in with everyone regularly - you've got 50+ volunteers. Some show warning signs (declining attendance, less engaged) but you don't spot them until they're gone. You need early warning of retention risks.

The solution

Build a prediction model that scores volunteers by likelihood to leave. It learns patterns from past departures: declining shift attendance, longer gaps between volunteering, reduced engagement, changes in role satisfaction. Current volunteers get a risk score (0-100% likely to leave in next 3 months). You prioritize check-ins with high-risk volunteers before they disengage completely.

What you get

A ranked list of volunteers by retention risk: 'Sarah: 85% likely to leave - attendance dropped 40% in last 2 months', 'John: 65% likely to leave - hasn't volunteered in 6 weeks'. For each at-risk volunteer you see: what signals triggered the score (declining attendance, gaps, role changes). You can act proactively rather than reactively.

Before you start

  • Volunteer data: attendance records, start dates, roles, engagement metrics
  • History of which volunteers left (at least 20-30 who departed) - if you don't have this data yet, start logging departures for 3-6 months before attempting to build a model
  • At least 6-12 months of historical data - ensure training data is relatively recent (within last 2 years) for the model to reflect current patterns
  • Lawful basis under GDPR to process volunteer data for retention purposes (legitimate interest or explicit consent)
  • A Google account for Colab
  • Basic Python skills or willingness to adapt example code

When to use this

  • You manage 30+ volunteers and can't check in with everyone regularly
  • Volunteers leave and you're not seeing it coming
  • You want to intervene early with at-risk volunteers
  • You've got historical data on who left and when

When not to use this

  • You have fewer than 30 volunteers - check in with everyone personally
  • You don't have data on past departures - model needs training data
  • Volunteer turnover is fine and retention isn't a priority
  • Your data quality is too poor (no attendance records, missing dates)

Steps

  1. Gather volunteer history

    Export data on: volunteer ID, start date, all shift dates, role(s), any engagement data (events attended, training completed), end date if they left. You need both current volunteers (active) and past volunteers who left (the training data for 'what leaving looks like').
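A minimal sketch of what that export might look like once loaded, using invented column names (`volunteer_id`, `shift_date`, `start_date`, `end_date`) - rename to match your own system's export:

```python
import pandas as pd

# Hypothetical export: one row per shift, plus a volunteers table.
# A blank end_date means the volunteer is still active.
shifts = pd.DataFrame({
    'volunteer_id': [1, 1, 2],
    'shift_date': ['2024-01-05', '2024-02-02', '2024-01-10'],
})
volunteers = pd.DataFrame({
    'volunteer_id': [1, 2],
    'start_date': ['2023-06-01', '2023-09-15'],
    'end_date': [None, '2024-03-01'],
})

# Parse dates up front; errors='coerce' turns blanks into NaT
shifts['shift_date'] = pd.to_datetime(shifts['shift_date'])
volunteers['start_date'] = pd.to_datetime(volunteers['start_date'])
volunteers['end_date'] = pd.to_datetime(volunteers['end_date'], errors='coerce')

# Sanity checks before any modelling
missing = {'volunteer_id', 'shift_date'} - set(shifts.columns)
print(f"Missing columns: {missing or 'none'}")
print(f"Active volunteers: {volunteers['end_date'].isna().sum()}")
```

Catching missing columns and unparsed dates here saves debugging later, when errors surface as confusing model behaviour rather than obvious data problems.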

  2. Calculate behavioral features

    Transform raw data into features the model can learn from: average shifts per month, trend (increasing/decreasing attendance), days since last shift, longest gap between shifts, total time as volunteer, number of role changes. These patterns predict departure risk.
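As a quick illustration, here is how a handful of those features fall out of one volunteer's shift dates (the dates are made up; the full recipe code below does this for every volunteer):

```python
import pandas as pd

# One volunteer's shift history, scored as of 1 May 2024
dates = pd.to_datetime(pd.Series(
    ['2024-01-05', '2024-01-19', '2024-02-16', '2024-04-12']))
as_of = pd.Timestamp('2024-05-01')

features = {
    'total_shifts': len(dates),
    # Recency: how long since they last showed up
    'days_since_last_shift': (as_of - dates.max()).days,
    # Longest gap between consecutive shifts
    'max_gap_days': dates.diff().dt.days.max(),
}
print(features)
# {'total_shifts': 4, 'days_since_last_shift': 19, 'max_gap_days': 56.0}
```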

  3. Label your training data

    For past volunteers, mark whether they left within 3 months: 'yes' if they departed, 'no' if they stayed active. The model learns: what patterns existed 3 months before departure? Did attendance decline? Were there long gaps? This trains it to spot the same patterns in current volunteers.
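The key trick is where you take the snapshot: for leavers, 90 days before departure, so the model learns what their behaviour looked like while there was still time to intervene. A sketch with invented data:

```python
import pandas as pd

# Leavers have an end_date; NaT means still active
volunteers = pd.DataFrame({
    'volunteer_id': [1, 2, 3],
    'end_date': pd.to_datetime(['2024-03-01', None, '2024-06-15']),
})

labels = []
for _, vol in volunteers.iterrows():
    left = pd.notna(vol['end_date'])
    labels.append({
        'volunteer_id': vol['volunteer_id'],
        # Snapshot taken 90 days pre-departure for leavers
        'snapshot': vol['end_date'] - pd.Timedelta(days=90) if left else None,
        'will_leave': int(left),
    })

label_df = pd.DataFrame(labels)
print(label_df)
```

Features are then calculated as of each snapshot date, never using data from after it - otherwise the model "cheats" by seeing the departure it is supposed to predict.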

  4. Train the prediction model

    Use Random Forest (the example code) to learn which features predict departure. It identifies patterns: 'volunteers with declining attendance + gaps over 4 weeks + less than 6 months tenure = 80% likelihood to leave'. The model scores how much each factor matters.

  5. Validate the model

    Test the model on volunteers it hasn't seen: does it correctly identify who left? If it says 'high risk' do those volunteers actually tend to leave? If accuracy is below 70%, you might need more data or different features. Check it makes intuitive sense.
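With small volunteer datasets, a single train/test split can be misleading - one lucky or unlucky split swings the number a lot. Cross-validation averages over several splits and gives a steadier estimate. A sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for your real feature matrix: 200 samples, 4 features,
# with the label driven by the first feature plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: train on 4/5 of the data, test on the rest, rotate
scores = cross_val_score(rf, X, y, cv=5)
print(f"Accuracy per fold: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

If the per-fold scores vary wildly, that itself is a warning that you don't have enough data for a stable model yet.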

  6. Score current volunteers

    Run all active volunteers through the model. Each gets a risk score (0-100% likely to leave). Sort by risk score. Who's flagged as high risk (70%+)? Do you recognise warning signs when you look at their recent activity?

  7. Understand what drives each score

    For high-risk volunteers, look at which factors contributed: Is it declining attendance? Long gap since last shift? Recent role change? This tells you what to address in your conversation. 'Sarah, I noticed you haven't been in for 6 weeks - everything okay?'
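One simple, assumption-light way to see what is driving a score is to compare the flagged volunteer's features against the cohort average. A sketch with invented numbers:

```python
import pandas as pd

# Invented feature values for three volunteers
features = pd.DataFrame({
    'volunteer_id': ['sarah', 'john', 'amira'],
    'days_since_last_shift': [42, 10, 7],
    'attendance_trend': [-0.6, 0.1, 0.2],
})

cohort_avg = features[['days_since_last_shift', 'attendance_trend']].mean()
flagged = features[features['volunteer_id'] == 'sarah'].iloc[0]

# Anything far from the cohort average is a talking point, not a verdict
for col in ['days_since_last_shift', 'attendance_trend']:
    print(f"{col}: {flagged[col]} (cohort avg {cohort_avg[col]:.1f})")
```

Here Sarah's 42 days since her last shift against a cohort average near 20 points you straight at the conversation opener in the text above.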

  8. Act on the predictions

    Reach out to high-risk volunteers proactively. Check what's changed, whether they're still enjoying it, if there are barriers. Important: use predictions to inform your outreach, not to make automated decisions. The model flags patterns for your attention - you still use human judgement about each individual situation. Critical: risk scores should never be shared with the volunteer or used as the sole basis for formal HR/volunteering actions - they are internal triage tools only. You're catching issues early when you can still help, not after they've mentally checked out. Prevention not firefighting.

  9. Retrain monthly (optional)

    Update the model monthly with fresh data. Which volunteers left? Which stuck around despite high scores (false alarms - what made them stay?)? The model improves as it learns from new patterns.
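To make monthly retraining painless, persist the trained model so you can reload it, score against fresh data, and compare before replacing it. A sketch using joblib (installed alongside scikit-learn); the data and filename here are placeholders:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for your trained model and feature matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save this month's model to disk
joblib.dump(rf, 'volunteer_model.joblib')

# Next month: reload it, compare against a freshly trained model on
# new data, and only replace it if the new one validates better
loaded = joblib.load('volunteer_model.joblib')
print(f"Reloaded model accuracy on its training data: {loaded.score(X, y):.2f}")
```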

Example code

Predict volunteer churn using Random Forest

This builds a model to predict which volunteers are likely to leave. Adapt the feature calculations to your data.

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from datetime import datetime, timedelta

# Load volunteer data
# Expected: volunteer_id, shift_date, role, start_date, end_date (if left)
shifts = pd.read_csv('volunteer_shifts.csv')
shifts['shift_date'] = pd.to_datetime(shifts['shift_date'])

volunteers = pd.read_csv('volunteers.csv')
volunteers['start_date'] = pd.to_datetime(volunteers['start_date'])
volunteers['end_date'] = pd.to_datetime(volunteers['end_date'], errors='coerce')

print(f"Loaded {len(volunteers)} volunteers, {len(shifts)} shift records")

# Calculate features for each volunteer
def calculate_features(volunteer_id, as_of_date):
    """Calculate behavioral features as of a specific date"""

    vol_shifts = shifts[
        (shifts['volunteer_id'] == volunteer_id) &
        (shifts['shift_date'] <= as_of_date)
    ].sort_values('shift_date')

    if len(vol_shifts) == 0:
        return None

    # Basic stats
    features = {
        'total_shifts': len(vol_shifts),
        'days_since_start': (as_of_date - vol_shifts['shift_date'].min()).days,
        'days_since_last_shift': (as_of_date - vol_shifts['shift_date'].max()).days,
    }

    # Attendance trend (last 3 months vs previous 3 months)
    recent_cutoff = as_of_date - timedelta(days=90)
    previous_cutoff = as_of_date - timedelta(days=180)

    recent_shifts = len(vol_shifts[vol_shifts['shift_date'] >= recent_cutoff])
    previous_shifts = len(vol_shifts[
        (vol_shifts['shift_date'] >= previous_cutoff) &
        (vol_shifts['shift_date'] < recent_cutoff)
    ])

    features['recent_shifts_3mo'] = recent_shifts
    features['previous_shifts_3mo'] = previous_shifts

    # Calculate trend (positive = increasing, negative = declining)
    if previous_shifts > 0:
        features['attendance_trend'] = (recent_shifts - previous_shifts) / previous_shifts
    else:
        features['attendance_trend'] = 0

    # Gap analysis
    if len(vol_shifts) > 1:
        gaps = vol_shifts['shift_date'].diff().dt.days.dropna()
        features['avg_gap_days'] = gaps.mean()
        features['max_gap_days'] = gaps.max()
    else:
        features['avg_gap_days'] = 0
        features['max_gap_days'] = 0

    return features

# Build training dataset from historical data
# For volunteers who left: predict if they would leave 3 months before actual departure
# For volunteers who stayed: sample random points to check

training_data = []

for _, vol in volunteers.iterrows():
    vol_id = vol['volunteer_id']

    if pd.notna(vol['end_date']):  # Volunteer left
        # Sample 3 months before they left
        prediction_date = vol['end_date'] - timedelta(days=90)

        if prediction_date > vol['start_date']:
            features = calculate_features(vol_id, prediction_date)
            if features:
                features['volunteer_id'] = vol_id
                features['will_leave'] = 1  # Left within 3 months
                training_data.append(features)

    else:  # Still active - sample from their history
        # Take multiple snapshots during their tenure
        start = vol['start_date']
        today = pd.Timestamp.now()

        # Sample every 3 months they've been active
        sample_date = start + timedelta(days=90)
        while sample_date < today:
            features = calculate_features(vol_id, sample_date)
            if features:
                features['volunteer_id'] = vol_id
                features['will_leave'] = 0  # Still active
                training_data.append(features)

            sample_date += timedelta(days=90)

# Create training dataframe
train_df = pd.DataFrame(training_data)
print(f"\nTraining samples: {len(train_df)}")
print(f"  Left: {len(train_df[train_df['will_leave'] == 1])}")
print(f"  Stayed: {len(train_df[train_df['will_leave'] == 0])}")

# Prepare features and target
feature_cols = [
    'total_shifts', 'days_since_start', 'days_since_last_shift',
    'recent_shifts_3mo', 'previous_shifts_3mo', 'attendance_trend',
    'avg_gap_days', 'max_gap_days'
]

X = train_df[feature_cols]
y = train_df['will_leave']

# Split into train/test
# Note: active volunteers contribute multiple snapshots, so the same person
# can land in both train and test sets, which inflates test accuracy. For a
# stricter check, split by volunteer_id (e.g. sklearn's GroupShuffleSplit).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=5)
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
print("\nModel Performance:")
print(classification_report(y_test, y_pred, target_names=['Will Stay', 'Will Leave']))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("\nMost important factors predicting departure:")
print(feature_importance.head())

# Score current volunteers
print("\nScoring current active volunteers...")
current_volunteers = volunteers[volunteers['end_date'].isna()]

risk_scores = []
today = pd.Timestamp.now()

for _, vol in current_volunteers.iterrows():
    features_dict = calculate_features(vol['volunteer_id'], today)

    if features_dict:
        # Pass a DataFrame with the training column names, in the same order,
        # so sklearn doesn't warn about missing feature names
        X_vol = pd.DataFrame([features_dict])[feature_cols]
        risk_prob = rf.predict_proba(X_vol)[0][1]  # Probability of leaving

        risk_scores.append({
            'volunteer_id': vol['volunteer_id'],
            'risk_score': round(risk_prob * 100, 1),
            'days_since_last': features_dict['days_since_last_shift'],
            'attendance_trend': round(features_dict['attendance_trend'], 2),
            'recent_shifts': features_dict['recent_shifts_3mo']
        })

# Sort by risk
risk_df = pd.DataFrame(risk_scores).sort_values('risk_score', ascending=False)

print(f"\nHigh risk volunteers (70%+ likely to leave):")
high_risk = risk_df[risk_df['risk_score'] >= 70]
print(high_risk.to_string(index=False))

print(f"\nMedium risk (50-70%):")
medium_risk = risk_df[(risk_df['risk_score'] >= 50) & (risk_df['risk_score'] < 70)]
print(medium_risk.head(10).to_string(index=False))

# Export for action
risk_df.to_csv('volunteer_retention_risk.csv', index=False)
print("\nFull risk scores saved to volunteer_retention_risk.csv")
print("\nNext steps:")
print("1. Reach out to high-risk volunteers")
print("2. Investigate what's changed for them")
print("3. Address barriers to continued engagement")
print("4. Retrain model monthly with new data")

Tools

  • Google Colab — platform · freemium
  • scikit-learn — library · free · open source
  • pandas — library · free · open source

Resources

At a glance

Time to implement
weeks
Setup cost
free
Ongoing cost
free
Cost trend
stable
Organisation size
medium, large
Target audience
operations-manager, volunteer-coordinator, data-analyst

All tools are free. Note that if you run the notebook in Google Colab, volunteer data is processed on Google's servers - run it locally (e.g. in Jupyter) if that conflicts with your data protection obligations. Consider data protection: this involves automated profiling of volunteers. Ensure volunteers are informed about how their data is used. Initial setup takes time (gathering data, building model). Once built, scoring volunteers takes minutes. Retrain monthly with fresh data.