
Forecast event attendance

fundraising · intermediate · emerging

The problem

You're planning an event and guessing attendance. Over-estimate and you waste money on catering, venue space and printed materials. Under-estimate and you disappoint people (food runs out, the room is overcrowded). Gut feel or 'last year's numbers' only go so far, because attendance varies with time of year, day of week, topic, weather and competing events. You need better predictions to plan resources efficiently.

The solution

Build a forecasting model from historical event data. It learns patterns such as how registrations convert to attendance, seasonal effects (for example, lower turnout for summer events), day-of-week impact (Saturday vs Tuesday), topic popularity and how far in advance people register. For a new event it predicts a likely attendance range, which you turn into catering numbers and a decision on whether you need overflow space. Data-driven planning instead of guesswork.

What you get

Attendance forecast for upcoming event: 'Expected attendance: 75 people (range: 65-85 with 80% confidence). Based on: 95 registered, historical 79% show-up rate, Saturday event (+10% typical), summer month (-5% typical), similar topic averaged 72 attendees. Catering recommendation: 80 portions. Overflow planning: not needed unless registrations exceed 100.'

Before you start

  • Historical event data: registrations, actual attendance, date, day of week, topic/type
  • At least 15-20 past events to identify patterns
  • Understanding of factors that affect your attendance (topic, timing, format)
  • A Google account for Colab
  • Basic Python skills or willingness to adapt example code

When to use this

  • You run regular events and struggle to predict turnout accurately
  • You're wasting money on over-catering or disappointing people with under-capacity
  • Attendance varies significantly between events and you don't understand why
  • You've got historical event data to learn patterns from

When not to use this

  • You run very few events (under 10/year) - patterns unclear
  • You don't have historical attendance data - model needs training data
  • Every event is completely unique - no patterns to learn
  • Your events are always at capacity (waiting lists) - forecasting doesn't help, you need bigger venues

Steps

  1. Gather historical event data

    Export past event data: registration count, actual attendance, date, day of week, topic/event type, format (in-person/online/hybrid), whether it was free or paid, and any special factors (celebrity speaker, bad weather). Before uploading to Google Colab, strip out personal data (names, emails): you only need anonymised counts and categories. You need both inputs (what was planned) and outcomes (who actually came). Aim for at least 15-20 events. Even then, a 30% test split leaves only 5-6 events to test on, so the Mean Absolute Error is a rough guide rather than a statistically reliable figure.
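    If your booking system exports one row per registrant (with names and emails) rather than one row per event, a short aggregation step produces the anonymised counts this recipe needs. This is a minimal sketch that assumes hypothetical column names (`event_id`, `date`, `topic`, `format`, `attended`); rename them to match your own export.

```python
import pandas as pd

def aggregate_registrations(registrants: pd.DataFrame) -> pd.DataFrame:
    """Collapse one-row-per-registrant data into one row per event.

    Assumes hypothetical columns event_id, date, topic, format and attended (True/False).
    Personal fields such as name or email are never carried over.
    """
    events = (
        registrants
        .groupby(['event_id', 'date', 'topic', 'format'], as_index=False)
        .agg(registrations=('attended', 'size'),        # everyone who registered
             actual_attendance=('attended', 'sum'))     # True counts as 1
    )
    events['date'] = pd.to_datetime(events['date'])
    return events

# Tiny made-up export for illustration
registrants = pd.DataFrame({
    'event_id': [1, 1, 1, 2, 2],
    'date': ['2024-03-02', '2024-03-02', '2024-03-02', '2024-07-09', '2024-07-09'],
    'topic': ['Fundraising Workshop'] * 3 + ['Volunteer Briefing'] * 2,
    'format': ['in-person'] * 3 + ['online'] * 2,
    'name': ['A', 'B', 'C', 'D', 'E'],
    'email': ['a@x', 'b@x', 'c@x', 'd@x', 'e@x'],
    'attended': [True, False, True, True, False],
})
events = aggregate_registrations(registrants)
print(events)
```

    Save the result with `events.to_csv('historical_events.csv', index=False)` to use it with the example code further down.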

  2. Calculate key metrics

    For each past event, calculate: show-up rate (attendance/registrations), whether it was over/under capacity, seasonal timing (month, quarter), lead time (how far in advance people registered). These metrics help identify patterns: summer events might have 70% show-up vs 85% in autumn.
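    A minimal sketch of these metrics in pandas. `median_registration_date` and `capacity` are hypothetical columns: the function uses them only if they exist in your data. The sample rows are made up.

```python
import pandas as pd

def add_key_metrics(events: pd.DataFrame) -> pd.DataFrame:
    """Add show-up rate, timing and, if the columns exist, lead time and capacity flags."""
    out = events.copy()
    out['date'] = pd.to_datetime(out['date'])
    out['show_up_rate'] = out['actual_attendance'] / out['registrations']
    out['month'] = out['date'].dt.month
    out['quarter'] = out['date'].dt.quarter
    # Hypothetical column: median date people registered for each event
    if 'median_registration_date' in out.columns:
        reg = pd.to_datetime(out['median_registration_date'])
        out['lead_time_days'] = (out['date'] - reg).dt.days
    # Hypothetical column: room capacity, used to flag over-capacity events
    if 'capacity' in out.columns:
        out['over_capacity'] = out['actual_attendance'] > out['capacity']
    return out

events = pd.DataFrame({
    'date': ['2024-01-13', '2024-07-16', '2024-10-05'],
    'registrations': [80, 50, 100],
    'actual_attendance': [68, 35, 85],
    'median_registration_date': ['2023-12-20', '2024-07-01', '2024-09-01'],
})
events = add_key_metrics(events)
# Average show-up rate per quarter, e.g. to compare summer with autumn
print(events.groupby('quarter')['show_up_rate'].mean())
```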

  3. Identify attendance factors

    What affects turnout for your events? Typical patterns: day of week (weekends vs weekdays), season (summer lower), topic (popular vs niche), price (free vs paid), format (online easier to skip), weather (for in-person). List 5-7 factors that matter in your context. These become model features.
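    One quick way to shortlist factors before modelling is to compare average show-up rates across the groups each factor creates. This sketch assumes `events` already has a `show_up_rate` column plus one column per candidate factor; the sample values are made up. A wide gap suggests a factor matters, but with only a handful of events per group treat it as a hint, not proof.

```python
import pandas as pd

def factor_spread(events: pd.DataFrame, factors: list) -> pd.DataFrame:
    """For each factor: gap between the best and worst group's mean show-up rate,
    plus the size of the smallest group (small groups make the gap unreliable)."""
    rows = []
    for factor in factors:
        by_group = events.groupby(factor)['show_up_rate'].agg(['mean', 'count'])
        rows.append({
            'factor': factor,
            'spread': by_group['mean'].max() - by_group['mean'].min(),
            'smallest_group': int(by_group['count'].min()),
        })
    return pd.DataFrame(rows).sort_values('spread', ascending=False).reset_index(drop=True)

events = pd.DataFrame({
    'show_up_rate': [0.85, 0.80, 0.60, 0.65, 0.82, 0.62],
    'is_weekend':   [1, 1, 0, 0, 1, 0],
    'format':       ['in-person', 'online', 'online', 'in-person', 'in-person', 'online'],
})
result = factor_spread(events, ['is_weekend', 'format'])
print(result)
```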

  4. Build the forecasting model

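    Use the example code (Random Forest regression) to learn patterns from historical data. The model doesn't print rules, but it captures effects such as 'Saturday events get 15% higher attendance', 'summer months -10%', 'popular topics +20%' or 'online events have 65% show-up vs 80% in-person' (illustrative figures), including how combinations of factors predict high or low turnout. You can inspect these effects through feature importance or by comparing predictions with one factor changed. Note: the model needs retraining when you introduce an event type or topic that wasn't in the training data, because the example code can't encode an unseen category.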
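    A Random Forest has no single coefficient per factor, so a statement like 'Saturday events get 15% higher attendance' has to be estimated from its predictions. One simple approach, sketched here assuming `rf` was trained on a feature DataFrame `X`, is a what-if comparison: set a feature to one value for every past event, then to another, and compare the average predictions (a two-point version of partial dependence; scikit-learn's `sklearn.inspection` module offers fuller tools). The synthetic data stands in for your own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def what_if_effect(model, X: pd.DataFrame, feature: str, low, high) -> float:
    """Relative change in mean predicted attendance when `feature` is switched
    from `low` to `high` for every row, holding the other features as they were."""
    X_low, X_high = X.copy(), X.copy()
    X_low[feature] = low
    X_high[feature] = high
    pred_low = model.predict(X_low)
    pred_high = model.predict(X_high)
    return float(np.mean(pred_high - pred_low) / np.mean(pred_low))

# Synthetic history: 70% show-up on weekdays, 85% at weekends
rng = np.random.default_rng(0)
n = 40
X = pd.DataFrame({
    'registrations': rng.integers(40, 120, n),
    'month': rng.integers(1, 13, n),
    'is_weekend': rng.integers(0, 2, n),
})
y = X['registrations'] * np.where(X['is_weekend'] == 1, 0.85, 0.70)
rf = RandomForestRegressor(n_estimators=200, random_state=42, max_depth=5).fit(X, y)

effect = what_if_effect(rf, X, 'is_weekend', 0, 1)
print(f"Estimated weekend effect: {effect:+.0%}")
```

    Because the forest only approximates the pattern, the estimate will be near, not exactly equal to, the built-in difference.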

  5. Validate model accuracy

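    Test on events the model hasn't seen: how close are predictions to actual attendance? If they're typically within 10-15 people, that's useful for planning. If predictions are off by 50%, you need more data or different features. It's also worth comparing against a simple baseline (registrations × your average show-up rate): if the model can't beat that, it isn't adding value. Finally, sanity-check the patterns: does it predict higher turnout for weekend events than weekday ones, if that matches your experience?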
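    With 15-20 events, a single 70/30 split tests only a handful of them. This sketch swaps in 5-fold cross-validation, so every event is predicted once by a model that didn't see it, and compares the Random Forest with a naive baseline: registrations multiplied by the pooled show-up rate of the training folds. The data is synthetic; pass in your own `X` (which must include a `registrations` column) and `y`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_predict

def compare_with_baseline(X: pd.DataFrame, y: pd.Series, n_splits: int = 5) -> dict:
    """Cross-validated MAE of a Random Forest versus registrations x pooled show-up rate."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    rf = RandomForestRegressor(n_estimators=100, random_state=42, max_depth=5)
    rf_pred = cross_val_predict(rf, X, y, cv=kf)

    baseline_pred = np.empty(len(y))
    for train_idx, test_idx in kf.split(X):
        # Show-up rate learned from the training folds only
        rate = y.iloc[train_idx].sum() / X['registrations'].iloc[train_idx].sum()
        baseline_pred[test_idx] = X['registrations'].iloc[test_idx].to_numpy() * rate
    return {
        'rf_mae': mean_absolute_error(y, rf_pred),
        'baseline_mae': mean_absolute_error(y, baseline_pred),
    }

rng = np.random.default_rng(1)
n = 30
X = pd.DataFrame({'registrations': rng.integers(40, 120, n),
                  'is_weekend': rng.integers(0, 2, n)})
y = pd.Series(np.round(X['registrations'] * np.where(X['is_weekend'] == 1, 0.85, 0.70)))
scores = compare_with_baseline(X, y)
print(scores)
```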

  6. Forecast upcoming events

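    For new events, input registrations, planned month, day of week, topic and format. The model predicts expected attendance with a range. For example, with 80 registrations for a Saturday workshop it might predict 65 attendees (range 55-75), which is actionable for planning. The model is trained on final registration counts, so a forecast made while sign-ups are still open will tend to run low; re-run it once registrations close.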
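    A quick cross-check on any model forecast is to apply your historical show-up rates directly. This sketch takes the middle 80% of past show-up rates (10th to 90th percentile), so the '80%' describes how past events were actually spread rather than a rule of thumb like the ±1.5 × MAE range in the example code. With 15-20 events the percentiles are themselves rough. The rates below are made up.

```python
import numpy as np

def show_up_range(registrations: int, past_show_up_rates, coverage: float = 0.8) -> dict:
    """Forecast from historical show-up rates alone, with an empirical percentile range."""
    rates = np.asarray(past_show_up_rates, dtype=float)
    tail = (1 - coverage) * 50  # e.g. 10 for an 80% range
    low_rate, mid_rate, high_rate = np.percentile(rates, [tail, 50, 100 - tail])
    return {
        'expected': int(round(registrations * mid_rate)),
        'low': int(round(registrations * low_rate)),
        'high': int(round(registrations * high_rate)),
    }

past_rates = [0.70, 0.75, 0.78, 0.80, 0.82, 0.85, 0.72, 0.79, 0.81, 0.76]
quick = show_up_range(80, past_rates)
print(quick)  # {'expected': 63, 'low': 57, 'high': 66}
```

    If the model's forecast and this quick estimate disagree sharply, check the inputs before trusting either.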

  7. Use forecasts for resource planning

    Turn predictions into decisions: catering numbers (forecast + 10% buffer), venue capacity, printed materials and staff allocation. If the forecast is 65 with a range of 55-75, order catering for 70-75 (a safe margin), book a room for 80 (so it isn't cramped) and print 70 handouts.
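    The rules of thumb in this step can be wrapped in a small helper. The margins (10% catering buffer, 5 extra places in the room, 5 extra handouts) are assumptions chosen to reproduce the worked example above; adjust them to your own tolerance for waste versus shortfall.

```python
import math

def plan_resources(forecast: int, upper: int, catering_buffer_pct: int = 10,
                   room_margin: int = 5, handout_margin: int = 5) -> dict:
    """Turn an attendance forecast and its upper bound into planning numbers."""
    return {
        # Integer percentage keeps the arithmetic exact before rounding up
        'catering': math.ceil(forecast * (100 + catering_buffer_pct) / 100),
        'room_capacity': upper + room_margin,
        'handouts': forecast + handout_margin,
    }

plan = plan_resources(forecast=65, upper=75)
print(plan)  # {'catering': 72, 'room_capacity': 80, 'handouts': 70}
```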

  8. Track and improve (optional)

    After each event, record the forecast against actual attendance. Were you close? If you're consistently over- or under-predicting, adjust. Feed actual results back into the model (for example, retrain monthly). Track forecast accuracy over time: with more events to learn from, it should generally improve.
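    A minimal sketch of forecast tracking, assuming you keep a log with one row per event and hypothetical columns `date`, `forecast` and `actual`. The rolling bias shows whether you're consistently over-predicting (positive) or under-predicting (negative); the rolling MAE shows whether accuracy is improving.

```python
import pandas as pd

def accuracy_report(log: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add error, rolling mean absolute error and rolling bias to a forecast log."""
    out = log.sort_values('date').reset_index(drop=True)
    out['error'] = out['forecast'] - out['actual']  # positive = over-predicted
    out['rolling_mae'] = out['error'].abs().rolling(window, min_periods=1).mean()
    out['rolling_bias'] = out['error'].rolling(window, min_periods=1).mean()
    return out

log = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-10', '2024-02-14', '2024-03-13', '2024-04-10']),
    'forecast': [70, 60, 80, 75],
    'actual': [62, 55, 78, 74],
})
report = accuracy_report(log, window=3)
print(report[['date', 'error', 'rolling_mae', 'rolling_bias']])
```

    In this made-up log every error is positive, so the forecasts are consistently too high, although the shrinking rolling MAE suggests they are getting closer.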

Example code

Forecast event attendance

This predicts attendance based on historical patterns. Adapt features to factors that affect your events.

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# Load historical event data
# Columns: event_name, date, registrations, actual_attendance, day_of_week, topic, format, etc.
events = pd.read_csv('historical_events.csv')
events['date'] = pd.to_datetime(events['date'])

print(f"Loaded {len(events)} historical events")

# Calculate show-up rate
events['show_up_rate'] = events['actual_attendance'] / events['registrations']
avg_show_up = events['show_up_rate'].mean()

print(f"Average show-up rate: {avg_show_up*100:.1f}%")
print(f"Show-up rate range: {events['show_up_rate'].min()*100:.1f}% to {events['show_up_rate'].max()*100:.1f}%")

# Extract time-based features
events['month'] = events['date'].dt.month
events['day_of_week_num'] = events['date'].dt.dayofweek  # 0=Monday, 6=Sunday
events['is_weekend'] = (events['day_of_week_num'] >= 5).astype(int)
events['quarter'] = events['date'].dt.quarter

# Encode categorical variables
# NOTE: Using category codes gives arbitrary numbers (0, 1, 2...) which the model
# treats as having mathematical order. For small numbers of categories (3-5) this
# usually works fine. For many categories or if you see odd results, consider
# using OneHotEncoder instead, which creates separate binary columns per category.
# Topic/event type
events['topic_encoded'] = pd.Categorical(events['topic']).codes

# Format (in-person, online, hybrid)
events['format_encoded'] = pd.Categorical(events['format']).codes

# Define features for prediction
feature_cols = [
    'registrations',  # How many signed up
    'month',  # Seasonal effects
    'is_weekend',  # Weekend vs weekday
    'topic_encoded',  # Event topic/type
    'format_encoded',  # In-person/online/hybrid
]

# Optional: add more features if you have them
# 'is_free', 'has_celebrity_speaker', 'days_notice', etc.

X = events[feature_cols]
y = events['actual_attendance']

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Random Forest model
rf = RandomForestRegressor(n_estimators=100, random_state=42, max_depth=5)
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nModel Performance:")
print(f"  Mean Absolute Error: {mae:.1f} people")
print(f"  R² Score: {r2:.2f}")
print(f"  (On average, predictions are off by about {mae:.0f} people; with a small test set, treat this as a rough guide)")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("\nMost important factors for attendance:")
print(feature_importance)

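# Function to forecast a new event
def forecast_attendance(registrations, month, is_weekend, topic, format_type):
    """
    Forecast attendance for an upcoming event.
    Topic and format must be values that appear in the training data.
    """
    # Encode categorical values with the SAME categories (and order) used in training.
    # pd.Categorical(events['topic']) sorts its categories, so reuse those rather than
    # events['topic'].unique(), whose order differs and would produce mismatched codes.
    topic_categories = pd.Categorical(events['topic']).categories
    format_categories = pd.Categorical(events['format']).categories
    topic_code = pd.Categorical([topic], categories=topic_categories).codes[0]
    format_code = pd.Categorical([format_type], categories=format_categories).codes[0]
    if topic_code == -1 or format_code == -1:
        raise ValueError("Unknown topic or format: retrain the model with this event type first")

    # One-row DataFrame with the same columns as the training data
    features = pd.DataFrame([[
        registrations,
        month,
        1 if is_weekend else 0,
        topic_code,
        format_code
    ]], columns=feature_cols)

    # Predict
    predicted_attendance = rf.predict(features)[0]

    # Rough range: +/- 1.5 x the test-set MAE. This is a rule-of-thumb heuristic,
    # not a statistically calibrated 80% interval.
    lower_bound = predicted_attendance - (mae * 1.5)
    upper_bound = predicted_attendance + (mae * 1.5)

    # Ensure bounds make sense
    lower_bound = max(0, lower_bound)
    upper_bound = min(registrations, upper_bound)  # Assumes no walk-ins beyond registrations

    return {
        'predicted_attendance': round(predicted_attendance),
        'lower_bound': round(lower_bound),
        'upper_bound': round(upper_bound),
        'show_up_rate_predicted': predicted_attendance / registrations if registrations > 0 else 0
    }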

# Example: Forecast for upcoming event
upcoming_event = {
    'registrations': 85,
    'month': 10,  # October
    'is_weekend': True,  # Saturday
    'topic': 'Fundraising Workshop',  # Must be a topic from training data
    'format': 'in-person'  # Must be a format from training data
}

forecast = forecast_attendance(
    upcoming_event['registrations'],
    upcoming_event['month'],
    upcoming_event['is_weekend'],
    upcoming_event['topic'],
    upcoming_event['format']
)

print(f"\n{'='*60}")
print("ATTENDANCE FORECAST FOR UPCOMING EVENT:")
print(f"{'='*60}")
print(f"\nEvent Details:")
print(f"  Registrations: {upcoming_event['registrations']}")
print(f"  Date: month {upcoming_event['month']}, {'weekend' if upcoming_event['is_weekend'] else 'weekday'}")
print(f"  Topic: {upcoming_event['topic']}")
print(f"  Format: {upcoming_event['format']}")

print(f"\nForecast:")
print(f"  Expected attendance: {forecast['predicted_attendance']} people")
print(f"  Rough range (+/- 1.5 x MAE): {forecast['lower_bound']}-{forecast['upper_bound']} people")
print(f"  Predicted show-up rate: {forecast['show_up_rate_predicted']*100:.1f}%")

print(f"\nPlanning Recommendations:")
# Catering: forecast + 10% buffer, rounded up (integer arithmetic avoids float surprises)
catering = int(np.ceil(forecast['predicted_attendance'] * 110 / 100))
print(f"  Catering: Order for {catering} people (forecast + 10% buffer)")

# Venue capacity
venue_capacity = forecast['upper_bound'] + 10
print(f"  Venue: Ensure capacity for {venue_capacity}+ people")

# Materials
materials = forecast['predicted_attendance']
print(f"  Printed materials: {materials} copies")

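# Overflow planning: flag when the upper bound implies a very high show-up rate
if forecast['upper_bound'] > upcoming_event['registrations'] * 0.9:
    print("  ⚠️  High show-up expected (upper bound above 90% of registrations): check room capacity and consider an overflow plan")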

print(f"\n{'='*60}")

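# Optional: save the trained model for reuse in later sessions, e.g.
# import joblib; joblib.dump(rf, 'attendance_model.joblib')
print("\nModel trained and ready for forecasting.")
print("Add actual attendance after each event and retrain to improve accuracy.")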
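The encoding note in the code above suggests OneHotEncoder as an alternative to category codes. This is a minimal sketch of that swap, using a scikit-learn Pipeline so the encoding is fitted once and reused at prediction time. With handle_unknown='ignore', a topic the model hasn't seen is encoded as all zeros instead of raising an error; the prediction then treats it as none of the known topics, so retraining once you have data for the new topic is still the better option. The training rows are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

categorical = ['topic', 'format']
numeric = ['registrations', 'month', 'is_weekend']

model = Pipeline([
    ('encode', ColumnTransformer(
        [('onehot', OneHotEncoder(handle_unknown='ignore'), categorical)],
        remainder='passthrough',  # numeric columns pass through unchanged
    )),
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42, max_depth=5)),
])

# Made-up training data in the same shape as the main example
train = pd.DataFrame({
    'registrations': [80, 60, 95, 40, 70, 55],
    'month': [3, 7, 10, 1, 5, 9],
    'is_weekend': [1, 0, 1, 0, 1, 0],
    'topic': ['Fundraising Workshop', 'Volunteer Briefing', 'Fundraising Workshop',
              'Volunteer Briefing', 'Gala Planning', 'Gala Planning'],
    'format': ['in-person', 'online', 'in-person', 'online', 'hybrid', 'online'],
    'actual_attendance': [66, 41, 80, 27, 57, 38],
})
model.fit(train[categorical + numeric], train['actual_attendance'])

# A topic not seen in training: encoded as all zeros rather than raising an error
new_event = pd.DataFrame([{'registrations': 85, 'month': 10, 'is_weekend': 1,
                           'topic': 'Legacy Giving Talk', 'format': 'in-person'}])
pred = model.predict(new_event[categorical + numeric])[0]
print(round(pred))
```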

Tools

Google Colab · platform · freemium
scikit-learn · library · free · open source
pandas · library · free · open source


At a glance

Time to implement: days
Setup cost: free
Ongoing cost: free
Cost trend: stable
Organisation size: small, medium, large
Target audience: fundraising, operations-manager, data-analyst

All tools are free to use (Colab's free tier is enough). Processing runs in your Colab session on Google's servers, or on your own machine if you run the code locally. Initial setup takes about a day (gathering data, building the model); once built, forecasting a new event takes minutes. Better forecasts cut waste: over-catering can cost £200-500 per event, and under-capacity disappoints supporters, so the effort pays for itself quickly.