
Detect unusual patterns across services

compliance · intermediate · proven

The problem

You run services across 30 locations. Each month you get reports showing numbers served, outcomes achieved, costs incurred. But you can't easily spot when one site is performing unusually - whether that's a problem to investigate or a success to learn from. Anomalies get buried in spreadsheets until they become crises.

The solution

Use anomaly detection to automatically flag sites or time periods that are statistical outliers. The algorithm learns what 'normal' looks like for your operations, then highlights anything unusual: a site with half the usual uptake, costs that spiked unexpectedly, outcomes that suddenly improved. You investigate the flagged cases rather than checking everything manually.

What you get

A list of anomalies ranked by how unusual they are: 'Site A had 40% lower service usage than expected in March', 'Site B achieved twice the typical outcome rate in Q2', 'Cost per person increased 50% at Site C'. Each anomaly includes: what's unusual, how unusual (statistical confidence), and which metric triggered it. You focus investigation on the outliers.

Before you start

  • Service data across multiple sites/periods with consistent metrics
  • At least 6-12 months of historical data to establish baseline
  • Metrics that should be relatively stable (usage, costs, outcomes)
  • A Google account for Colab
  • Basic comfort with Python or willingness to adapt example code
  • DATA PROTECTION: Anonymise all beneficiary data before uploading to cloud platforms. Use site/period IDs rather than identifiable details. Check your data protection policy permits this analysis. IMPORTANT: Never use anomaly detection outputs to directly assess staff performance without human context - algorithmic findings can be misleading without understanding local circumstances.
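
The pseudonymisation step above can be done before anything leaves your machine. A minimal sketch (the site names, ID format, and lookup filename are illustrative, not prescribed):

```python
import pandas as pd

# Hypothetical export with identifiable site names
df = pd.DataFrame({
    "site": ["Hackney Hub", "Leeds Centre", "Hackney Hub"],
    "people_served": [150, 52, 160],
})

# Build a stable pseudonym for each site
site_ids = {name: f"SITE-{i + 1:03d}"
            for i, name in enumerate(sorted(df["site"].unique()))}
df["site"] = df["site"].map(site_ids)

# Keep the lookup table offline; never upload it alongside the data
# pd.Series(site_ids).to_csv("site_lookup_keep_offline.csv")
print(df)
```

The mapping is deterministic (sorted names), so re-running the export produces the same IDs each month.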

When to use this

  • You manage multiple sites/services and can't manually check everything
  • Problems sometimes go unnoticed until they're serious
  • You want to spot successes to replicate, not just problems
  • Your KPIs should be relatively stable (unusual changes mean something)

When not to use this

  • You only have one location - nothing to compare against
  • Your metrics vary wildly for known reasons (seasonal, funding cycles)
  • You've got fewer than 6 months of data - not enough for baseline
  • Your data quality is poor - anomalies might just be data errors

Steps

  1. Gather your operational data

    Export your key metrics for each site and time period: people served, outcomes achieved, costs, staff hours, whatever you track. Structure it with columns for: site, date/period, and each metric. You need this in regular intervals (monthly, quarterly) to detect patterns.
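
The layout described above is one row per site per period, metrics as columns. A small sketch of what the file might look like (column names match the example code later; the values are made up):

```python
import pandas as pd

# Illustrative rows: one row per site per period
records = [
    ("SITE-001", "2024-01", 150, 0.65, 45.20, 320),
    ("SITE-001", "2024-02", 142, 0.61, 47.10, 315),
    ("SITE-002", "2024-01", 48, 0.70, 62.50, 110),
]
df = pd.DataFrame(records, columns=[
    "site", "period", "people_served", "outcome_rate",
    "cost_per_person", "staff_hours",
])
df.to_csv("service_data.csv", index=False)
```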

  2. Choose metrics that should be stable

    Focus on metrics where unusual changes matter: people served per staff hour, cost per person, outcome rate, no-show percentage. Don't include metrics you expect to vary wildly. Seasonal services need seasonal baselines (compare December to December, not December to June). IMPORTANT: Use rates and ratios (e.g., outcomes per person, cost per session) rather than absolute numbers for fair comparison - a large city site serving 500 people should not be compared directly to a rural site serving 50.
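
Deriving rates from absolute numbers, and adding a month column for seasonal baselining, might look like this (column names such as `total_cost` are assumptions about what you track):

```python
import pandas as pd

df = pd.DataFrame({
    "site": ["SITE-001", "SITE-002"],
    "period": ["2024-01", "2024-01"],
    "people_served": [500, 50],
    "total_cost": [22500.0, 2600.0],
    "staff_hours": [900, 100],
})

# Rates make a 500-person city site comparable to a 50-person rural one
df["cost_per_person"] = df["total_cost"] / df["people_served"]
df["served_per_staff_hour"] = df["people_served"] / df["staff_hours"]

# For seasonal services, a month column lets you baseline
# December against previous Decembers rather than against June
df["month"] = pd.to_datetime(df["period"]).dt.month
print(df[["site", "cost_per_person", "served_per_staff_hour"]])
```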

  3. Visualise your data first

    Plot your metrics over time for each site. Do you see sites that are consistently different (rural vs urban) or periods that look wrong (data errors)? Understanding the visual pattern helps you interpret what the algorithm finds. It also spots data quality issues before they confuse the analysis.
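
One way to get the per-site plot described above is to pivot to one column per site and draw a line each. A sketch with made-up data (the March drop at SITE-001 is deliberately planted so the picture has something to show):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend: save to file rather than display
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "site": ["SITE-001"] * 3 + ["SITE-002"] * 3,
    "period": ["2024-01", "2024-02", "2024-03"] * 2,
    "outcome_rate": [0.65, 0.63, 0.30, 0.70, 0.71, 0.69],
})

# One line per site makes visual outliers obvious
wide = df.pivot(index="period", columns="site", values="outcome_rate")
wide.plot(marker="o", figsize=(8, 4), title="Outcome rate by site")
plt.ylabel("Outcome rate")
plt.tight_layout()
plt.savefig("outcome_rate_by_site.png")
```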

  4. Run anomaly detection

    Use Isolation Forest (see the example code below) to find outliers. It learns what combinations of metrics are typical, then flags anything unusual. A site might be normal on each metric individually but unusual in combination (low costs AND low outcomes together is suspicious).
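
The "normal individually, unusual in combination" point can be demonstrated on synthetic data. Here two metrics normally move together; the planted point is within range on each axis alone but breaks the relationship, and Isolation Forest gives it a lower (more anomalous) score:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Two metrics that usually rise and fall together
x = rng.normal(0, 1, 300)
y = x + rng.normal(0, 0.1, 300)
normal = np.column_stack([x, y])

# Unremarkable on each metric alone, impossible in combination
combo_outlier = np.array([[1.5, -1.5]])
data = np.vstack([normal, combo_outlier])

iso = IsolationForest(random_state=42).fit(data)
scores = iso.score_samples(data)  # more negative = more anomalous
print("outlier score:", scores[-1],
      "vs median normal score:", np.median(scores[:-1]))
```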

  5. Review flagged anomalies

    Sort anomalies by how unusual they are (the score the algorithm gives). Check the top 10-20. Are they real (a site genuinely performing differently) or artifacts (data errors, one-off events)? Cross-reference with what you know about those sites/periods.

  6. Investigate root causes

    For anomalies that are real, dig deeper. Is a site underperforming because of staffing issues, local demographics, or something else? Is a site excelling because of a specific approach you could replicate? Anomalies are flags for investigation, not answers themselves.

  7. Set up monthly monitoring (optional)

    Re-run the analysis each month as new data comes in. Track which anomalies persist (chronic issues) vs one-offs (temporary blips). Over time you'll build intuition for what's worth investigating immediately vs watching.
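
Tracking which anomalies persist across runs can be as simple as concatenating each month's flagged export and counting flags per site. A sketch, assuming you keep the monthly `flagged_anomalies.csv` files (the site IDs below are made up):

```python
import pandas as pd

# Hypothetical flagged-anomaly exports from two monthly runs
jan = pd.DataFrame({"site": ["SITE-003", "SITE-007"], "period": ["2024-01"] * 2})
feb = pd.DataFrame({"site": ["SITE-003", "SITE-011"], "period": ["2024-02"] * 2})

# Persistent flags suggest chronic issues; single flags suggest blips
history = pd.concat([jan, feb])
flag_counts = (history.groupby("site")["period"]
               .nunique()
               .sort_values(ascending=False))
print(flag_counts)
```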

Example code

Detect anomalies using Isolation Forest

This finds sites/periods with unusual patterns across your metrics. Adapt the metrics to what you track. Expected CSV format: columns for 'site', 'period', 'people_served', 'outcome_rate', 'cost_per_person', 'staff_hours'. Example row: 'London,2024-01,150,0.65,45.20,320'.

import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load your operational data
# Expected columns: site, period, metric1, metric2, etc.
df = pd.read_csv('service_data.csv')

print(f"Loaded {len(df)} records across {df['site'].nunique()} sites")

# Select metrics to analyze
# Choose metrics where unusual values are meaningful
metrics = ['people_served', 'outcome_rate', 'cost_per_person', 'staff_hours']

# Remove any rows with missing data
analysis_df = df[['site', 'period'] + metrics].dropna()

# Standardize metrics (important for anomaly detection)
scaler = StandardScaler()
scaled_metrics = scaler.fit_transform(analysis_df[metrics])

# Run Isolation Forest
# contamination: expected % of anomalies (start with 0.1 = 10%)
iso_forest = IsolationForest(
    contamination=0.1,
    random_state=42
)

# -1 = anomaly, 1 = normal
predictions = iso_forest.fit_predict(scaled_metrics)

# Get anomaly scores (more negative = more anomalous)
scores = iso_forest.score_samples(scaled_metrics)

# Add results back to dataframe
analysis_df['is_anomaly'] = predictions
analysis_df['anomaly_score'] = scores

# Sort by most anomalous
anomalies = analysis_df[analysis_df['is_anomaly'] == -1].copy()
anomalies = anomalies.sort_values('anomaly_score')

print(f"\nFound {len(anomalies)} anomalies ({len(anomalies)/len(analysis_df)*100:.1f}% of data)")

# Show top anomalies
print("\nTop 10 most unusual cases:")
print(anomalies.head(10)[['site', 'period'] + metrics + ['anomaly_score']])

# Explain what's unusual about each anomaly
for idx, row in anomalies.head(10).iterrows():
    print(f"\n{row['site']} in {row['period']}:")

    # Compare to average
    for metric in metrics:
        site_val = row[metric]
        avg_val = analysis_df[metric].mean()
        diff_pct = ((site_val - avg_val) / avg_val) * 100

        if abs(diff_pct) > 20:  # Flag if > 20% different from average
            direction = "higher" if diff_pct > 0 else "lower"
            print(f"  - {metric}: {diff_pct:.1f}% {direction} than average")

# Visualize anomalies (example: people served vs outcome rate)
plt.figure(figsize=(10, 6))
plt.scatter(
    analysis_df[analysis_df['is_anomaly'] == 1]['people_served'],
    analysis_df[analysis_df['is_anomaly'] == 1]['outcome_rate'],
    alpha=0.6, label='Normal', color='blue'
)
plt.scatter(
    anomalies['people_served'],
    anomalies['outcome_rate'],
    alpha=0.8, label='Anomaly', color='red', s=100
)
plt.xlabel('People served')
plt.ylabel('Outcome rate')
plt.title('Anomaly detection: service performance')
plt.legend()
plt.show()

# Export anomalies for investigation
anomalies.to_csv('flagged_anomalies.csv', index=False)
print("\nAnomalies exported to flagged_anomalies.csv for investigation")

Tools

Google Colab · platform · freemium
scikit-learn · library · free · open source
pandas · library · free · open source

At a glance

  • Time to implement: days
  • Setup cost: free
  • Ongoing cost: free
  • Cost trend: stable
  • Organisation size: medium, large
  • Target audience: operations-manager, data-analyst, ceo-trustees

All tools are free. Once set up, running monthly anomaly checks takes minutes.