Predict service user needs from initial assessment
The problem
During intake, you collect basic information, but you won't know someone's full support needs until weeks into the relationship. By then, you might already have matched them to the wrong service or worker. You want earlier signals: based on initial presentation, what support does this person likely need? Can you route them better from day one?
The solution
Train a classifier on historical intake data matched with eventual support provided. The model learns patterns: people presenting with X characteristics at intake typically need Y type of support. Use predictions to route people faster and prepare appropriate resources, while always confirming needs through proper assessment.
What you get
A support needs prediction tool that suggests likely service requirements based on intake information. Shows: (1) Predicted primary support needs with confidence score, (2) Recommended initial service route, (3) Flagged risks or complexities to watch for, (4) Similar past cases for staff reference. Humans always make final decisions, but predictions speed up triage.
Before you start
- At least 12 months of intake records with eventual support provided (200+ cases)
- Consistent intake data: demographics, presenting issues, initial assessment notes
- Clear categorisation of support types (counselling, advocacy, practical help, etc.)
- Staff willing to use predictions as suggestions, not rules
- Understanding this is triage support, not clinical diagnosis
When to use this
- High volume intake where faster triage would help (50+ new cases per month)
- Multiple service pathways - routing decision is complex
- Staff struggle with initial triage decisions (inconsistent routing)
- You have 200+ historical cases to learn from with clear outcomes
- Want to reduce time from intake to appropriate support starting
When not to use this
- Low volume service (< 20 cases per month) - manual triage is fine
- All service users follow same pathway regardless of needs
- Intake data is too sparse to be predictive
- Cultural sensitivity requires human judgment you can't capture in data
- Any suggestion this replaces professional assessment (it's triage support only)
- Historical data has systemic bias you'd encode into the model
Steps
1. Extract and clean historical intake data
Pull 12-24 months of intake records including: demographics (age band, not exact age), presenting issues, referral source, initial assessment scores/flags, eventual support provided. Anonymise by removing names but keeping case IDs. Clean: standardise categories, handle missing data, remove test cases.
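A minimal cleaning sketch in pandas, assuming a raw export called intake_raw.csv with columns matching the example code below; the identifier columns and the TEST prefix are placeholders to adapt to your own records:

import pandas as pd

df = pd.read_csv('intake_raw.csv')

# Drop direct identifiers but keep the case ID for matching outcomes later
df = df.drop(columns=['name', 'address'], errors='ignore')

# Standardise category labels ('Self referral' and 'self-referral' become one value)
df['referral_source'] = df['referral_source'].str.strip().str.lower()

# Handle missing data explicitly rather than silently
df['presenting_issues'] = df['presenting_issues'].fillna('')
df['age_band'] = df['age_band'].fillna('unknown')

# Remove test cases
df = df[~df['case_id'].astype(str).str.startswith('TEST')]

df.to_csv('intake_outcomes.csv', index=False)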
2. Define support need categories to predict
Decide what you're predicting: specific support types (counselling, advocacy, employment support), intensity level (light touch vs intensive), or risk flags (complex needs, safeguarding concerns). Start simple with 3-5 categories. Too granular and you won't have enough data per category.
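To check whether your categories have enough data, count cases per category and fold rare ones into a broader label. A sketch using the support_type_provided column from the example code below; the threshold of 30 is an assumption to tune:

counts = df['support_type_provided'].value_counts()
print(counts)

# Categories with too few cases won't train well - merge them into a broader label
rare = counts[counts < 30].index
df['support_type_provided'] = df['support_type_provided'].replace({cat: 'other' for cat in rare})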
3. Engineer features from intake data
Create model features: convert free-text presenting issues to categories, create flags for key patterns (mentions of housing, mentions of mental health), include demographics, referral source. The model learns which intake patterns predict which support needs. Be careful with protected characteristics - only include if genuinely predictive and non-discriminatory.
4. Train and evaluate classifier
Split data: 80% training, 20% test. Train a RandomForest or LogisticRegression classifier. Evaluate on the held-out test set: aim for 60-70%+ accuracy. Check per-category performance - some needs might be easy to predict (e.g., housing support when housing is mentioned), others hard (complex mental health needs). Identify which predictions to trust.
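With only a few hundred cases, a single 80/20 split can give a noisy accuracy figure. Cross-validation gives a steadier estimate; a sketch reusing the X and y defined in the example code below:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(f"Accuracy across folds: {scores.mean():.2f} +/- {scores.std():.2f}")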
5. Check for bias in predictions
Critically important: check if model predictions differ by protected characteristics. Are certain age groups, genders, or ethnicities systematically routed differently? If yes, investigate why. Is it genuine need difference or encoded bias from historical decisions? Don't deploy a model that amplifies discrimination.
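One basic disparity check is to compare predicted routing rates across groups. A sketch, assuming a gender column exists in your cleaned data (adapt to whichever protected characteristics you hold) and reusing X_test and y_pred from the example code below:

import pandas as pd

# Attach predictions to the test rows alongside a protected characteristic
check = X_test.copy()
check['predicted'] = y_pred
check['gender'] = df.loc[X_test.index, 'gender']

# Proportion of each group routed to each support type
rates = pd.crosstab(check['gender'], check['predicted'], normalize='index')
print(rates.round(2))

# Large gaps between rows need investigating: genuine difference in need,
# or bias inherited from historical routing decisions?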
6. Build simple interface for staff
Create tool staff can use: input new case details, get prediction with confidence score. Show: predicted support needs, confidence level, similar past cases (for staff to review), clear message that this is a suggestion not a decision. Keep it simple - if too complex, staff won't use it.
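Streamlit is one free option for this kind of internal tool. A sketch, assuming the trained clf, label_encoders and predict_support_needs from the example code are available to import; the dropdown options are placeholders to replace with your own categories:

# app.py - run with: streamlit run app.py
import streamlit as st
# ... load or import clf, label_encoders and predict_support_needs here

st.title('Support needs triage (suggestion only)')

age_band = st.selectbox('Age band', ['18-25', '26-40', '41-65', '65+'])
referral_source = st.selectbox('Referral source', ['self-referral', 'gp', 'other agency'])
presenting_issues = st.text_area('Presenting issues (brief notes)')

if st.button('Suggest route'):
    result = predict_support_needs({
        'age_band': age_band,
        'referral_source': referral_source,
        'presenting_issues': presenting_issues,
    })
    st.write(f"Predicted support type: {result['predicted_support_type']}")
    st.write(f"Confidence: {result['confidence']:.0%}")
    st.warning('This is a suggestion - complete the full assessment as normal.')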
7. Pilot with small group of staff
Roll out to 3-5 staff first. Train them: this is triage support, you're still doing full assessment, use your professional judgment. Gather feedback: are predictions helpful? Do they speed up routing? Any concerning patterns? Adjust based on learning before wider rollout.
8. Monitor predictions vs actual needs
Track when predictions were right, when they were wrong, and why. Log staff overrides and reasons. Use this to retrain the model monthly - it should improve as it learns from more cases. If accuracy drops or bias emerges, investigate immediately. This monitoring is not optional.
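A minimal monitoring log, sketched as a CSV you append to at each triage decision; the file name and columns are placeholders:

import csv
import pandas as pd
from datetime import date

def log_triage_decision(case_id, predicted, confidence, staff_decision, override_reason=''):
    """Append one triage decision to the monitoring log."""
    with open('triage_log.csv', 'a', newline='') as f:
        csv.writer(f).writerow([date.today().isoformat(), case_id, predicted,
                                f'{confidence:.2f}', staff_decision, override_reason])

# Monthly review: how often did staff agree with the prediction?
cols = ['date', 'case_id', 'predicted', 'confidence', 'actual', 'override_reason']
log = pd.read_csv('triage_log.csv', names=cols)
print(f"Prediction-staff agreement: {(log['predicted'] == log['actual']).mean():.0%}")
print(f"Override rate: {log['override_reason'].notna().mean():.0%}")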
Example code
Train support needs classifier from intake data
Train classifier to predict support needs from intake data. Always requires human confirmation.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

# Load historical intake and support data
df = pd.read_csv('intake_outcomes.csv')

# Feature engineering: convert free-text presenting issues to binary flags
df['mentions_housing'] = df['presenting_issues'].str.contains('housing', case=False, na=False)
df['mentions_mental_health'] = df['presenting_issues'].str.contains('mental health|anxiety|depression', case=False, na=False)
df['mentions_employment'] = df['presenting_issues'].str.contains('job|employment|work', case=False, na=False)

# Encode categorical variables, keeping the encoders for prediction time
label_encoders = {}
for col in ['age_band', 'referral_source']:
    le = LabelEncoder()
    df[col + '_encoded'] = le.fit_transform(df[col].fillna('unknown'))
    label_encoders[col] = le

# Select features for the model
feature_columns = [
    'age_band_encoded',
    'referral_source_encoded',
    'mentions_housing',
    'mentions_mental_health',
    'mentions_employment',
    # Add more features as appropriate
]
X = df[feature_columns]
y = df['support_type_provided']  # What we're predicting

# Split data, stratified so rare support types appear in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train classifier; class_weight='balanced' compensates for uneven category sizes
clf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
clf.fit(X_train, y_train)

# Evaluate per-category precision and recall, not just overall accuracy
y_pred = clf.predict(X_test)
print("Classification Performance:")
print(classification_report(y_test, y_pred))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': clf.feature_importances_
}).sort_values('importance', ascending=False)
print("\nMost Important Features for Prediction:")
print(feature_importance)

# Example prediction for a new case
def predict_support_needs(intake_data):
    """Predict support needs from intake information."""
    # Process intake data to match the training features.
    # Note: LabelEncoder.transform raises ValueError on categories not seen
    # in training - map unexpected values to 'unknown' before calling this.
    issues = intake_data['presenting_issues'].lower()
    features = {
        'age_band_encoded': label_encoders['age_band'].transform([intake_data['age_band']])[0],
        'referral_source_encoded': label_encoders['referral_source'].transform([intake_data['referral_source']])[0],
        'mentions_housing': 'housing' in issues,
        'mentions_mental_health': any(term in issues for term in ['mental health', 'anxiety', 'depression']),
        'mentions_employment': any(term in issues for term in ['job', 'employment', 'work']),
    }
    # Reindex to guarantee the same column order the model was trained on
    features_df = pd.DataFrame([features])[feature_columns]
    prediction = clf.predict(features_df)[0]
    probabilities = clf.predict_proba(features_df)[0]
    return {
        'predicted_support_type': prediction,
        'confidence': probabilities.max(),
        'all_probabilities': dict(zip(clf.classes_, probabilities))
    }

# Example
new_case = {
    'age_band': '18-25',
    'referral_source': 'self-referral',
    'presenting_issues': 'struggling with anxiety and looking for work'
}
result = predict_support_needs(new_case)
print("\nPrediction for new case:")
print(f"  Support type: {result['predicted_support_type']}")
print(f"  Confidence: {result['confidence']:.1%}")
print("  ⚠️ This is a suggestion - staff should complete full assessment")
Resources
- Legal requirements when using AI for decisions affecting people.
- Bias in ML models (tutorial): Understanding and mitigating bias in machine learning.
- Ethical AI in social services (documentation): NHS AI Lab guidance on ethical considerations (applicable to charities).
At a glance
- Time to implement: weeks
- Setup cost: free
- Ongoing cost: free
- Cost trend: stable
- Organisation size: medium, large
- Target audience: program-delivery, operations-manager, data-analyst
Free tools are sufficient. Time cost: 2-3 weeks initial build (including data preparation), then monthly retraining as you gather more cases.