Predict service user needs from initial assessment
The problem
During intake, you collect basic information, but you won't know someone's full support needs until weeks into the relationship. By then, you might already have matched them to the wrong service or worker. You want earlier signals: based on initial presentation, what support does this person likely need? Can you route them better from day one?
The solution
Train a classifier on historical intake data matched with eventual support provided. The model learns patterns: people presenting with X characteristics at intake typically need Y type of support. Use predictions to route people faster and prepare appropriate resources, while always confirming needs through proper assessment.
What you get
A support needs prediction tool that suggests likely service requirements based on intake information. Shows: (1) Predicted primary support needs with confidence score, (2) Recommended initial service route, (3) Flagged risks or complexities to watch for, (4) Similar past cases for staff reference. Humans always make final decisions, but predictions speed up triage.
Before you start
- At least 12 months of intake records with eventual support provided (200+ cases)
- Consistent intake data: demographics, presenting issues, initial assessment notes
- Clear categorisation of support types (counselling, advocacy, practical help, etc.)
- Staff willing to use predictions as suggestions, not rules
- Understanding this is triage support, not clinical diagnosis
When to use this
- High volume intake where faster triage would help (50+ new cases per month)
- Multiple service pathways - routing decision is complex
- Staff struggle with initial triage decisions (inconsistent routing)
- You have 200+ historical cases to learn from with clear outcomes
- Want to reduce time from intake to appropriate support starting
When not to use this
- Low volume service (< 20 cases per month) - manual triage is fine
- All service users follow same pathway regardless of needs
- Intake data is too sparse to be predictive
- Cultural sensitivity requires human judgment you can't capture in data
- Any suggestion this replaces professional assessment (it's triage support only)
- Historical data has systemic bias you'd encode into the model
Steps
1. Extract and clean historical intake data
Pull 12-24 months of intake records including: demographics (age band, not exact age), presenting issues, referral source, initial assessment scores/flags, eventual support provided. Anonymise by removing names but keeping case IDs. Clean: standardise categories, handle missing data, remove test cases.
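A minimal cleaning sketch in pandas, assuming a raw export called intake_raw.csv with columns matching the example code below; the identifier columns and the TEST prefix are placeholders to adapt to your own records:

import pandas as pd

df = pd.read_csv('intake_raw.csv')

# Drop direct identifiers but keep the case ID for matching outcomes later
df = df.drop(columns=['name', 'address'], errors='ignore')

# Standardise category labels ('Self referral' and 'self-referral' become one value)
df['referral_source'] = df['referral_source'].str.strip().str.lower()

# Handle missing data explicitly rather than silently
df['presenting_issues'] = df['presenting_issues'].fillna('')
df['age_band'] = df['age_band'].fillna('unknown')

# Remove test cases
df = df[~df['case_id'].astype(str).str.startswith('TEST')]

df.to_csv('intake_outcomes.csv', index=False)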
2. Define support need categories to predict
Decide what you're predicting: specific support types (counselling, advocacy, employment support), intensity level (light touch vs intensive), or risk flags (complex needs, safeguarding concerns). Start simple with 3-5 categories. Too granular and you won't have enough data per category.
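To check whether your categories have enough data, count cases per category and fold rare ones into a broader label. A sketch using the support_type_provided column from the example code below; the threshold of 30 is an assumption to tune:

counts = df['support_type_provided'].value_counts()
print(counts)

# Categories with too few cases won't train well - merge them into a broader label
rare = counts[counts < 30].index
df['support_type_provided'] = df['support_type_provided'].replace({cat: 'other' for cat in rare})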
3. Engineer features from intake data
Create model features: convert free-text presenting issues to categories, create flags for key patterns (mentions of housing, mentions of mental health), include demographics, referral source. The model learns which intake patterns predict which support needs. Be careful with protected characteristics - only include if genuinely predictive and non-discriminatory.
4. Train and evaluate classifier
Split data: 80% training, 20% test. Train a RandomForest or LogisticRegression classifier. Evaluate on the held-out test set: aim for 60-70%+ accuracy. Check per-category performance - some needs might be easy to predict (e.g., housing support when housing is mentioned), others hard (complex mental health needs). Identify which predictions to trust.
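With only a few hundred cases, a single 80/20 split can give a noisy accuracy figure. Cross-validation gives a steadier estimate; a sketch reusing the X and y defined in the example code below:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(f"Accuracy across folds: {scores.mean():.2f} +/- {scores.std():.2f}")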
5. Check for bias in predictions
Critically important: check if model predictions differ by protected characteristics. Are certain age groups, genders, or ethnicities systematically routed differently? If yes, investigate why. Is it genuine need difference or encoded bias from historical decisions? Don't deploy a model that amplifies discrimination.
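One basic disparity check is to compare predicted routing rates across groups. A sketch, assuming a gender column exists in your cleaned data (adapt to whichever protected characteristics you hold) and reusing X_test and y_pred from the example code below:

import pandas as pd

# Attach predictions to the test rows alongside a protected characteristic
check = X_test.copy()
check['predicted'] = y_pred
check['gender'] = df.loc[X_test.index, 'gender']

# Proportion of each group routed to each support type
rates = pd.crosstab(check['gender'], check['predicted'], normalize='index')
print(rates.round(2))

# Large gaps between rows need investigating: genuine difference in need,
# or bias inherited from historical routing decisions?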
6. Build simple interface for staff
Create tool staff can use: input new case details, get prediction with confidence score. Show: predicted support needs, confidence level, similar past cases (for staff to review), clear message that this is a suggestion not a decision. Keep it simple - if too complex, staff won't use it.
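Streamlit is one free option for this kind of internal tool. A sketch, assuming the trained clf, label_encoders and predict_support_needs from the example code are available to import; the dropdown options are placeholders to replace with your own categories:

# app.py - run with: streamlit run app.py
import streamlit as st
# ... load or import clf, label_encoders and predict_support_needs here

st.title('Support needs triage (suggestion only)')

age_band = st.selectbox('Age band', ['18-25', '26-40', '41-65', '65+'])
referral_source = st.selectbox('Referral source', ['self-referral', 'gp', 'other agency'])
presenting_issues = st.text_area('Presenting issues (brief notes)')

if st.button('Suggest route'):
    result = predict_support_needs({
        'age_band': age_band,
        'referral_source': referral_source,
        'presenting_issues': presenting_issues,
    })
    st.write(f"Predicted support type: {result['predicted_support_type']}")
    st.write(f"Confidence: {result['confidence']:.0%}")
    st.warning('This is a suggestion - complete the full assessment as normal.')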
7. Pilot with small group of staff
Roll out to 3-5 staff first. Train them: this is triage support, you're still doing full assessment, use your professional judgment. Gather feedback: are predictions helpful? Do they speed up routing? Any concerning patterns? Adjust based on learning before wider rollout.
8. Monitor predictions vs actual needs
Track when predictions were right, when they were wrong, and why. Log staff overrides and reasons. Use this to retrain the model monthly - it should improve as it learns from more cases. If accuracy drops or bias emerges, investigate immediately. This monitoring is not optional.
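A minimal monitoring log, sketched as a CSV you append to at each triage decision; the file name and columns are placeholders:

import csv
import pandas as pd
from datetime import date

def log_triage_decision(case_id, predicted, confidence, staff_decision, override_reason=''):
    """Append one triage decision to the monitoring log."""
    with open('triage_log.csv', 'a', newline='') as f:
        csv.writer(f).writerow([date.today().isoformat(), case_id, predicted,
                                f'{confidence:.2f}', staff_decision, override_reason])

# Monthly review: how often did staff agree with the prediction?
cols = ['date', 'case_id', 'predicted', 'confidence', 'actual', 'override_reason']
log = pd.read_csv('triage_log.csv', names=cols)
print(f"Prediction-staff agreement: {(log['predicted'] == log['actual']).mean():.0%}")
print(f"Override rate: {log['override_reason'].notna().mean():.0%}")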
Example code
Train support needs classifier from intake data
Train classifier to predict support needs from intake data. Always requires human confirmation.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

# Load historical intake and support data
df = pd.read_csv('intake_outcomes.csv')

# Feature engineering: convert free-text presenting issues to binary flags
df['mentions_housing'] = df['presenting_issues'].str.contains('housing', case=False, na=False)
df['mentions_mental_health'] = df['presenting_issues'].str.contains('mental health|anxiety|depression', case=False, na=False)
df['mentions_employment'] = df['presenting_issues'].str.contains('job|employment|work', case=False, na=False)

# Encode categorical variables, keeping the encoders for prediction time
label_encoders = {}
for col in ['age_band', 'referral_source']:
    le = LabelEncoder()
    df[col + '_encoded'] = le.fit_transform(df[col].fillna('unknown'))
    label_encoders[col] = le

# Select features for the model
feature_columns = [
    'age_band_encoded',
    'referral_source_encoded',
    'mentions_housing',
    'mentions_mental_health',
    'mentions_employment',
    # Add more features as appropriate
]
X = df[feature_columns]
y = df['support_type_provided']  # What we're predicting

# Split data, stratified so rare support types appear in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train classifier; class_weight='balanced' compensates for uneven category sizes
clf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
clf.fit(X_train, y_train)

# Evaluate per-category precision and recall, not just overall accuracy
y_pred = clf.predict(X_test)
print("Classification Performance:")
print(classification_report(y_test, y_pred))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': clf.feature_importances_
}).sort_values('importance', ascending=False)
print("\nMost Important Features for Prediction:")
print(feature_importance)

# Example prediction for a new case
def predict_support_needs(intake_data):
    """Predict support needs from intake information."""
    # Process intake data to match the training features.
    # Note: LabelEncoder.transform raises ValueError on categories not seen
    # in training - map unexpected values to 'unknown' before calling this.
    issues = intake_data['presenting_issues'].lower()
    features = {
        'age_band_encoded': label_encoders['age_band'].transform([intake_data['age_band']])[0],
        'referral_source_encoded': label_encoders['referral_source'].transform([intake_data['referral_source']])[0],
        'mentions_housing': 'housing' in issues,
        'mentions_mental_health': any(term in issues for term in ['mental health', 'anxiety', 'depression']),
        'mentions_employment': any(term in issues for term in ['job', 'employment', 'work']),
    }
    # Reindex to guarantee the same column order the model was trained on
    features_df = pd.DataFrame([features])[feature_columns]
    prediction = clf.predict(features_df)[0]
    probabilities = clf.predict_proba(features_df)[0]
    return {
        'predicted_support_type': prediction,
        'confidence': probabilities.max(),
        'all_probabilities': dict(zip(clf.classes_, probabilities))
    }

# Example
new_case = {
    'age_band': '18-25',
    'referral_source': 'self-referral',
    'presenting_issues': 'struggling with anxiety and looking for work'
}
result = predict_support_needs(new_case)
print("\nPrediction for new case:")
print(f"  Support type: {result['predicted_support_type']}")
print(f"  Confidence: {result['confidence']:.1%}")
print("  ⚠️ This is a suggestion - staff should complete full assessment")
Resources
- Legal requirements when using AI for decisions affecting people.
- Bias in ML models (tutorial): Understanding and mitigating bias in machine learning.
- Ethical AI in social services (documentation): NHS AI Lab guidance on ethical considerations (applicable to charities).
At a glance
- Time to implement: weeks
- Setup cost: free
- Ongoing cost: free
- Cost trend: stable
- Organisation size: medium, large
- Target audience: program-delivery, operations-manager, data-analyst
Free tools are sufficient. Time cost: 2-3 weeks initial build (including data preparation), then monthly retraining as you gather more cases.