
Assess your data readiness for AI

data-analysis · beginner · proven

The problem

You want to use AI but you're not sure if your data is good enough. Every AI project depends on data, but 'good enough' varies wildly depending on what you're trying to do. Some projects need thousands of clean records, others work fine with messy spreadsheets. You need to know what state your data is in and whether it's fit for your specific purpose.

The solution

Run a structured data readiness assessment that evaluates your data across six dimensions: availability, volume, quality, consistency, accessibility, and documentation. Score each dimension against the requirements of your specific use case, not against some abstract ideal. This tells you exactly what needs fixing before you start.

What you get

A data readiness scorecard specific to your AI use case. You'll know which dimensions are blockers, which are acceptable, and what specific actions would improve readiness. This prevents the common pattern of starting an AI project and discovering three months in that the data isn't usable.

Before you start

  • A specific AI use case in mind
  • Access to the data you plan to use (or knowledge of where it is)
  • Basic understanding of what the AI technique needs
  • For beneficiary/donor data: confirmation that original consent covers analytical use, or a plan for a Data Protection Impact Assessment (DPIA) if needed

When to use this

  • Before starting any AI project
  • When scoping an AI funding bid
  • When a vendor says they can do AI with your data
  • After a failed AI project to understand what went wrong

When not to use this

  • You're using AI tools that don't need your data (e.g., general drafting) - though even here, be cautious about pasting organisational data into public/free AI tools
  • You have no data at all (different problem)
  • The project doesn't involve your organisational data

Steps

  1. Define what data you need

    Based on your AI use case, specify what data is required. For donor prediction: donation history, contact records, engagement data. For document classification: documents with known categories to learn from. Be specific about fields, not just "our CRM data".

  2. Assess availability

    Does this data actually exist? Score: 5 = All required data exists and is captured. 3 = Most exists but some fields never collected. 1 = Critical data doesn't exist. Common gaps: outcomes data, historical records, linked data across systems.

  3. Assess volume

    Do you have enough? Requirements vary hugely: LLMs can work with as few as 50 examples using in-context learning (examples included in the prompt, as opposed to fine-tuning, which needs thousands), ML classification needs hundreds, prediction models need thousands. Score: 5 = Plenty for your technique. 3 = Borderline, might work. 1 = Far too little.
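The volume check can be sketched as a simple lookup. The minimums below are illustrative assumptions for this sketch, not hard rules — calibrate them to your technique and vendor guidance:

```python
# Rough volume check against illustrative per-technique minimums.
# These thresholds are assumptions for this sketch, not hard rules.
ROUGH_MINIMUMS = {
    "llm_in_context": 50,      # examples included in the prompt
    "ml_classification": 500,  # hundreds of labelled records
    "prediction_model": 2000,  # thousands of historical records
}

def volume_score(record_count: int, technique: str) -> int:
    """Map a record count to a 1/3/5 volume score for the chosen technique."""
    minimum = ROUGH_MINIMUMS[technique]
    if record_count >= 2 * minimum:
        return 5  # plenty for your technique
    if record_count >= minimum:
        return 3  # borderline, might work
    return 1      # far too little

print(volume_score(8000, "prediction_model"))  # → 5
```

With 8,000 donor records for a prediction model, this scores a 5, matching the worked example later in this recipe.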

  4. Assess quality

    How accurate and complete is it? Score: 5 = Regularly validated, few errors, minimal missing values. 3 = Some known issues but mostly usable. 1 = Significant errors, lots of missing data, untrustworthy. Use your data quality check recipe if unsure.
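If you want rough numbers behind the quality score, two quick signals are the share of missing cells and the number of duplicate keys. A minimal sketch, using made-up donor records (field names are illustrative, not a prescribed schema):

```python
def quality_metrics(rows: list[dict], key_field: str) -> dict:
    """Rough quality signals: missing-cell rate and duplicate key count."""
    total_cells = sum(len(r) for r in rows)
    missing = sum(1 for r in rows for v in r.values() if v in (None, ""))
    keys = [r[key_field] for r in rows]
    duplicates = len(keys) - len(set(keys))
    return {
        "missing_rate": round(missing / total_cells, 2) if total_cells else 0.0,
        "duplicate_keys": duplicates,
    }

donors = [  # illustrative records, not real data
    {"id": "D1", "email": "a@example.org", "last_gift": "2024-03-01"},
    {"id": "D2", "email": "", "last_gift": "2023-11-15"},
    {"id": "D2", "email": "b@example.org", "last_gift": ""},
]
print(quality_metrics(donors, "id"))
# → {'missing_rate': 0.22, 'duplicate_keys': 1}
```

A missing rate above ~10% on fields your model depends on, or more than a handful of duplicates per thousand records, usually points to a score of 3 or below.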

  5. Assess consistency

    Is it standardised? Score: 5 = Consistent formats, controlled vocabularies, validated entry. 3 = Some inconsistency but patterns are clear. 1 = Freeform entry, multiple formats, impossible to parse reliably.
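One quick way to gauge consistency is to collapse each value in a field to a "shape signature" and count the distinct shapes. A sketch (the signature scheme here is one illustrative choice among many):

```python
import re

def format_signatures(values: list[str]) -> set[str]:
    """Collapse each value to a shape: digits become '9', letters become 'A'.
    Fewer distinct signatures means more consistent formatting."""
    signatures = set()
    for value in values:
        sig = re.sub(r"[0-9]", "9", value)
        sig = re.sub(r"[A-Za-z]", "A", sig)
        signatures.add(sig)
    return signatures

dates = ["2024-01-05", "2024-02-11", "5 Jan 2024"]
print(format_signatures(dates))  # → {'9999-99-99', '9 AAA 9999'}
```

One signature per field suggests a 5; a couple of clear patterns suggests a 3; dozens of shapes in a single field points to freeform entry and a 1.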

  6. Assess accessibility

    Can you actually get it? Score: 5 = Easy export, API access, you control the system. 3 = Can export but requires IT help or manual work. 1 = Locked in vendor system, no export, would need to recreate.

  7. Assess documentation

    Do you understand it? Score: 5 = Data dictionary exists, field meanings clear, collection process documented. 3 = Tribal knowledge exists, could document it. 1 = No one knows what half the fields mean.

  8. Calculate readiness and identify blockers

    Sum your scores (max 30). 25-30: Ready to proceed. 18-24: Proceed with caution, address gaps in parallel. 12-17: Fix critical issues first. Below 12: Not ready; significant data work is needed before AI. Any single score of 1 is a blocker regardless of total.
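The scoring rule above can be expressed directly in code. A minimal sketch, interpreting the band boundaries as 25-30 / 18-24 / 12-17 / below 12:

```python
def readiness(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the six dimension scores and apply the recipe's thresholds.
    Any single score of 1 is a blocker regardless of the total."""
    total = sum(scores.values())
    if any(s == 1 for s in scores.values()):
        return total, "blocked: fix any dimension scored 1 first"
    if total > 24:
        return total, "ready to proceed"
    if total >= 18:
        return total, "proceed with caution"
    if total >= 12:
        return total, "fix critical issues first"
    return total, "not ready"

# Scores from the donor lapse example below
example = {"availability": 4, "volume": 5, "quality": 3,
           "consistency": 4, "accessibility": 5, "documentation": 2}
print(readiness(example))  # → (23, 'proceed with caution')
```

Note the blocker rule fires before the total is interpreted: a 29/30 with one dimension at 1 is still blocked.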

Example code

Example data readiness assessment

Assessment for a donor lapse prediction project.

# Data Readiness Assessment
## Use case: Predict which donors will stop giving

### Required data
- Donation history (amounts, dates, frequency)
- Contact record (email, tenure, source)
- Engagement (events attended, emails opened)
- Outcome (whether they lapsed - for training)

### Scores

| Dimension | Score | Notes |
|-----------|-------|-------|
| Availability | 4 | Have donations & contacts. Email engagement only from 2022. |
| Volume | 5 | 8,000 donors with 3+ years history. Plenty. |
| Quality | 3 | 15% missing email, some duplicate records. |
| Consistency | 4 | CRM enforces most formats. Some legacy free text. |
| Accessibility | 5 | Full export capability, we own the system. |
| Documentation | 2 | No data dictionary. Several mystery fields. |

**Total: 23/30** - Proceed with caution

### Blockers and actions
1. Documentation (2): Create data dictionary before project. 2 days work.
2. Quality (3): Run deduplication. Clean up emails where possible.
3. Availability (4): Accept limited engagement data, or wait 6 months.

### Recommendation
**Proceed** with data cleaning sprint first (1 week).
Document data during cleaning. Engagement limitation is acceptable.

Tools

Spreadsheet · platform · free
Claude or ChatGPT · service · freemium

At a glance

Time to implement: hours
Setup cost: free
Ongoing cost: free
Cost trend: stable
Organisation size: micro, small, medium, large
Target audience: data-analyst, operations-manager, it-technical

Assessment is free. Fixing gaps may require investment.