
Assess your data readiness for AI

data-analysis · beginner · proven

The problem

You want to use AI but you're not sure if your data is good enough. Every AI project depends on data, but 'good enough' varies wildly depending on what you're trying to do. Some projects need thousands of clean records, others work fine with messy spreadsheets. You need to know what state your data is in and whether it's fit for your specific purpose.

The solution

Run a structured data readiness assessment that evaluates your data across six dimensions: availability, volume, quality, consistency, accessibility, and documentation. Score each dimension against the requirements of your specific use case, not against some abstract ideal. This tells you exactly what needs fixing before you start.

What you get

A data readiness scorecard specific to your AI use case. You'll know which dimensions are blockers, which are acceptable, and what specific actions would improve readiness. This prevents the common pattern of starting an AI project and discovering three months in that the data isn't usable.

Before you start

  • A specific AI use case in mind
  • Access to the data you plan to use (or knowledge of where it is)
  • Basic understanding of what the AI technique needs
  • For beneficiary/donor data: confirmation that original consent covers analytical use, or a plan for a Data Protection Impact Assessment (DPIA) if needed

When to use this

  • Before starting any AI project
  • When scoping an AI funding bid
  • When a vendor says they can do AI with your data
  • After a failed AI project to understand what went wrong

When not to use this

  • You're using AI tools that don't need your data (e.g., general drafting) - though even here, be cautious about pasting organisational data into public/free AI tools
  • You have no data at all (different problem)
  • The project doesn't involve your organisational data

Steps

  1. Define what data you need

    Based on your AI use case, specify what data is required. For donor prediction: donation history, contact records, engagement data. For document classification: documents with known categories to learn from. Be specific about fields, not just "our CRM data".

  2. Assess availability

    Does this data actually exist? Score: 5 = All required data exists and is captured. 3 = Most exists but some fields never collected. 1 = Critical data doesn't exist. Common gaps: outcomes data, historical records, linked data across systems.

  3. Assess volume

    Do you have enough? Requirements vary hugely: LLMs can work with as few as 50 examples using in-context learning (examples included in the prompt, as opposed to fine-tuning, which needs thousands), ML classification needs hundreds, prediction models need thousands. Score: 5 = Plenty for your technique. 3 = Borderline, might work. 1 = Far too little.
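The volume check can be sketched as a simple lookup. The minimums below are illustrative assumptions for this sketch, not hard rules — calibrate them to your technique and vendor guidance:

```python
# Rough volume check against illustrative per-technique minimums.
# These thresholds are assumptions for this sketch, not hard rules.
ROUGH_MINIMUMS = {
    "llm_in_context": 50,      # examples included in the prompt
    "ml_classification": 500,  # hundreds of labelled records
    "prediction_model": 2000,  # thousands of historical records
}

def volume_score(record_count: int, technique: str) -> int:
    """Map a record count to a 1/3/5 volume score for the chosen technique."""
    minimum = ROUGH_MINIMUMS[technique]
    if record_count >= 2 * minimum:
        return 5  # plenty for your technique
    if record_count >= minimum:
        return 3  # borderline, might work
    return 1      # far too little

print(volume_score(8000, "prediction_model"))  # → 5
```

With 8,000 donor records for a prediction model, this scores a 5, matching the worked example later in this recipe.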

  4. Assess quality

    How accurate and complete is it? Score: 5 = Regularly validated, few errors, minimal missing values. 3 = Some known issues but mostly usable. 1 = Significant errors, lots of missing data, untrustworthy. Use your data quality check recipe if unsure.
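If you want rough numbers behind the quality score, two quick signals are the share of missing cells and the number of duplicate keys. A minimal sketch, using made-up donor records (field names are illustrative, not a prescribed schema):

```python
def quality_metrics(rows: list[dict], key_field: str) -> dict:
    """Rough quality signals: missing-cell rate and duplicate key count."""
    total_cells = sum(len(r) for r in rows)
    missing = sum(1 for r in rows for v in r.values() if v in (None, ""))
    keys = [r[key_field] for r in rows]
    duplicates = len(keys) - len(set(keys))
    return {
        "missing_rate": round(missing / total_cells, 2) if total_cells else 0.0,
        "duplicate_keys": duplicates,
    }

donors = [  # illustrative records, not real data
    {"id": "D1", "email": "a@example.org", "last_gift": "2024-03-01"},
    {"id": "D2", "email": "", "last_gift": "2023-11-15"},
    {"id": "D2", "email": "b@example.org", "last_gift": ""},
]
print(quality_metrics(donors, "id"))
# → {'missing_rate': 0.22, 'duplicate_keys': 1}
```

A missing rate above ~10% on fields your model depends on, or more than a handful of duplicates per thousand records, usually points to a score of 3 or below.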

  5. Assess consistency

    Is it standardised? Score: 5 = Consistent formats, controlled vocabularies, validated entry. 3 = Some inconsistency but patterns are clear. 1 = Freeform entry, multiple formats, impossible to parse reliably.
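One quick way to gauge consistency is to collapse each value in a field to a "shape signature" and count the distinct shapes. A sketch (the signature scheme here is one illustrative choice among many):

```python
import re

def format_signatures(values: list[str]) -> set[str]:
    """Collapse each value to a shape: digits become '9', letters become 'A'.
    Fewer distinct signatures means more consistent formatting."""
    signatures = set()
    for value in values:
        sig = re.sub(r"[0-9]", "9", value)
        sig = re.sub(r"[A-Za-z]", "A", sig)
        signatures.add(sig)
    return signatures

dates = ["2024-01-05", "2024-02-11", "5 Jan 2024"]
print(format_signatures(dates))  # → {'9999-99-99', '9 AAA 9999'}
```

One signature per field suggests a 5; a couple of clear patterns suggests a 3; dozens of shapes in a single field points to freeform entry and a 1.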

  6. Assess accessibility

    Can you actually get it? Score: 5 = Easy export, API access, you control the system. 3 = Can export but requires IT help or manual work. 1 = Locked in vendor system, no export, would need to recreate.

  7. Assess documentation

    Do you understand it? Score: 5 = Data dictionary exists, field meanings clear, collection process documented. 3 = Tribal knowledge exists, could document it. 1 = No one knows what half the fields mean.

  8. Calculate readiness and identify blockers

    Sum your scores (max 30). 25-30: Ready to proceed. 18-24: Proceed with caution, address gaps in parallel. 12-17: Fix critical issues first. Below 12: Not ready; significant data work is needed before AI. Any single score of 1 is a blocker regardless of total.
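The scoring rule above can be expressed directly in code. A minimal sketch, interpreting the band boundaries as 25-30 / 18-24 / 12-17 / below 12:

```python
def readiness(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the six dimension scores and apply the recipe's thresholds.
    Any single score of 1 is a blocker regardless of the total."""
    total = sum(scores.values())
    if any(s == 1 for s in scores.values()):
        return total, "blocked: fix any dimension scored 1 first"
    if total > 24:
        return total, "ready to proceed"
    if total >= 18:
        return total, "proceed with caution"
    if total >= 12:
        return total, "fix critical issues first"
    return total, "not ready"

# Scores from the donor lapse example below
example = {"availability": 4, "volume": 5, "quality": 3,
           "consistency": 4, "accessibility": 5, "documentation": 2}
print(readiness(example))  # → (23, 'proceed with caution')
```

Note the blocker rule fires before the total is interpreted: a 29/30 with one dimension at 1 is still blocked.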

Example code

Example data readiness assessment

Assessment for a donor lapse prediction project.

# Data Readiness Assessment
## Use case: Predict which donors will stop giving

### Required data
- Donation history (amounts, dates, frequency)
- Contact record (email, tenure, source)
- Engagement (events attended, emails opened)
- Outcome (whether they lapsed - for training)

### Scores

| Dimension | Score | Notes |
|-----------|-------|-------|
| Availability | 4 | Have donations & contacts. Email engagement only from 2022. |
| Volume | 5 | 8,000 donors with 3+ years history. Plenty. |
| Quality | 3 | 15% missing email, some duplicate records. |
| Consistency | 4 | CRM enforces most formats. Some legacy free text. |
| Accessibility | 5 | Full export capability, we own the system. |
| Documentation | 2 | No data dictionary. Several mystery fields. |

**Total: 23/30** - Proceed with caution

### Blockers and actions
1. Documentation (2): Create data dictionary before project. 2 days work.
2. Quality (3): Run deduplication. Clean up emails where possible.
3. Availability (4): Accept limited engagement data, or wait 6 months.

### Recommendation
**Proceed** with data cleaning sprint first (1 week).
Document data during cleaning. Engagement limitation is acceptable.

Tools

Spreadsheet · platform · free
Claude or ChatGPT · service · freemium

At a glance

Time to implement: hours
Setup cost: free
Ongoing cost: free
Cost trend: stable
Organisation size: micro, small, medium, large
Target audience: data-analyst, operations-manager, it-technical

Assessment is free. Fixing gaps may require investment.