Build a quality-controlled translation workflow

communications · intermediate · emerging

The problem

You're translating newsletters, web updates, and service communications into multiple languages every month. One-off translations are inconsistent: sometimes formal, sometimes casual, terminology varies. You need quality control but can't afford professional translation for everything. Your multilingual comms feel disjointed.

The solution

Build a three-stage AI translation workflow: translate with controlled tone of voice and terminology, review for accuracy and cultural appropriateness, then check comprehension (does it retain the original meaning and language level?). Use a structured prompt system to maintain consistency. Save your terminology glossary so 'support worker' is always translated the same way.

What you get

A repeatable translation workflow with quality gates. Documents translated consistently in your organisation's voice, with terminology that stays stable across all communications. A record of what was checked and approved. Confidence that translations maintain the meaning and accessibility of the original.

Before you start

Regular need for translations (monthly newsletters, ongoing web content, etc.)
Native speakers who can review translations in each language
A Claude or OpenAI API key
Basic Python skills or willingness to adapt example code
A terminology glossary or willingness to build one
IMPORTANT: Review your data protection policy before sending content to US-based AI providers. Strip any personally identifiable information (names, addresses, case details) from source text before translation. Only translate generic communications, not individual beneficiary correspondence.
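
The PII-stripping step can be partly automated before text reaches the API. The sketch below is a first filter only: the function name `redact_pii` and both patterns are illustrative assumptions, they cover common email and UK-style phone formats, and they will not catch names, addresses, or case details, which still need a human read-through.

```python
import re

# Illustrative patterns (assumptions, not an exhaustive PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# UK-style numbers: +44 or a leading 0, then 10 more digits with optional spaces/hyphens.
PHONE_RE = re.compile(r"(?<!\w)(?:\+44\s?|0)\d(?:[\s-]?\d){8,9}(?!\d)")

def redact_pii(text):
    """Replace emails and phone numbers with placeholders; return (text, count)."""
    text, n_emails = EMAIL_RE.subn("[EMAIL]", text)
    text, n_phones = PHONE_RE.subn("[PHONE]", text)
    return text, n_emails + n_phones

sample = "Call Sam on 020 7946 0018 or email sam@example.org for details."
clean, found = redact_pii(sample)
print(clean)   # the name "Sam" is left untouched: names need manual removal
print(f"Redactions made: {found}")
```

A non-zero redaction count is also a useful signal that the document may be individual correspondence rather than a generic communication, and should not go through the pipeline at all.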

When to use this

You're translating regularly and consistency matters
You need to control tone of voice across languages
You want to ensure translations maintain meaning and accessibility
You have native speakers available for review but can't afford full professional translation

When not to use this

You only translate occasionally - a simple one-off translation recipe is fine
You don't have native speakers to review - quality control won't work
You're translating legally binding documents - get professional translation
Your content changes so much that terminology consistency doesn't matter
You're translating safeguarding instructions, emergency procedures, or crisis communications - these require professional human translation to ensure life-critical information is accurate

Steps

1
Build your terminology glossary
List key terms that must be translated consistently: your service names, common concepts, technical terms. For each term, specify the approved translation in each language. Include notes about formality (tu/vous in French, tú/usted in Spanish). This is your translation style guide.
2
Define your tone of voice rules
Document how your organisation speaks: Are you formal or approachable? Do you use technical terms or plain language? What reading level do you target? These rules go into your translation prompt so every language matches your voice, not just the words.
3
Create the translation prompt
Write a structured prompt that includes: target language, your terminology glossary, tone of voice rules, audience description, formality level. Tell it to maintain the original's language complexity and flag any terms it's unsure about. Test this on 3-4 documents and refine based on output.
4
Create the review prompt
Second pass: Ask a fresh AI instance to review the translation. Does it match the terminology glossary? Is tone consistent? Are there cultural references that don't translate? Are there better word choices? This catches issues the first pass missed.
5
Create the comprehension check prompt
Third pass: Ask AI to back-translate key points to English, then compare to the original. Check: Does it maintain the same meaning? Same language level? Same key messages? This verifies nothing was lost or distorted in translation. Note: Each extra pass adds to API costs; the full three-stage workflow costs roughly three times as much as a single-pass translation. For added independence, consider using a different model for this stage - e.g. if Stage 1 uses an OpenAI model, use Claude for Stage 3 so the model is not simply confirming its own output.
6
Build the automated pipeline
Use the example code to chain the three stages: translate → review → comprehension check. The script takes your source text, runs it through all three stages, and outputs the final translation plus a quality report flagging anything uncertain.
7
Test with native speaker review
Run 10-20 documents through your pipeline and have native speakers check the final output. Do they agree with the terminology choices? Does the tone feel right? What needs adjusting? Use this feedback to refine your prompts and glossary.
8
Establish review checkpoints
Decide which translations always need human review (major announcements, policy changes) vs which are low-risk (event reminders, standard updates). High-priority content goes through your workflow plus human approval. Routine content can go straight out after the three-stage check, provided the check flags nothing for review.
9
Update your glossary regularly (optional)
When new terms appear or reviewers suggest better translations, update your glossary and re-run affected documents. Your workflow improves over time as you refine terminology and tone rules.
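
The AI review in step 4 can be backed by a deterministic glossary check that needs no API call. This is a minimal sketch under stated assumptions: the glossary layout with a `note` field, the name `GLOSSARY_FR`, and the function `check_glossary` are all hypothetical, and the example sentences are invented for illustration.

```python
import re

# Hypothetical glossary layout: approved translation plus a formality note per term.
GLOSSARY_FR = {
    "support worker": {"translation": "travailleur de soutien", "note": "address readers as tu"},
    "service user": {"translation": "usager du service", "note": ""},
}

def check_glossary(source, translation, glossary):
    """List source terms whose approved translation is absent from the translation."""
    missing = []
    for term, entry in glossary.items():
        # Whole-word, case-insensitive match on the English term, allowing a plural "s".
        in_source = re.search(rf"\b{re.escape(term)}s?\b", source, re.IGNORECASE)
        # Plain substring match: inflected forms (e.g. plurals) are reported as
        # missing, so treat results as prompts for a human look, not as errors.
        in_translation = entry["translation"].lower() in translation.lower()
        if in_source and not in_translation:
            missing.append(term)
    return missing

source = "Your support worker will call you. All service users are welcome."
translation = "Ton travailleur de soutien t'appellera. Tous les utilisateurs sont les bienvenus."
print(check_glossary(source, translation, GLOSSARY_FR))
```

Running the same check over previously published translations after a glossary update is one way to find the affected documents that need re-running.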

Example code

Three-stage translation workflow with quality control

This implements a translate → review → comprehension check pipeline. Adapt the glossary and tone rules to your organisation. Note: it uses OpenAI's JSON mode with gpt-4o-mini, which requires the prompt to explicitly include the word 'JSON' (each prompt says 'Return JSON with:').

from openai import OpenAI
import json

client = OpenAI()

# Your terminology glossary - maintain this
GLOSSARY = {
    "en-fr": {
        "support worker": "travailleur de soutien",
        "service user": "usager du service",
        "safeguarding": "protection"
    },
    "en-es": {
        "support worker": "trabajador de apoyo",
        "service user": "usuario del servicio",
        "safeguarding": "salvaguardia"
    }
}

# Your tone of voice rules
TONE_RULES = {
    "formality": "approachable but professional",
    "register": "use tu form in French/Spanish (we speak directly to our community)",
    "complexity": "plain language, aim for B1 level",
    "avoid": "jargon, complex sentences, passive voice"
}

def stage_1_translate(text, target_lang):
    """Stage 1: Initial translation with terminology control"""

    glossary_for_lang = GLOSSARY.get(f"en-{target_lang}", {})

    prompt = f"""Translate this to {target_lang.upper()}.

Mandatory terminology (always use these translations):
{json.dumps(glossary_for_lang, indent=2)}

Tone of voice rules:
- {TONE_RULES['formality']}
- {TONE_RULES['register']}
- Keep language simple and accessible ({TONE_RULES['complexity']})
- Avoid: {TONE_RULES['avoid']}

Source text:
{text}

Return JSON with:
- translation: the translated text
- terminology_used: which glossary terms you used
- uncertain_terms: any terms you weren't sure about
- confidence: overall confidence (0-100)"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)

def stage_2_review(original, translation, target_lang):
    """Stage 2: Quality review by fresh AI instance"""

    glossary_for_lang = GLOSSARY.get(f"en-{target_lang}", {})

    prompt = f"""Review this {target_lang.upper()} translation for quality.

Original English:
{original}

Translation:
{translation}

Check against these criteria:
1. Mandatory terminology used correctly: {json.dumps(glossary_for_lang, indent=2)}
2. Tone is {TONE_RULES['formality']}
3. Uses {TONE_RULES['register']}
4. Language complexity matches original (both are {TONE_RULES['complexity']})
5. No cultural references that don't translate

Return JSON with:
- issues_found: list of any problems
- suggested_changes: specific improvements
- terminology_compliance: did it use the glossary correctly?
- tone_assessment: does it match our tone rules?
- overall_quality: score (0-100)"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)

def stage_3_comprehension_check(original, translation, target_lang):
    """Stage 3: Check meaning is preserved"""

    prompt = f"""Check if this translation preserves the original meaning.

Original English:
{original}

Translation ({target_lang.upper()}):
{translation}

Tasks:
1. Back-translate the {target_lang.upper()} to English
2. Compare to the original:
   - Are all key messages preserved?
   - Is the language level similar?
   - Is anything added or lost?

Return JSON with:
- back_translation: your English version of the translation
- meaning_preserved: yes/no
- key_differences: any important changes in meaning
- complexity_comparison: is the translation simpler/harder than original?
- recommendation: approve/revise/flag-for-review"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    return json.loads(response.choices[0].message.content)

# Full workflow
def translate_with_quality_control(text, target_lang):
    """Run complete three-stage workflow"""

    print(f"\nTranslating to {target_lang}...")

    # Stage 1: Translate
    print("Stage 1: Translating...")
    stage1 = stage_1_translate(text, target_lang)
    translation = stage1['translation']

    # Stage 2: Review
    print("Stage 2: Reviewing...")
    stage2 = stage_2_review(text, translation, target_lang)

    # Report suggested changes; this script does not apply them to the translation
    suggestions = stage2.get('suggested_changes')
    if suggestions:
        print(f"  - Reviewer suggestions: {suggestions}")
        # Apply them by hand, or add a revision step here before Stage 3

    # Stage 3: Comprehension check
    print("Stage 3: Comprehension check...")
    stage3 = stage_3_comprehension_check(text, translation, target_lang)

    # Compile quality report
    report = {
        'target_language': target_lang,
        'translation': translation,
        'stage1_confidence': stage1.get('confidence'),
        'stage2_quality': stage2.get('overall_quality'),
        'stage3_recommendation': stage3.get('recommendation'),
        'issues_flagged': {
            'uncertain_terms': stage1.get('uncertain_terms', []),
            'review_issues': stage2.get('issues_found', []),
            'meaning_differences': stage3.get('key_differences', [])
        },
        # Normalise case and whitespace so "Approve" is not treated as a failure
        'needs_human_review': str(stage3.get('recommendation', '')).strip().lower() != 'approve'
    }

    return report

# Example usage
source_text = """
We're here to support you. Our support workers can help with housing,
benefits advice, and safeguarding concerns. All service users are welcome.
"""

for lang in ['fr', 'es']:
    result = translate_with_quality_control(source_text, lang)

    print(f"\n{'='*50}")
    print(f"Language: {result['target_language']}")
    print(f"Translation: {result['translation']}")
    print(f"Quality scores: Stage1={result['stage1_confidence']}%, Stage2={result['stage2_quality']}%")
    print(f"Recommendation: {result['stage3_recommendation']}")
    print(f"Needs human review: {result['needs_human_review']}")

    if result['issues_flagged']['uncertain_terms']:
        print(f"Uncertain terms: {result['issues_flagged']['uncertain_terms']}")
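
The review checkpoints from step 8 can be encoded as a small routing function that reads the report returned by `translate_with_quality_control`. This is a sketch only: the content categories, the 80-point threshold, and the helper names `to_score` and `route_for_review` are assumptions to replace with your own policy. The report keys match those used in the example code.

```python
# Hypothetical policy values; set these to match your own review rules.
ALWAYS_REVIEW = {"major_announcement", "policy_change"}
MIN_SCORE = 80

def to_score(value):
    """Coerce a model-reported score to a number, or None if it is unusable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def route_for_review(report, category):
    """Return True if a native speaker must approve this translation."""
    if category in ALWAYS_REVIEW:
        return True
    # Normalise the recommendation so "Approve" or " approve " still counts.
    recommendation = str(report.get("stage3_recommendation") or "").strip().lower()
    if recommendation != "approve":
        return True
    scores = [to_score(report.get("stage1_confidence")),
              to_score(report.get("stage2_quality"))]
    # A missing or non-numeric score is treated as a reason to review.
    if any(s is None or s < MIN_SCORE for s in scores):
        return True
    return bool(report.get("issues_flagged", {}).get("uncertain_terms"))

sample_report = {
    "stage1_confidence": 92,
    "stage2_quality": 88,
    "stage3_recommendation": "Approve",
    "issues_flagged": {"uncertain_terms": []},
}
print(route_for_review(sample_report, "event_reminder"))
print(route_for_review(sample_report, "policy_change"))
```

Model-reported scores are self-assessments, so the threshold works best as a way to send more documents to reviewers, never as evidence that a translation is correct.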

Tools

Claude API (service · paid)
OpenAI API (service · paid)
Google Colab (platform · freemium)

Resources

OpenAI translation best practices (documentation): Building reliable translation systems with LLMs.
Translation quality assurance (tutorial): Techniques for validating AI translations.

At a glance

Time to implement: weeks
Setup cost: low
Ongoing cost: low
Cost trend: decreasing
Organisation size: medium, large
Target audience: comms-marketing, operations-manager, it-technical

API costs are ~£0.01-0.05 per document depending on length (3 passes × cost per pass). For 50 documents/month across 4 languages (200 translations), that's roughly £2-10/month at those rates. Main investment is setup time and building your glossary.
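
To budget for your own volumes, the estimate can be reproduced with a few lines of arithmetic. This sketch assumes the quoted per-document cost already covers all three passes and applies to each language version separately; `monthly_cost` is an illustrative helper, and real bills depend on document length and model pricing.

```python
def monthly_cost(documents, languages, cost_per_document):
    """Estimated monthly API spend: one three-pass run per document per language."""
    return documents * languages * cost_per_document

# Volumes and per-document range quoted above (assumed to include all three passes).
low = monthly_cost(50, 4, 0.01)
high = monthly_cost(50, 4, 0.05)
print(f"Estimated monthly cost: £{low:.2f} to £{high:.2f}")
```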
