// services

RAG Translation Pipeline

Your translation memory, style guide, and termbase retrieved at inference time, not bolted on after. We build the complete retrieval stack: embedding, indexing, retrieval, prompt assembly, and output validation, integrated with the workflow you already have.

The problem

A language model without access to your translation memory (TM) is not using your most valuable localization asset. It generates from scratch on every segment, ignoring years of reviewed translations and thousands of approved terminology decisions. The output is linguistically plausible but institutionally ignorant: it does not know what your organization has decided, only what the training corpus suggested.

The naive solution is prompt engineering: paste the relevant TM segments into the prompt manually. This does not scale. A production localization pipeline handles thousands of segments per release. Manual context injection is not a workflow; it is a demonstration.

The actual solution is retrieval architecture. At inference time, the pipeline automatically finds the most relevant approved segments from your TM, the approved terminology from your termbase, and the applicable rules from your style guide, and assembles them into the prompt before the model generates. The model sees your institutional knowledge. It generates with it in context.

How we approach it

We begin with an audit of your linguistic assets: TM quality and coverage, termbase completeness, style guide specificity. A retrieval pipeline is only as good as what it retrieves. If the TM is inconsistent, we recommend a cleanup pass before vectorization. Garbage in is not something retrieval architecture can fix.

Embedding strategy matters. We select and evaluate embedding models for your language pairs and domain. A model trained on general web text encodes domain-specific vocabulary differently than one trained on technical or legal text. We measure retrieval precision on a sample of your content before committing to an indexing strategy.
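One common way to put a number on retrieval precision is precision@k over a reviewer-judged sample. The sketch below assumes that metric; the function names, segment IDs, and judgments are hypothetical and only illustrate the shape of the measurement.

```python
from typing import Dict, List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved segment IDs that a reviewer judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for seg_id in retrieved[:k] if seg_id in relevant)
    return hits / k


def mean_precision_at_k(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]], k: int
) -> float:
    """Average precision@k over every sampled query segment."""
    if not runs:
        return 0.0
    scores = [
        precision_at_k(retrieved, judgments.get(query_id, set()), k)
        for query_id, retrieved in runs.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical sample: ranked TM IDs per query, plus reviewer judgments.
runs = {"q1": ["tm-12", "tm-7", "tm-3"], "q2": ["tm-9", "tm-1", "tm-4"]}
judgments = {"q1": {"tm-12", "tm-3"}, "q2": {"tm-4"}}
score = mean_precision_at_k(runs, judgments, k=3)
```

Running the same sample against two candidate embedding models gives a like-for-like comparison before anything is indexed.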

The retrieval logic is explicit and auditable. When the pipeline retrieves a TM segment and includes it in the prompt, that decision is logged: which segment was retrieved, what the match score was, what the output was. This makes quality review tractable: a reviewer can see exactly what context the model was working with for each segment.
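A minimal sketch of what one such log record could look like, assuming one JSON line per translated segment. The field names, IDs, and scores are hypothetical and not a fixed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RetrievedSegment:
    tm_id: str        # hypothetical TM entry identifier
    source: str       # source text of the retrieved TM entry
    score: float      # match score reported by the retriever


@dataclass
class RetrievalLogEntry:
    segment_id: str
    source_text: str
    retrieved: List[RetrievedSegment]
    output_text: str


def to_log_line(entry: RetrievalLogEntry) -> str:
    """Serialize one entry as a single JSON line; asdict also converts nested dataclasses."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


entry = RetrievalLogEntry(
    segment_id="seg-0001",
    source_text="Move to Trash",
    retrieved=[
        RetrievedSegment("tm-2019-17", "Send to Trash", 0.91),  # illustrative score
        RetrievedSegment("tm-2021-04", "Empty Trash", 0.84),    # illustrative score
    ],
    output_text="In den Papierkorb legen",
)
line = to_log_line(entry)
```

One line per segment keeps the log easy to filter during review, for example by segment ID or by a minimum match score.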

Output validation is a first-class component, not an afterthought. Every generated segment passes through structured checks: tag preservation, approved terminology presence, character count constraints, forbidden-term absence. Segments that fail validation are routed to human review rather than passed through.
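The four checks can be sketched as a single function that returns the reasons a segment failed. This is an illustration under stated assumptions: the tag pattern, the failure labels, and the plain substring matching for terms are all simplifications chosen for the example.

```python
import re
from typing import Dict, List, Optional

# Assumption: inline tags look like <b>...</b> or {placeholder}.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>|\{[^{}]+\}")


def validate_segment(
    source: str,
    target: str,
    required_terms: Dict[str, str],
    forbidden_terms: List[str],
    max_chars: Optional[int] = None,
) -> List[str]:
    """Return a list of failure reasons; an empty list means the segment passes."""
    failures: List[str] = []

    # Tag preservation: the same tags must appear in source and target.
    if sorted(TAG_PATTERN.findall(source)) != sorted(TAG_PATTERN.findall(target)):
        failures.append("tags_not_preserved")

    # Approved terminology: if the source term occurs, its equivalent must too.
    for src_term, tgt_term in required_terms.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            failures.append(f"missing_term:{tgt_term}")

    # Forbidden terms must not occur in the target (substring match).
    for term in forbidden_terms:
        if term.lower() in target.lower():
            failures.append(f"forbidden_term:{term}")

    # Character count constraint, when one applies to this segment.
    if max_chars is not None and len(target) > max_chars:
        failures.append("too_long")

    return failures


failures = validate_segment(
    "Move to <b>Trash</b>",
    "In den Müll legen",
    required_terms={"Trash": "Papierkorb"},
    forbidden_terms=["Müll"],
    max_chars=40,
)
needs_human_review = bool(failures)
```

Returning the reasons rather than a bare pass/fail lets the routing step attach them to the segment when it goes to a reviewer.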

// how it works

01 / 07

// source + document analysis

A segment arrives

A segment never arrives alone. It comes with a document. The first time we encounter that document, the pipeline embeds it whole, not just the unit in front of us. Style, structure, how the segments relate. That representation stays with us, and the contextual agent will read from it for every unit that follows.

first encounter: full document analyzed once

02 / 07

// tokenization

What the model actually sees

The model does not see your sentence. It sees the sequence of subword tokens its tokenizer produces. The unit of meaning shifts from word to token, and a translation system that ignores this gives up a lever it needs for controlling what the model generates.

03 / 07

// translation memory

Retrieval by meaning

We retrieve from your translation memory not by string match but by meaning. Approved precedents (segments your reviewers signed off on, sometimes years ago) become candidate context for what the model is about to generate. We log which segments were retrieved, with what scores, so quality review is tractable.
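Retrieval by meaning usually comes down to comparing embedding vectors. The sketch below assumes cosine similarity over precomputed vectors and a brute-force scan; the three-dimensional vectors are toy values standing in for real embeddings, and a production index would use a vector store instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 2,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (segment, score) pairs, best first, dropping weak matches."""
    scored = [(segment, cosine(query_vec, vec)) for segment, vec in index]
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
index = [
    ("Send to Trash", [0.9, 0.1, 0.0]),
    ("Empty Trash", [0.7, 0.3, 0.1]),
    ("Save as PDF", [0.0, 0.2, 0.9]),
]
matches = retrieve([1.0, 0.1, 0.0], index, k=2, min_score=0.5)
```

The returned scores are the values that would be written to the retrieval log alongside each segment.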

04 / 07

// glossary

Approved terminology is a constraint

Approved terminology is not a suggestion to the model. It is a constraint. The glossary layer detects terms that have a client-approved equivalent and locks them in before generation. 'Trash' becomes 'Papierkorb' (Apple's term) and not 'Müll' or 'Mülleimer', regardless of what a generic model might prefer.
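A minimal sketch of the detection half of that step, assuming a flat source-to-target glossary and case-insensitive whole-word matching. The function name and the second glossary entry are hypothetical; only the Trash and Papierkorb pair comes from the example above.

```python
import re
from typing import Dict, List, Tuple


def find_locked_terms(source: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source_term, approved_target) pairs for glossary terms found in the source."""
    locked: List[Tuple[str, str]] = []
    # Check longer terms first so multi-word entries are listed before their parts.
    for src_term in sorted(glossary, key=len, reverse=True):
        pattern = r"\b" + re.escape(src_term) + r"\b"
        if re.search(pattern, source, flags=re.IGNORECASE):
            locked.append((src_term, glossary[src_term]))
    return locked


glossary = {
    "Trash": "Papierkorb",
    "Folder": "Ordner",  # hypothetical extra entry for illustration
}
locked = find_locked_terms("Move to Trash", glossary)
```

The resulting pairs can be passed to the prompt as hard requirements and reused by validation to confirm the approved term actually appears in the output.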

05 / 07

// contextual agent

The model translates a segment that knows it is a segment

This is where most pipelines stop. Ours does not. The contextual agent reads from the document-level analysis and tells the model what it needs to know about this specific unit: where it sits, what comes before and after, what register the document uses overall, what it is trying to do. The model translates a segment that knows it is a segment.

06 / 07

// prompt assembly + generation

Everything the organization has decided, in context

The model receives the source, the retrieved TM precedents, the locked terminology, the document context the agent assembled, and the client's style guide. What it has in front of it is not a sentence in isolation. It is a unit of work with everything your organization has decided about it. Then it generates.

style guide currently prompt-injected, moving to retrieval next
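A minimal sketch of prompt assembly from those five inputs, assuming a plain-text prompt with one labeled section per input. The section wording, ordering, and function signature are illustrative assumptions, and the style guide is passed in as a list of rules, matching the prompt-injected approach noted above.

```python
from typing import Dict, List, Tuple


def assemble_prompt(
    source: str,
    target_lang: str,
    precedents: List[Tuple[str, str]],
    locked_terms: List[Tuple[str, str]],
    context: Dict[str, str],
    style_rules: List[str],
) -> str:
    """Build one prompt string; empty inputs simply omit their section."""
    sections = [f"Translate the source segment into {target_lang}."]

    if locked_terms:
        lines = [f"- {src} => {tgt}" for src, tgt in locked_terms]
        sections.append("Locked terminology (use exactly):\n" + "\n".join(lines))

    if precedents:
        lines = [f"- {src} => {tgt}" for src, tgt in precedents]
        sections.append("Approved precedents from the TM:\n" + "\n".join(lines))

    if context:
        lines = [f"- {key}: {value}" for key, value in context.items()]
        sections.append("Document context:\n" + "\n".join(lines))

    if style_rules:
        lines = [f"- {rule}" for rule in style_rules]
        sections.append("Style guide rules:\n" + "\n".join(lines))

    # The source goes last so it is the final thing the model reads.
    sections.append(f"Source: {source}")
    return "\n\n".join(sections)
```

Keeping each input in its own labeled section also makes the logged prompt readable for a reviewer checking what the model was given.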

07 / 07

// validation → target

What leaves the pipeline has been checked

Every generated segment passes structured checks before it leaves the pipeline. Tags preserved, terminology matched against the locked terms, character count within constraints, forbidden terms absent. What passes goes to delivery. What fails goes to a human reviewer, with the failure reason attached.

// en → de

source: Move to Trash

01 document analysis
first encounter: analyzing whole document

02 tokens
Move · to · Trash

03 retrieved precedents
"Send to Trash", approved EN→DE, 2019
"Empty Trash", approved EN→DE, 2021

04 terminology lock
Trash → Papierkorb
// locked: client-approved

05 contextual agent
position: action sheet
register: informal (du)
domain: macOS UI
audience: end user

06 prompt assembly
source, tm precedents, terminology lock, contextual agent
style guide prompt-injected, retrieval next

07 validation
tags preserved
terminology matched
character count within bounds
no forbidden terms

target: In den Papierkorb legen

// what we have shipped

6 weeks

Full RAG pipeline, procurement SaaS, 15 locales

TM vectorization, semantic retrieval, termbase injection, LLM translation, output validation. Deployed to production handling 4,000 segments per release cycle with p95 latency under 2 seconds.

Enterprise SaaS client

2 weeks

Semantic TM search layer

Replaced string-based fuzzy matching with embedding-based retrieval. TM hit rate improved from 43% to 67% on net-new content. Integrated with the client's existing Phrase TMS via API.

Software development company

3 weeks

Multi-source retrieval architecture

Unified retrieval from TM, termbase, and style guide snippets with priority scoring. Termbase entries override TM matches when both are relevant. Deterministic for approved terms.

B2B SaaS company
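Priority scoring across sources, as in the multi-source architecture above, can be sketched as a fixed source ranking with the match score as a tiebreaker. The sketch assumes that termbase entries always outrank TM matches, as stated; the position of style guide snippets, the class, and the IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Lower number wins. Termbase over TM follows the description above;
# the style guide's position is an assumption for illustration.
PRIORITY = {"termbase": 0, "tm": 1, "style_guide": 2}


@dataclass(frozen=True)
class Candidate:
    source_type: str   # "termbase", "tm", or "style_guide"
    item_id: str
    text: str
    score: float       # retriever match score


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order by source priority, then by score, then by ID so ties are deterministic."""
    return sorted(
        candidates,
        key=lambda c: (PRIORITY[c.source_type], -c.score, c.item_id),
    )


ranked = rank([
    Candidate("tm", "tm-2019-17", "Send to Trash", 0.95),
    Candidate("termbase", "tb-001", "Trash => Papierkorb", 0.60),
    Candidate("style_guide", "sg-12", "Address the user informally (du).", 0.70),
])
```

Because the source type is compared before the score, a termbase entry with a lower match score still ranks ahead of a stronger TM match.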

Full shipping history →

// work with us

Tell us what you are building.

We respond within one business day. If the project is a good fit, we will schedule a short call to understand the scope before proposing anything.

Get in touch