RAG
Retrieval-Augmented Generation: a technique that gives a language model access to an external knowledge base at inference time, rather than relying on what was baked into the weights during training.
The honest version
RAG does not make a model smarter. It makes it less ignorant about your specific context.
The core idea is simple: before the model generates a response, you retrieve the most relevant pieces of information from an external store (a translation memory, a glossary, a style guide, a set of approved segments) and include them in the prompt. The model then generates with that context in view, rather than guessing from general training.
In translation, this is the difference between a model that knows the English word “submit” and a model that knows that “submit” in your procurement software carries legal weight, and that your approved German equivalent is “einreichen”, not “senden” or “übermitteln”.
The retrieval step is typically semantic: you convert the source segment into a vector embedding, search a vector database for the nearest matches, and surface the top candidates. Retrieval runs before every generation call, so it adds latency to each request, and the pipeline needs to be tuned to keep that overhead acceptable.
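A minimal sketch of that retrieval step, in plain Python. The `embed` function here is a toy stand-in (character-trigram counts) for a real embedding model, and the in-memory list stands in for a vector database; both are assumptions made so the example runs without dependencies. The sample segments are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, segments: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every stored segment against the query and keep the k nearest."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical TM source segments, for illustration only.
tm_sources = [
    "Submit the purchase order for approval",
    "Cancel the purchase order",
    "Change your password",
]

results = top_k("Submit the order for approval", tm_sources, k=2)
for score, segment in results:
    print(f"{score:.2f}  {segment}")
```

In a real pipeline the brute-force scan would be replaced by an approximate nearest-neighbour index, which is where most of the latency tuning happens.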
Why it matters for translation
Generic MT and off-the-shelf LLMs are trained on public data. That data does not include your product’s terminology, your style guide decisions, or the ten years of reviewed translations sitting in your TM. RAG is how you bring that institutional knowledge into the model’s context window, without retraining.
It also degrades gracefully. When retrieval finds nothing relevant, the model falls back to its base training. When it finds a strong match, you can surface it explicitly: “the approved translation for this segment is X. Use it.” This is more controllable than fine-tuning alone, and faster to update when terminology changes.
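The fallback behaviour can be sketched as a small decision on the retrieval score. The two thresholds and the middle "for reference" tier are assumptions for illustration, not values from any particular system; you would calibrate them against your own TM.

```python
from typing import Optional

# Assumed cutoffs; tune against your own data.
STRONG_MATCH = 0.85     # at or above this, instruct the model to reuse the match
RELEVANCE_FLOOR = 0.50  # below this, add nothing and let the model fall back

def retrieval_hint(score: float, source: str, target: Optional[str]) -> str:
    """Return the text to add to the prompt for one retrieved match."""
    if target is None or score < RELEVANCE_FLOOR:
        # Nothing relevant: the model relies on its base training.
        return ""
    if score >= STRONG_MATCH:
        # Strong match: surface it explicitly.
        return f'The approved translation for this segment is "{target}". Use it.'
    # Partial match: offer it as context without forcing it.
    return f'For reference, the similar segment "{source}" was translated as "{target}".'

print(retrieval_hint(0.95, "Submit", "einreichen"))
print(repr(retrieval_hint(0.20, "Send", "senden")))
```

Because the thresholds and the stored targets live outside the model, a terminology change is a data update rather than a retraining job.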
For localization specifically, RAG pipelines typically retrieve from three sources:
- Translation memory: previous segments reviewed and approved by human translators
- Termbase / glossary: authoritative mappings for domain-specific terms
- Style guide: rules about register, punctuation, formatting, and forbidden constructions
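One way to combine the three sources into a single prompt is sketched below. The `RetrievedContext` structure, the prompt wording, and the sample entries are all hypothetical; the point is only that each source gets its own clearly labelled section.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievedContext:
    # (source segment, approved target) pairs from the translation memory
    tm_matches: list[tuple[str, str]] = field(default_factory=list)
    # source term -> approved target term from the termbase
    terms: dict[str, str] = field(default_factory=dict)
    # plain-language rules from the style guide
    style_rules: list[str] = field(default_factory=list)

def build_prompt(source: str, ctx: RetrievedContext, target_lang: str = "German") -> str:
    """Assemble a prompt with one labelled section per non-empty source."""
    parts = [f"Translate the following segment into {target_lang}."]
    if ctx.terms:
        parts.append("Required terminology:")
        parts += [f"- {src} -> {tgt}" for src, tgt in ctx.terms.items()]
    if ctx.tm_matches:
        parts.append("Previously approved translations:")
        parts += [f"- {src} => {tgt}" for src, tgt in ctx.tm_matches]
    if ctx.style_rules:
        parts.append("Style rules:")
        parts += [f"- {rule}" for rule in ctx.style_rules]
    parts.append(f"Segment: {source}")
    return "\n".join(parts)

# Invented sample data, for illustration only.
ctx = RetrievedContext(
    tm_matches=[("Submit the purchase order", "Bestellung einreichen")],
    terms={"submit": "einreichen"},
    style_rules=["Use the formal register (Sie)."],
)
prompt = build_prompt("Submit the request", ctx)
print(prompt)
```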
Where it fails
RAG is not a silver bullet. Retrieval quality depends on how well your content is embedded and indexed. If your TM is inconsistent (different translators, different periods, different style guides), RAG will surface that inconsistency. Garbage in, garbage out.
The context window has limits. If you are translating a long document with dense terminology, you cannot retrieve and include everything. You have to prioritize, and deciding which matches, terms, and rules matter most for a given segment requires judgment.
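A simple form of that prioritization is a greedy selection under a token budget. This is a sketch under stated assumptions: the per-source weights are invented, and the whitespace word count is only a rough proxy for what a real tokenizer would report.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    score: float  # retrieval similarity in [0, 1]
    kind: str     # "term", "tm", or "style"

# Assumed weights: terminology first, then TM matches, then style rules.
KIND_WEIGHT = {"term": 1.0, "tm": 0.8, "style": 0.5}

def estimate_tokens(text: str) -> int:
    """Rough proxy for token count; a real pipeline would use the model's tokenizer."""
    return max(1, len(text.split()))

def select_within_budget(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Take candidates in priority order, skipping any that would exceed the budget."""
    ranked = sorted(candidates, key=lambda c: c.score * KIND_WEIGHT[c.kind], reverse=True)
    chosen: list[Candidate] = []
    used = 0
    for cand in ranked:
        cost = estimate_tokens(cand.text)
        if used + cost <= budget:
            chosen.append(cand)
            used += cost
    return chosen

# Invented sample candidates, for illustration only.
candidates = [
    Candidate("Submit the purchase order => Bestellung einreichen", 0.9, "tm"),
    Candidate("submit -> einreichen", 0.9, "term"),
    Candidate("Use the formal register", 0.6, "style"),
]
selected = select_within_budget(candidates, budget=8)
print([c.kind for c in selected])
```

The weights are where the judgment lives: a greedy pass like this only enforces whatever priorities you encode in it.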
RAG also does not help with structural problems: if the source text is ambiguous, or if the model does not understand the domain deeply enough, retrieved context cannot compensate. It narrows the failure modes; it does not eliminate them.
Finally, latency. Every retrieval step adds time. For high-volume, real-time pipelines, this is an engineering constraint worth measuring before you design the architecture.
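Measuring that overhead can be as simple as timing the retrieval call over a sample of real segments. In this sketch `fake_retrieve` is a placeholder that sleeps for about two milliseconds; swap in your own retrieval function. The p95 figure uses a rough index-based estimate, which is enough for a first look.

```python
import statistics
import time
from typing import Callable

def measure_latency_ms(retrieve: Callable[[str], object],
                       queries: list[str],
                       warmup: int = 1) -> dict[str, float]:
    """Time each retrieval call and report median and approximate p95 in milliseconds."""
    for q in queries[:warmup]:
        retrieve(q)  # warm caches and connections before timing
    samples = []
    for q in queries:
        start = time.perf_counter()
        retrieve(q)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {"median_ms": statistics.median(samples), "p95_ms": samples[p95_index]}

def fake_retrieve(query: str) -> list[str]:
    """Placeholder for a real retrieval call."""
    time.sleep(0.002)
    return []

stats = measure_latency_ms(fake_retrieve, ["Submit the request"] * 20)
print(f"median {stats['median_ms']:.1f} ms, p95 {stats['p95_ms']:.1f} ms")
```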