// glossary
Plain language.
No black boxes.
The localization industry runs on jargon. Some of it is useful shorthand; some of it obscures more than it clarifies. These entries explain the concepts as we actually use them, including where they fail.
- BLEU and COMET: Automated metrics for evaluating machine translation output. BLEU measures n-gram overlap against a reference translation; COMET uses a neural model trained on human quality judgments to produce scores that align more closely with professional evaluation.
- Embeddings: Numerical representations of text as vectors in a high-dimensional space, where semantically similar content is positioned close together, enabling meaning-based search and comparison.
- Fine-tuning: Continuing the training of a pre-trained language model on a smaller, curated dataset to adapt its behavior to a specific domain, style, or task, permanently changing the model's weights.
- Locale: A combination of language and region that defines not just which words to use, but also how to format numbers, dates, currencies, and addresses, along with the cultural conventions that surround them.
- MTPE (Machine Translation Post-Editing): The process of having a human translator review and correct machine-translated text to a defined quality standard, rather than translating from scratch.
- Prompt engineering: The practice of designing, testing, and refining the instructions given to a language model to produce consistently useful output, including specifying task, context, constraints, format, and edge-case handling.
- RAG (Retrieval-Augmented Generation): A technique that retrieves relevant passages from an external knowledge base at inference time and supplies them to the language model, rather than relying only on what was baked into the weights during training.
- Tag handling: The management of inline formatting markers, placeholders, and structural codes (HTML tags, variables, custom syntax) that must survive the translation process intact and correctly positioned.
- Terminology / Termbase: A structured database of approved term pairs (source to target) that specifies exactly how domain-specific words and phrases must be translated within a product or organization, along with context, usage notes, and forbidden alternatives.
- Translation memory: A database of previously translated segment pairs (source text and its approved human translation) used to suggest matches when the same or similar content recurs.
- Vector database: A database built to store and search high-dimensional embeddings efficiently, using approximate nearest-neighbor algorithms to find semantically similar content at scale.
- XLIFF (XML Localisation Interchange File Format): An XML-based standard for packaging localizable content for exchange between authoring tools, translation management systems, CAT tools, and QA systems.
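Three of the entries above (BLEU, embeddings, and tag handling) are easier to pin down with a few lines of code. The Python sketch below is illustrative only, not how any particular tool implements them. The function names and the tag pattern are hypothetical. The n-gram function covers only clipped n-gram precision, the core ingredient of BLEU; full BLEU also combines several n-gram orders and applies a brevity penalty. The tag check confirms that tags and placeholders are present and unmodified, but not that they are correctly positioned.

```python
import math
import re
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision of a candidate against one reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # A candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical pattern: HTML-style tags and {curly_brace} placeholders only.
# Real projects typically need a pattern per file format.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>|\{[A-Za-z_][A-Za-z0-9_]*\}")


def tags_preserved(source: str, target: str) -> bool:
    """True if the target contains exactly the same tags and placeholders."""
    return Counter(TAG_PATTERN.findall(source)) == Counter(TAG_PATTERN.findall(target))


if __name__ == "__main__":
    print(ngram_precision("the cat sat on the mat", "the cat is on the mat"))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
    print(tags_preserved("Hello <b>{name}</b>", "Bonjour <b>{nom}</b>"))
```

In the first call, three of the candidate's five bigrams appear in the reference, so the precision is 0.6. The last call returns False because the translator changed `{name}` to `{nom}`, which is exactly the kind of error tag handling exists to catch.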