// glossary

Plain language.
No black boxes.

The localization industry runs on jargon. Some of it is useful shorthand; some of it obscures more than it clarifies. These entries explain the concepts as we actually use them, including where they fail.

BLEU and COMET Automated metrics for evaluating machine translation output: BLEU measures n-gram overlap against a reference translation; COMET uses a neural model trained on human quality judgments, so its scores tend to align more closely with professional evaluation.
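To make "n-gram overlap" concrete, here is a minimal sketch of the clipped n-gram precision that sits at the core of BLEU. It is not full BLEU: it assumes whitespace tokenization and a single reference, and it leaves out the brevity penalty and the averaging across n-gram orders. The function names and example sentences are ours, chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram's count is clipped to its count in the reference, so
    repeating a correct word does not inflate the score.
    """
    cand = ngrams(candidate.split(), n)  # assumption: whitespace tokenization
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


reference = "the cat is on the mat"
candidate = "the cat sat on the mat"

p1 = clipped_precision(candidate, reference, 1)  # unigram precision
p2 = clipped_precision(candidate, reference, 2)  # bigram precision
print(p1, p2)
```

The score reflects surface overlap only. A correct translation that happens to use different wording from the reference will score low, which is one of the places this metric fails.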
Embeddings Numerical representations of text as vectors in a high-dimensional space, where semantically similar content is positioned close together, enabling meaning-based search and comparison.
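"Positioned close together" is usually measured with cosine similarity. The sketch below uses three-dimensional vectors that we invented for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for this example (not model output).
vectors = {
    "invoice": [0.9, 0.1, 0.0],
    "bill": [0.8, 0.2, 0.1],
    "giraffe": [0.0, 0.2, 0.9],
}

sim_close = cosine_similarity(vectors["invoice"], vectors["bill"])
sim_far = cosine_similarity(vectors["invoice"], vectors["giraffe"])
print(f"invoice vs bill: {sim_close:.2f}")
print(f"invoice vs giraffe: {sim_far:.2f}")
```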
Fine-tuning Continuing the training of a pre-trained language model on a smaller, curated dataset to adapt its behavior to a specific domain, style, or task, permanently changing the model's weights.
Locale A combination of language and region that defines not just which words to use, but also how to format numbers, dates, currencies, and addresses, along with the cultural conventions that surround them.
MTPE Machine Translation Post-Editing: the process of having a human translator review and correct machine-translated text to a defined quality standard, rather than translating from scratch.
Prompt engineering The practice of designing, testing, and refining the instructions given to a language model so that it produces consistently useful output, including specifying the task, context, constraints, format, and edge-case handling.
RAG Retrieval-Augmented Generation: a technique that gives a language model access to an external knowledge base at inference time, rather than relying on what was baked into the weights during training.
Tag handling The management of inline formatting markers, placeholders, and structural codes (HTML tags, variables, custom syntax) that must survive the translation process intact, correctly positioned, and unmodified.
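A basic tag check can be sketched as a comparison of the markers found in the source and in the translation. The example assumes only two marker syntaxes, HTML-style tags and `{curly}` placeholders; the regular expression and the `marker_diff` helper are hypothetical and would need extending for other formats.

```python
import re
from collections import Counter

# Assumed marker syntax: HTML-style tags and {curly} placeholders only.
MARKER = re.compile(r"</?[A-Za-z][^>]*>|\{[A-Za-z_][A-Za-z0-9_]*\}")


def marker_diff(source, target):
    """Return (missing, extra) markers in the target relative to the source."""
    src = Counter(MARKER.findall(source))
    tgt = Counter(MARKER.findall(target))
    missing = sorted((src - tgt).elements())  # in source, lost in target
    extra = sorted((tgt - src).elements())    # in target, not in source
    return missing, extra


source = "Hello <b>{user_name}</b>, you have {count} new messages."
good = "Bonjour <b>{user_name}</b>, vous avez {count} nouveaux messages."
bad = "Bonjour <b>{nom_utilisateur}</b>, vous avez {count} nouveaux messages."

print(marker_diff(source, good))
print(marker_diff(source, bad))
```

This catches markers that were dropped, duplicated, or altered (here, a translated variable name). It does not verify positioning or nesting, which needs a separate check.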
Terminology / Termbase A structured database of approved term pairs (source to target) that specifies exactly how domain-specific words and phrases must be translated within a product or organization, along with context, usage notes, and forbidden alternatives.
Translation memory A database of previously translated segment pairs (source text and its approved human translation) used to suggest matches when the same or similar content recurs.
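Matching against "the same or similar content" can be sketched with a character-level similarity score. The example uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; commercial CAT tools use their own fuzzy-match scoring, so the numbers here are not comparable to theirs. The memory contents and the 0.75 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> approved translation.
memory = {
    "Save your changes before closing.": "Enregistrez vos modifications avant de fermer.",
    "Delete this file?": "Supprimer ce fichier ?",
}


def best_match(segment, tm, threshold=0.75):
    """Return (source, translation, score) for the closest stored segment,
    or None if nothing scores at or above the threshold."""
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    return best_source, tm[best_source], best_score


exact = best_match("Delete this file?", memory)
fuzzy = best_match("Save your changes before exiting.", memory)
none = best_match("Giraffes are tall.", memory)
print(exact, fuzzy, none, sep="\n")
```

A fuzzy match is a suggestion, not a translation: the stored target still says "closing", so a translator has to adapt it.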
Vector database A database built to store and search high-dimensional embeddings efficiently, using approximate nearest-neighbor algorithms to find semantically similar content at scale.
XLIFF XML Localisation Interchange File Format: an XML-based standard for packaging localizable content for exchange between authoring tools, translation management systems, CAT tools, and QA systems.
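For a sense of what the format looks like in practice, the sketch below parses a small XLIFF 1.2 document with Python's standard `xml.etree.ElementTree` and lists the units that still lack a translation. The file content (`app.json`, the unit ids, the strings) is invented for this example, and XLIFF 2.x uses a different namespace and element names.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up XLIFF 1.2 document.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="app.json" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello</source>
        <target>Bonjour</target>
      </trans-unit>
      <trans-unit id="farewell">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

root = ET.fromstring(XLIFF)
units = []
for unit in root.iterfind(".//x:trans-unit", NS):
    source = unit.find("x:source", NS)
    target = unit.find("x:target", NS)
    units.append({
        "id": unit.get("id"),
        "source": source.text,
        # A unit without a <target> element has not been translated yet.
        "target": target.text if target is not None else None,
    })

untranslated = [u["id"] for u in units if u["target"] is None]
print(untranslated)
```

Reading `.text` is enough for plain strings like these. Segments that contain inline elements for tags and placeholders need the child elements handled too.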