XLIFF

XML Localisation Interchange File Format: an XML-based standard for packaging localizable content for exchange between authoring tools, translation management systems, CAT tools, and QA systems.

The honest version

XLIFF is the lingua franca of the professional localization toolchain. When a content management system exports text for translation, and the TMS needs to import it, and the CAT tool needs to display it, and the QA tool needs to validate it, XLIFF is typically the format that makes this pipeline possible without custom integration for each pair.

The format wraps source segments, their translations, and substantial metadata in XML. A basic XLIFF file contains source strings and empty target fields; a processed XLIFF file contains source, target, match metadata, translator annotations, and state indicators. The state attribute tracks where a segment is in the workflow. XLIFF 2.0 defines four values (initial, translated, reviewed, final); XLIFF 1.2 uses a longer list that includes new, needs-translation, translated, signed-off, and final. These states are how a TMS knows which segments still need work.
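As a rough illustration of how a tool might read those states, here is a minimal sketch using Python's standard xml.etree.ElementTree on an XLIFF 2.0 document, where the first workflow state is named initial and is the default when the attribute is absent. The sample content, the function name pending_units, and the choice to treat only final as "done" are assumptions made for this example, not part of any particular TMS.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace (a 1.2 file uses urn:oasis:names:tc:xliff:document:1.2).
NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical sample file, invented for illustration.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="de">
  <file id="f1">
    <unit id="greeting">
      <segment state="initial">
        <source>Hello</source>
      </segment>
    </unit>
    <unit id="farewell">
      <segment state="translated">
        <source>Goodbye</source>
        <target>Auf Wiedersehen</target>
      </segment>
    </unit>
    <unit id="title">
      <segment state="final">
        <source>Settings</source>
        <target>Einstellungen</target>
      </segment>
    </unit>
  </file>
</xliff>
"""

def pending_units(xliff_text, done_states=("final",)):
    """Return (unit id, state) for every segment not yet in a 'done' state."""
    root = ET.fromstring(xliff_text)
    pending = []
    for unit in root.iterfind(".//x:unit", NS):
        for segment in unit.iterfind("x:segment", NS):
            # In XLIFF 2.0 a missing state attribute means "initial".
            state = segment.get("state", "initial")
            if state not in done_states:
                pending.append((unit.get("id"), state))
    return pending

print(pending_units(SAMPLE))
# [('greeting', 'initial'), ('farewell', 'translated')]
```

A real workflow would likely make done_states configurable per step: a review queue wants translated segments, while a delivery gate wants everything at final.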

There are two versions that matter: XLIFF 1.2 (2008) and XLIFF 2.0 (2014). XLIFF 2.0 is not backward compatible with 1.2: the namespace, element names, and attribute names all changed.
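Because the two versions use different XML namespaces, a pipeline can check which one it has been handed before trying to parse segments. The sketch below is one way to do that with the standard library; the function name xliff_version and the two one-line sample documents are assumptions for illustration, and the samples are deliberately minimal rather than schema-complete.

```python
import xml.etree.ElementTree as ET

# Core namespaces defined by the two specifications.
KNOWN_NAMESPACES = {
    "urn:oasis:names:tc:xliff:document:1.2": "1.2",
    "urn:oasis:names:tc:xliff:document:2.0": "2.0",
}

def xliff_version(xliff_text):
    """Return (version implied by namespace, value of the version attribute)."""
    root = ET.fromstring(xliff_text)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    if local_name != "xliff":
        raise ValueError("root element is not <xliff>")
    return KNOWN_NAMESPACES.get(namespace, "unknown"), root.get("version")

# Hypothetical minimal documents, invented for illustration.
V12 = ('<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">'
       '<file original="a.txt" source-language="en" datatype="plaintext">'
       '<body/></file></xliff>')
V20 = ('<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" '
       'srcLang="en"><file id="f1"/></xliff>')

print(xliff_version(V12))  # ('1.2', '1.2')
print(xliff_version(V20))  # ('2.0', '2.0')
```

Returning both values is a deliberate choice: a file whose namespace and version attribute disagree is a useful early warning that a tool is producing a nonstandard variant.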

Why it matters for translation

XLIFF carries more than text. It carries workflow state, match metadata, inline tag structure, and authority: the declaration that this content has been through a review process and the result is recorded here.

For any organization running professional localization at volume, XLIFF (or a format that maps to it) is not optional. It is the mechanism by which content moves through a multi-step workflow (extraction, pre-translation, TM matching, human translation, review, QA, delivery) without losing its history or structure. A system that cannot import and export XLIFF cannot participate in the standard localization toolchain without custom connectors.

XLIFF also preserves inline formatting. Source strings in software often contain HTML tags, placeholders, and structural markers. XLIFF 1.2 represents these as inline codes such as <ph>, <it>, <bpt>, and <ept>; XLIFF 2.0 uses a different set (<ph>, <pc>, <sc>, <ec>). CAT tools display these codes as locked, untranslatable elements: visible to the translator for placement context, but not editable as text. This is the mechanism that prevents translators from accidentally modifying formatting markup.
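The same inline codes make a simple QA check possible: every code in the source should reappear in the target. The following is a minimal sketch of that check for XLIFF 1.2, assuming the four inline elements named above and comparing codes by element name and id only. The sample unit and the function names are invented for the example; a production QA tool would check considerably more (ordering rules, pairing of <bpt> and <ept>, and so on).

```python
import xml.etree.ElementTree as ET

NS12 = "urn:oasis:names:tc:xliff:document:1.2"
INLINE_TAGS = {"ph", "it", "bpt", "ept"}

# Hypothetical sample: the French target has lost the {count} placeholder.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="ui.html" source-language="en" target-language="fr" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Click <bpt id="1">&lt;b&gt;</bpt>Save<ept id="1">&lt;/b&gt;</ept> to keep <ph id="2">{count}</ph> items.</source>
        <target>Cliquez sur <bpt id="1">&lt;b&gt;</bpt>Enregistrer<ept id="1">&lt;/b&gt;</ept> pour conserver les éléments.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

def inline_codes(element):
    """Return a sorted list of (element name, id) for inline codes in element."""
    codes = []
    for node in element.iter():
        local_name = node.tag.split("}", 1)[-1]
        if local_name in INLINE_TAGS:
            codes.append((local_name, node.get("id", "")))
    return sorted(codes)

def units_with_code_mismatch(xliff_text):
    """Return ids of trans-units whose target codes differ from the source codes."""
    root = ET.fromstring(xliff_text)
    mismatched = []
    for unit in root.iter(f"{{{NS12}}}trans-unit"):
        source = unit.find(f"{{{NS12}}}source")
        target = unit.find(f"{{{NS12}}}target")
        if source is None or target is None:
            continue  # nothing to compare for untranslated units
        if inline_codes(source) != inline_codes(target):
            mismatched.append(unit.get("id"))
    return mismatched

print(units_with_code_mismatch(SAMPLE))  # ['1']
```

Sorting the codes means a translator is free to reorder them, which is often legitimate across languages; only missing or extra codes are flagged.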

For LLM-based pipelines, XLIFF compatibility determines whether AI-generated translations can flow back into the existing toolchain without manual conversion steps. A pipeline that outputs XLIFF in the version and profile the TMS expects can be imported directly; a pipeline that outputs plain JSON or plain text requires integration work at every handoff.

Where it fails

The XLIFF compatibility problem is real and persistent.

XLIFF 1.2 and XLIFF 2.0 are not interchangeable. Many tools claim “XLIFF support” and implement only one version, or implement partial or proprietary variants of one version. The result is an ecosystem where passing a file from tool A to tool B and back produces a file that nominally conforms to XLIFF but has lost metadata, changed state values, or mangled inline codes. Testing XLIFF round-trip fidelity is essential before committing to a toolchain.
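One way to make round-trip testing concrete is to fingerprint each unit before export and compare it with what comes back. The sketch below does this for XLIFF 1.2, comparing flattened text, inline codes, and the target's state attribute. The function names, the sample file, and the simulated "returned" file are all hypothetical; the damage shown (a dropped <ph> and a downgraded state) is invented to illustrate the kinds of loss described above, not taken from any specific tool.

```python
import xml.etree.ElementTree as ET

NS12 = "urn:oasis:names:tc:xliff:document:1.2"

def q(name):
    """Qualify a local element name with the XLIFF 1.2 namespace."""
    return f"{{{NS12}}}{name}"

def describe(element):
    """Summarise an element as (flattened text, inline codes in document order)."""
    if element is None:
        return None
    codes = [(node.tag.split("}", 1)[-1], node.get("id"))
             for node in element.iter() if node is not element]
    return "".join(element.itertext()), codes

def fingerprint(xliff_text):
    """Map each trans-unit id to the fields a round trip should preserve."""
    units = {}
    for unit in ET.fromstring(xliff_text).iter(q("trans-unit")):
        target = unit.find(q("target"))
        units[unit.get("id")] = {
            "source": describe(unit.find(q("source"))),
            "target": describe(target),
            "state": None if target is None else target.get("state"),
        }
    return units

def round_trip_problems(before, after):
    """Return (unit id, field) pairs that changed or went missing."""
    sent, received = fingerprint(before), fingerprint(after)
    problems = []
    for unit_id, original in sent.items():
        returned = received.get(unit_id)
        if returned is None:
            problems.append((unit_id, "missing"))
            continue
        for field, value in original.items():
            if returned[field] != value:
                problems.append((unit_id, field))
    return problems

# Hypothetical file sent to another tool.
BEFORE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="app.properties" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="a">
        <source>Hello <ph id="1">%s</ph></source>
        <target state="translated">Hola <ph id="1">%s</ph></target>
      </trans-unit>
      <trans-unit id="b">
        <source>Save</source>
        <target state="signed-off">Guardar</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""

# Simulated return trip: an inline code flattened to text, a state downgraded.
AFTER = (BEFORE.replace('Hola <ph id="1">%s</ph>', "Hola %s")
               .replace('state="signed-off"', 'state="translated"'))

print(round_trip_problems(BEFORE, AFTER))
# [('a', 'target'), ('b', 'state')]
```

Note that unit a still reads "Hola %s" as plain text after the round trip, so a check based on visible text alone would miss the lost inline code.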

The standard is rich enough that compliant implementations can be incompatible in practice. Two tools can both produce valid XLIFF 1.2 files and still fail to exchange data correctly because one uses a subset the other does not expect. “We support XLIFF” is not a guarantee of interoperability. It is a starting point for integration testing.

XLIFF assumes a bilingual model: one source language, one target language, per file. For a 30-locale release, you have 30 files per source document. This is manageable but creates version control and delivery complexity, particularly when source content is updated mid-cycle and the change must propagate to 30 files, some of which have already been partially translated.

Finally: XLIFF is verbose. Large software products with tens of thousands of strings produce very large XLIFF files. Some tools have memory and performance limitations when handling files above a certain size. File splitting (breaking source content into smaller packages for translation) is a standard workaround, but it introduces its own complexity in reassembly and TM leveraging across split files.
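A minimal sketch of file splitting, assuming an XLIFF 1.2 document with a single <file> element whose <trans-unit> elements sit directly under <body>. The function name split_xliff and the generated sample are assumptions for illustration. The sketch deliberately ignores <header> content and <group> nesting, which is exactly where reassembly complexity tends to come from in practice.

```python
import copy
import xml.etree.ElementTree as ET

NS12 = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS12)  # serialise with a default namespace, not ns0:

def q(name):
    """Qualify a local element name with the XLIFF 1.2 namespace."""
    return f"{{{NS12}}}{name}"

def split_xliff(xliff_text, max_units):
    """Split one XLIFF 1.2 document into parts of at most max_units trans-units."""
    root = ET.fromstring(xliff_text)
    file_el = root.find(q("file"))
    units = file_el.find(q("body")).findall(q("trans-unit"))
    parts = []
    for start in range(0, len(units), max_units):
        # Recreate the <xliff>/<file>/<body> wrapper with the original attributes.
        new_root = ET.Element(root.tag, root.attrib)
        new_file = ET.SubElement(new_root, file_el.tag, file_el.attrib)
        new_body = ET.SubElement(new_file, q("body"))
        for unit in units[start:start + max_units]:
            new_body.append(copy.deepcopy(unit))
        parts.append(ET.tostring(new_root, encoding="unicode"))
    return parts

# Hypothetical five-string source file, generated for illustration.
units_xml = "".join(
    f'<trans-unit id="{i}"><source>String {i}</source></trans-unit>'
    for i in range(1, 6)
)
SAMPLE = (
    f'<xliff xmlns="{NS12}" version="1.2">'
    '<file original="strings.txt" source-language="en" datatype="plaintext">'
    f"<body>{units_xml}</body></file></xliff>"
)

parts = split_xliff(SAMPLE, 2)
print([sum(1 for _ in ET.fromstring(p).iter(q("trans-unit"))) for p in parts])
# [2, 2, 1]
```

Each part keeps the original attribute on <file>, so the parts can be matched back to their source document at reassembly; unit ids are preserved for the same reason.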