// services

Custom fine-tuned LLMs

A translation model that knows your product before it sees a single new string. We fine-tune on your domain, validate with your terminology, and deliver a model that improves with every reviewed translation, not a generic API call dressed up in a prompt.

The problem

Off-the-shelf LLMs are trained on public data. That data does not include your product’s terminology, your style guide decisions, or the ten years of reviewed translations sitting in your translation memory (TM). The result is output that is linguistically fluent but terminologically inconsistent: a model that treats “workspace” and “workbench” as interchangeable because in general text they are, even though in your product they are not.

Injecting terminology into the prompt patches this at the surface. You include the approved term in the prompt, and the model uses it for that segment. But the model’s underlying prior still pushes toward the wrong answer, and on segments where you did not inject the specific term, the problem re-emerges. Every release needs the same injection. The context window fills up, and the model’s behavior degrades as prompts grow toward the window’s limit.

Fine-tuning addresses the prior directly. A model that has learned from your data no longer treats your terminology as general vocabulary. It treats it as the specific vocabulary of your domain, because that is what the training signal told it.

How we approach it

We start with data assessment. The quality of the training data determines the quality of the fine-tuned model, and most organizations’ TMs contain inconsistencies: multiple translators, multiple style guide eras, unreviewed MT segments that were never corrected. We audit before we train. Contaminated training data produces a confidently wrong model.

Clean data is curated, not just filtered. We remove duplicates, resolve terminology conflicts, and ensure that the training signal is consistent: the model should see one approved translation for each approved source pattern, not three alternatives from three different years.
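To make the curation step concrete, here is a simplified sketch in Python. It is an illustration, not our production tooling. It assumes a TM exported as plain (source, target) string pairs and treats whitespace-normalized source text as the “source pattern”; both are assumptions made for the example. Exact duplicates collapse, and any source with more than one distinct target is set aside for a linguist to resolve instead of being passed to training.

```python
from collections import defaultdict


def curate(pairs: list[tuple[str, str]]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Deduplicate TM pairs and separate out sources with conflicting targets.

    Returns (clean, conflicts):
      clean     - sources with exactly one distinct target, ready for training
      conflicts - sources with several distinct targets, for a linguist to resolve
    """
    targets: dict[str, set[str]] = defaultdict(set)
    for source, target in pairs:
        # Whitespace normalization is a stand-in for real source-pattern matching.
        key = " ".join(source.split())
        targets[key].add(" ".join(target.split()))

    clean = {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}
    conflicts = {s: sorted(t) for s, t in targets.items() if len(t) > 1}
    return clean, conflicts


# Hypothetical TM excerpt: the same UI string translated differently over the years.
tm = [
    ("Open the workspace", "Arbeitsbereich öffnen"),
    ("Open the workspace", "Arbeitsbereich öffnen"),  # exact duplicate
    ("Open the  workspace", "Workspace öffnen"),       # older, conflicting choice
    ("Save changes", "Änderungen speichern"),
]
clean, conflicts = curate(tm)
print(clean)      # {'Save changes': 'Änderungen speichern'}
print(conflicts)  # {'Open the workspace': ['Arbeitsbereich öffnen', 'Workspace öffnen']}
```

In the example, both “workspace” translations end up in `conflicts` rather than in the training set, so the model never sees two competing answers for the same source.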

Fine-tuning is iterative. We train, evaluate against held-out domain content, assess with human reviewers, and retrain with corrections. The evaluation criteria are the ones your human reviewers apply, not BLEU: terminology consistency, register correctness, and style guide compliance, measured on a sample your team can review.

We also deliver the evaluation infrastructure. You should be able to run a quality regression check when the model updates or when your terminology changes. We set up that check before we hand over the model.
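A terminology regression check can be quite small. The sketch below is illustrative only and is not the tooling we hand over. The function names, the termbase as a plain dictionary, and the case-insensitive substring match are all assumptions. A real check also has to handle inflection, compounding, and tokenization, which substring matching ignores.

```python
# Illustrative sketch only: names, data shapes, and matching rule are assumptions.

def term_consistency(pairs: list[tuple[str, str]], termbase: dict[str, str]) -> float:
    """Share of source-term occurrences rendered with the approved target term.

    pairs: (source_segment, model_output) tuples from a held-out test set.
    termbase: approved source term -> approved target term.
    Matching is case-insensitive substring search, a deliberate simplification.
    """
    hits = total = 0
    for source, output in pairs:
        src, out = source.lower(), output.lower()
        for term, approved in termbase.items():
            if term.lower() in src:
                total += 1
                if approved.lower() in out:
                    hits += 1
    # With no term occurrences in the test set, there is nothing to violate.
    return hits / total if total else 1.0


def regression_check(pairs: list[tuple[str, str]], termbase: dict[str, str],
                     baseline: float, tolerance: float = 0.01) -> tuple[bool, float]:
    """Pass if the new score is no more than `tolerance` below `baseline`."""
    score = term_consistency(pairs, termbase)
    return score >= baseline - tolerance, score


# Hypothetical termbase and model outputs.
termbase = {"workspace": "Arbeitsbereich", "workbench": "Werkbank"}
pairs = [
    ("Open the workspace", "Arbeitsbereich öffnen"),
    ("Close the workbench", "Arbeitsbereich schließen"),  # wrong term used
    ("Save the workspace", "Arbeitsbereich speichern"),
]
passed, score = regression_check(pairs, termbase, baseline=0.95)
print(passed, round(score, 2))  # False 0.67
```

Run against a fixed held-out set each time the model or the termbase changes, a check like this turns “did the update break our terminology?” into a yes-or-no answer.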

// what we have shipped

3 weeks

Domain-adapted MT engine for procurement

Fine-tuned on 120k reviewed segments in the client’s domain. Terminology consistency against the approved termbase improved from 71% to 97% on a held-out test set.

Enterprise procurement platform

4 weeks

Aerospace technical documentation model

Fine-tuned on translation pairs produced under an ISO 17100-certified process. Evaluated by a subject-matter expert before deployment; passed the first human review with a 12% edit distance, versus 38% for the baseline.

Aerospace component supplier

2 weeks

App store copy model, 6 locales

Lightweight per-locale fine-tuning on three years of approved store descriptions. Because the store-listing format is stable, output requires only minimal editing.

Mobile app publisher

Full shipping history →

// work with us

Tell us what you are building.

We respond within one business day. If the project is a good fit, we will schedule a short call to understand the scope before proposing anything.

Get in touch