Prompt engineering

The practice of designing, testing, and refining the instructions given to a language model to produce consistently useful output, including specifying task, context, constraints, format, and edge case handling.

The honest version

Prompt engineering is real craft. The difference in output quality between a poorly written prompt and a well-written one can be comparable to the difference between a bad translator and a good one. The model does not change; what you tell it changes.

The work is not mystical. It involves writing instructions clearly, providing context the model cannot infer on its own, constraining the output format, specifying what to do in ambiguous cases, and testing systematically rather than iterating by impression. A prompt that “seems to work” for three test cases may fail on the fourth in ways you would have caught if you had tested more rigorously.

In translation and localization contexts, prompt engineering is the primary control mechanism when you are not retraining or fine-tuning the model. The prompt is where you tell the model which locale it is targeting (not just which language), what the domain is, what register is expected, which terms are approved, which are forbidden, and what to do when the source text is ambiguous.

Why it matters for translation

“Just use ChatGPT to translate it” fails for professional localization not because the model cannot translate, but because no one told it what “done” looks like.

A production translation prompt specifies, at minimum: source and target locale (not just language), domain and product context, register and formality level, termbase entries for any terms in scope, TM segments retrieved for the specific source string, output format requirements (is HTML allowed? what about line breaks?), and instructions for handling untranslatable elements: brand names, UI labels, measurement units.
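
One way to keep those specifications explicit is to hold them in a structured object and render the prompt from it, so that each decision is visible and reviewable. The following is a minimal sketch under stated assumptions: the `TranslationSpec` class, its field names, and the wording of the rendered prompt are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationSpec:
    """Hypothetical container for the decisions a production prompt encodes."""
    source_locale: str
    target_locale: str
    domain: str
    register: str
    # Source term -> approved target term.
    termbase: dict[str, str] = field(default_factory=dict)
    # (source, target) pairs retrieved from translation memory.
    tm_matches: list[tuple[str, str]] = field(default_factory=list)
    allow_html: bool = False
    # Brand names, UI labels, and similar elements to leave as-is.
    do_not_translate: list[str] = field(default_factory=list)


def build_prompt(spec: TranslationSpec, source_text: str) -> str:
    """Render the spec into a plain-text prompt; the wording is illustrative."""
    lines = [
        f"Translate from {spec.source_locale} to {spec.target_locale}.",
        f"Domain: {spec.domain}.",
        f"Register: {spec.register}.",
    ]
    if spec.termbase:
        lines.append("Use these approved terms:")
        lines += [f"- {src} -> {tgt}" for src, tgt in spec.termbase.items()]
    if spec.tm_matches:
        lines.append("Reference translations from translation memory:")
        lines += [f"- {src} => {tgt}" for src, tgt in spec.tm_matches]
    if spec.do_not_translate:
        lines.append("Leave unchanged: " + ", ".join(spec.do_not_translate))
    lines.append(
        "Preserve HTML tags exactly." if spec.allow_html
        else "Output plain text only, with no HTML."
    )
    lines.append("Source text:")
    lines.append(source_text)
    return "\n".join(lines)
```

A call such as `build_prompt(TranslationSpec("en-US", "de-DE", "project management software", "informal", termbase={"workspace": "Arbeitsbereich"}), "Create a workspace.")` returns a prompt in which every field of the spec appears as its own instruction line.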

Each of those specifications requires a decision. The decisions require domain knowledge. Encoding them correctly into a prompt requires linguistic knowledge. The result (a well-engineered translation prompt) is an artifact that reflects accumulated expertise. It is also brittle in ways that the next section covers.

Well-engineered prompts also enable consistent evaluation. If the prompt is explicit about what the output should be, you can build automated checks: does the output preserve the tags? Does it use the approved term? Is it within the maximum character count? Prompt engineering and QA automation are inseparable.
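
The three checks named above can be sketched in a few lines. This is an illustration, not a complete QA suite: the function name `check_output` and its parameters are hypothetical, the tag pattern is a simple regular expression rather than a real HTML parser, and the term check is a plain substring match that does not account for inflected forms.

```python
import re

# Matches simple opening and closing tags such as <b> or </b>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def check_output(source: str, output: str,
                 required_terms: list[str], max_chars: int) -> list[str]:
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    # The output must contain the same tags as the source (order is ignored).
    if sorted(TAG_RE.findall(source)) != sorted(TAG_RE.findall(output)):
        failures.append("tags differ from source")
    # Each approved term must appear somewhere in the output (case-insensitive).
    for term in required_terms:
        if term.lower() not in output.lower():
            failures.append(f"missing approved term: {term}")
    # The output must fit the character budget.
    if len(output) > max_chars:
        failures.append(f"too long: {len(output)} > {max_chars}")
    return failures
```

Returning messages instead of a single boolean makes it possible to count which kind of failure occurs most often across an evaluation set.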

Where it fails

Prompts are brittle. A prompt that works reliably for English to German may fail silently for English to Japanese, because the model’s internal representation of Japanese localization requirements differs from its representation of German ones in ways the prompt does not compensate for. You discover this in evaluation, not in design.

The model’s behavior is not deterministic. The same prompt can produce different output across calls, across model versions, and across temperature settings. A prompt that you evaluated at temperature 0.3 may behave differently at temperature 0.7. Model updates (even minor ones) can shift behavior in ways that invalidate a prompt you evaluated carefully on a previous version.
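
Variance across calls can be measured rather than guessed at. The sketch below assumes a hypothetical `translate` callable that wraps whatever model API is in use and returns the output text for a prompt; it simply repeats the call and counts distinct results.

```python
from collections import Counter
from typing import Callable


def output_variance(translate: Callable[[str], str],
                    prompt: str, runs: int = 10) -> Counter:
    """Call the model `runs` times with the same prompt and count distinct outputs.

    `translate` is an assumed wrapper around a model call, not a real library function.
    Outputs are stripped so that trailing whitespace does not count as a difference.
    """
    return Counter(translate(prompt).strip() for _ in range(runs))
```

A single key in the returned counter means the prompt was stable for that input at the tested settings; several keys mean the evaluation should look at the whole distribution of outputs, not one sample.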

Prompts are not versioned by default. If you change the prompt and do not track the change, you cannot tell whether a shift in output quality came from the prompt change or from some other variable, such as a model update or a different temperature. This is basic software engineering applied to an artifact that most teams do not treat as code, and should.
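
Beyond keeping prompts in source control, a lightweight option is to stamp every logged output with a content hash of the prompt that produced it. The sketch below is one possible approach, not an established convention: the function names and the record fields are hypothetical.

```python
import hashlib
import json


def prompt_version(template: str) -> str:
    """Short content hash that changes whenever the prompt text changes."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def log_record(template: str, model: str, temperature: float, output: str) -> str:
    """Serialize one call with the variables needed to attribute a quality shift."""
    return json.dumps({
        "prompt_version": prompt_version(template),
        "model": model,
        "temperature": temperature,
        "output": output,
    }, ensure_ascii=False)
```

With the prompt hash, model identifier, and temperature stored next to each output, a drop in quality can be traced to whichever of those values changed.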

Length has diminishing returns, and past a certain point more detail can make output worse. Prompts that specify every possible case in detail can confuse models that perform better with clear, concise instructions. Finding the right level of specificity requires testing, not theory.

Finally: prompt engineering reaches its limits when the model does not have the domain knowledge the prompt assumes. You can tell a model that your product’s top-level organizational unit is called a “workspace” and should be translated as “Arbeitsbereich” in German, but if the model does not understand the semantic field of that term well enough to apply it correctly in novel contexts, the instruction fails in edge cases. Prompting is a communication problem between human expertise and model capability; it cannot supply expertise the model does not have.
