Screenshot to translated draft in five days
A procurement platform needed to add image-to-translation to their localization pipeline. Five days from request to production: OCR, context extraction, RAG-injected LLM translation aligned to their termbase.
The challenge
The client’s localization team was spending significant time translating UI screenshots. Developers and product managers would capture screenshots of new interface states and submit them to the translation team for reference, so translators could see the context around a string, not just the string itself. The team was manually re-translating the visible text in those screenshots rather than reusing the approved translations already stored in the translation memory (TM).
The request came in on a Monday: could we add image-to-translation to the pipeline? Drop in a screenshot, get a contextualized draft back: text extracted, recognized as UI strings, matched against the TM and termbase, and returned as a structured translation draft. The PM needed a working answer before end of week.
The constraint was not technical: it was terminological. The client had spent three years building a termbase with 840 approved term pairs across five languages. Any translation produced from screenshots had to be consistent with those approved terms. A generic OCR-to-MT pipeline would not be.
Our approach
We broke the problem into three sequential steps, each of which had to be reliable enough that the next step could depend on it.
Step one: extraction. OCR on UI screenshots is noisier than OCR on documents. UI text includes labels, tooltips, button text, truncated strings, placeholder text, and text rendered in custom fonts at small sizes. We tested two OCR approaches and selected the one that performed better on the client’s specific screenshot format, then added a post-processing step to segment the extracted text into discrete strings.
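A minimal sketch of what a segmentation post-processing step could look like, assuming the OCR engine returns one bounding box per word. The `Word` type, the `segment_strings` function, and both ratio parameters are hypothetical names and values chosen for illustration, not the ones used in the client pipeline. Words are grouped into visual lines by vertical position, then each line is split wherever the horizontal gap is wide enough to suggest a separate UI element.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word box. Hypothetical shape; real OCR output will differ."""
    text: str
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float  # width, pixels
    h: float  # height, pixels


def segment_strings(words, max_gap_ratio=1.5, line_tol_ratio=0.5):
    """Group word boxes into discrete UI strings.

    Both ratios are illustrative assumptions, expressed relative to
    the word height so they scale with font size.
    """
    # Pass 1: bucket words into visual lines by vertical position.
    lines = []
    for word in sorted(words, key=lambda b: (b.y, b.x)):
        for line in lines:
            ref = line[0]
            if abs(word.y - ref.y) <= line_tol_ratio * ref.h:
                line.append(word)
                break
        else:
            lines.append([word])

    # Pass 2: split each line where the horizontal gap is too wide
    # for the two words to belong to the same label.
    strings = []
    for line in lines:
        line.sort(key=lambda b: b.x)
        current = [line[0]]
        for prev, word in zip(line, line[1:]):
            gap = word.x - (prev.x + prev.w)
            if gap > max_gap_ratio * prev.h:
                strings.append(" ".join(b.text for b in current))
                current = [word]
            else:
                current.append(word)
        strings.append(" ".join(b.text for b in current))
    return strings


words = [
    Word("Submit", 10, 10, 50, 12),
    Word("order", 65, 11, 40, 12),
    Word("Cancel", 200, 10, 48, 12),
    Word("Total", 10, 40, 36, 12),
]
segments = segment_strings(words)
```

In this toy input, the two adjacent words on the first row form one button label, while the distant word on the same row and the word on the next row become separate strings.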
Step two: contextual recognition. A string extracted from a screenshot does not necessarily appear in the TM in the same form. A button label that reads “Submit order” might be stored in the TM as “Submit order for processing”, or the on-screen text might be a truncated version of the stored segment. We used semantic matching against the TM (embedding-based, not string-based) to find the nearest approved segment for each extracted string. When the match score cleared a confidence threshold, the approved translation was surfaced directly. When it fell below the threshold, the string was routed to LLM translation with termbase injection.
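The routing logic can be sketched roughly as follows. Everything here is an illustrative assumption: `embed` is a character-trigram stand-in so the example runs without dependencies (a real pipeline would call a sentence-embedding model at that point), the 0.6 threshold is arbitrary, and `route` and `build_prompt` are hypothetical names. The part that carries over is the shape: score each extracted string against the TM, surface the approved translation above the threshold, and otherwise build an LLM prompt that injects only the termbase entries relevant to the string.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: character-trigram counts.

    ASSUMPTION: a placeholder for a real sentence-embedding model.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(source, tm, threshold=0.6):
    """Match `source` against the TM and decide the route.

    `tm` maps approved source segments to approved translations.
    The threshold value is a hypothetical default.
    """
    query = embed(source)
    scored = ((cosine(query, embed(seg)), seg) for seg in tm)
    score, segment = max(scored, default=(0.0, None))
    if segment is not None and score >= threshold:
        return {"source": source, "route": "tm", "tm_segment": segment,
                "score": score, "translation": tm[segment]}
    return {"source": source, "route": "llm", "tm_segment": segment,
            "score": score, "translation": None}


def build_prompt(source, target_lang, termbase):
    """Inject only the termbase pairs whose source term occurs in the string."""
    hits = {s: t for s, t in termbase.items() if s.lower() in source.lower()}
    glossary = "\n".join(f"- {s} => {t}" for s, t in sorted(hits.items()))
    return (
        f"Translate the UI string into {target_lang}.\n"
        f"Use these approved terms exactly:\n{glossary}\n"
        f"String: {source}"
    )


tm = {
    "Submit order for processing": "Bestellung zur Bearbeitung absenden",
    "Cancel": "Abbrechen",
}
termbase = {"order": "Bestellung", "supplier": "Lieferant"}

exact = route("Cancel", tm)
partial = route("Submit order", tm)
unseen = route("Export supplier ledger", tm)
prompt = build_prompt(unseen["source"], "German", termbase)
```

Filtering the termbase before injection keeps the prompt short: only the pairs whose source term actually appears in the string are passed to the model.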
Step three: output assembly. The translated strings were assembled into a structured draft aligned to the screenshot layout, so the translator receiving the draft could see the original screenshot alongside the proposed translations, in position.
What we shipped
A production pipeline accessible via a simple drag-and-drop interface: drop a screenshot (PNG or JPEG), receive a structured translation draft within 30 seconds.
The draft format: a table with the extracted string, the matched TM segment (if any, with match score), and the proposed translation. Strings with high-confidence TM matches were flagged as “approved pending review.” Strings translated by the LLM were flagged as “requires review.” The translator could accept or edit each string individually and export the reviewed batch back to the TM.
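As a rough illustration of that draft format, the sketch below models one table row and assigns the two review flags. The `DraftRow` type, the field names, and the 0.6 threshold are assumptions made for the example; only the column set and the two flag labels come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftRow:
    """One row of the translation draft (hypothetical field names)."""
    source: str
    tm_segment: Optional[str]
    match_score: Optional[float]
    translation: str
    status: str


def make_row(source, translation, tm_segment=None, match_score=None,
             threshold=0.6):
    """Build a row and flag it; the threshold is an illustrative value."""
    high_confidence = (
        tm_segment is not None
        and match_score is not None
        and match_score >= threshold
    )
    status = "approved pending review" if high_confidence else "requires review"
    return DraftRow(source, tm_segment, match_score, translation, status)


def render_table(rows):
    """Render rows as a plain pipe-separated table for the translator."""
    lines = ["Extracted string | TM segment (score) | Proposed translation | Status"]
    for r in rows:
        if r.tm_segment is not None and r.match_score is not None:
            tm = f"{r.tm_segment} ({r.match_score:.2f})"
        else:
            tm = "-"
        lines.append(f"{r.source} | {tm} | {r.translation} | {r.status}")
    return "\n".join(lines)


rows = [
    make_row("Cancel", "Abbrechen", tm_segment="Cancel", match_score=1.0),
    make_row("Export supplier ledger", "Lieferantenbuch exportieren"),
]
table = render_table(rows)
```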
The pipeline handled all five of the client’s active target languages in the same request. A screenshot of the procurement UI returned drafts in German, French, Spanish, Japanese, and Korean simultaneously.
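One way to fan a single request out to all five languages is a small thread pool, sketched below. This is an assumption about how such a step could be structured, not a description of the shipped implementation; `translate_all` and the `translate_one` callback are hypothetical names, and the callback stands in for whatever performs the TM lookup or LLM call for one language.

```python
from concurrent.futures import ThreadPoolExecutor

# Target languages named in the case study, as ISO 639-1 codes.
TARGET_LANGS = ["de", "fr", "es", "ja", "ko"]


def translate_all(strings, translate_one, langs=TARGET_LANGS):
    """Return {lang: [translations]} with one worker per language.

    `translate_one(strings, lang)` is a hypothetical callback supplied
    by the caller.
    """
    with ThreadPoolExecutor(max_workers=len(langs)) as pool:
        results = pool.map(lambda lang: translate_one(strings, lang), langs)
        return dict(zip(langs, results))


def fake_translate(strings, lang):
    """Placeholder translator used only to exercise the fan-out."""
    return [f"{lang}:{s}" for s in strings]


drafts = translate_all(["Submit order"], fake_translate)
```

Threads suit this step when each language call is dominated by network wait, which is typically the case for hosted model or TM services.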
Total development time: five days. The client’s PM was on the call when we demonstrated the working pipeline on day four. We spent day five on documentation and handover.
Outcome
- Deployed to production on day five: no second sprint, no scope revision
- OCR accuracy on UI screenshots: 96% on clean renders and 88% on compressed JPEGs, both within the acceptable range for the use case
- Translated drafts aligned to approved termbase on 97% of extracted strings
- Reduced manual re-translation of screenshot-based content by approximately 80% in the first month of production use
Have a similar problem?
Tell us what you are trying to build. We will tell you honestly whether we can help and how long it would take.