Introduction

Today, texts are created in a wide variety of file formats. However, not every file format is equally suitable for translation. The choice of file format for the translation offers further potential for reducing translation costs. In this context, you should also consider the frequency with which certain document types are translated, the number of target languages, the intended purpose (internal vs. external communication), and the layout of the translated documents.

PDF documents

PDF is one of the most frequently used formats in translation. Even though nowadays most translation tools can process PDF files, editing often involves a lot of effort and the result is unsatisfactory. Scanned PDF files with handwritten notes or comments pose a special challenge.

Because the text in a PDF file cannot be edited directly, a translation tool converts each PDF into an editable format (usually Word or plain text) in the background when the file is imported. The quality of the result depends on the origin of the PDF file and on the converter used. Formatting is often lost, the connection between text parts is sometimes broken, headers and footers are imported several times, and so on. This means more text is translated (and paid for). The layout of the translated document must then be corrected before it is converted back into PDF. In most cases, file preparation and post-processing require additional work.

Microsoft Office file formats

Word, Excel and PowerPoint are still the most frequently translated file formats. However, the many processing options and the different ways in which MS Office products are used mean that Office formats often cause problems during translation and are not always processed correctly and automatically.

Should comments in Word or Excel documents, or speaker notes in PowerPoint presentations, be translated? What should be done with tracked changes in Word documents that have not yet been accepted? How should embedded content and objects be handled? Can files with macros be translated? Manual, often inconsistent formatting means that both the layout and the translation have to be reworked.

Mixed-language documents are also common among Office files. "Please only translate the English text" or "Please only translate the passages marked in yellow" are simple instructions from the author's point of view. But how should the language service provider's translation tools automatically recognize these passages and lock the rest when, for example, the yellow highlighting is imprecise and misses letters or words?

Because translation service providers use translation tools to analyze texts and create quotes automatically, such documents can be prepared wrongly and quoted incorrectly. If the problem is noticed at the beginning of a project, the project manager at the translation service provider can intervene and prepare the document. This is manual work that service providers increasingly bill as project management, file preparation or engineering, because margins on word prices keep shrinking.

Example of MS Word document with comments and tracked changes

Another problem with Office documents lies in version differences, e.g. between DOC and DOCX files. New Word features, for example, are often not supported by translation tools at first, so certain parts of a text are not translated. Sometimes documents cannot be saved in the original file format after translation, or translated Office documents can no longer be opened. One solution could be defined standards, such as style templates and guidelines for creating Office documents that are to be translated. Unfortunately, the reality in many companies is often different. For this reason, Office documents are only recommended as a format for translation in some cases.

Pictures, graphics and drawings

Pictures, graphics and drawings often contain translatable text. They can be problematic during translation because the text often cannot be edited directly, and many translators or translation service providers do not have the necessary editing software or know-how. Whether the images are JPG, PNG or GIF files or are embedded in Word, Excel, PowerPoint or other documents, translation tools cannot recognize their text and therefore do not include it in the word count. If the project manager or translator does not notice that the text in the images has to be translated, the customer receives an incomplete translation. The missing texts must then be typed out and translated separately afterwards. If the original open, editable files (Adobe Photoshop, Illustrator, etc.) are missing, the translations often have to be integrated into the images manually, which takes a lot of effort.

The result: delays due to manual file preparation and processing as well as foreign-language typesetting.

The solution: if possible, provide open file formats in which texts can be edited directly. For file formats such as PSD from Adobe Photoshop, AI from Adobe Illustrator or DXF/DWG from AutoCAD, there are solutions that export and import text for translation, so you can send the plain text plus an image for context as a job.

Example of an Adobe Photoshop document

Software, Websites and Databases

Microsoft Excel

Whether for translating software, websites or databases, developers love Microsoft Excel files! They think they are doing the translator a favor, because they assume anyone can use Excel. No matter where a text comes from, it is often converted into Excel format for translation and then back into the original format afterwards. This detour is rarely necessary: today, most translation tools can process software or database files directly. In addition, the structure of Excel files often requires file preparation and processing before translation, e.g. hiding columns or rows, locking texts or placeholders, etc. Without this preparation, there is a risk that non-translatable texts (e.g. software IDs) will be translated and invoiced, which can later cause problems when the translation is imported back into the software or database.
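The kind of file preparation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a CSV export stands in for the Excel sheet, the column names `id` and `source` are hypothetical, and the placeholder pattern covers just `{0}`-style and `%s`/`%d` placeholders. Real translation tools use their own filters for this.

```python
import csv
import io
import re

# Assumed placeholder styles: {0}, {1}, ... and %s / %d
PLACEHOLDER = re.compile(r"\{\d+\}|%[sd]")

def prepare_rows(csv_text, text_column="source"):
    """Keep IDs out of the translatable text and list placeholders to lock."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = row[text_column]
        jobs.append({
            "id": row["id"],                       # software ID, never translated
            "text": text,                          # the only part sent to translation
            "locked": PLACEHOLDER.findall(text),   # placeholders the translator must keep
        })
    return jobs

# Hypothetical export with two software strings
sample = "id,source\nBTN_SAVE,Save file\nMSG_COUNT,{0} files copied\n"

for job in prepare_rows(sample):
    print(job)
```

Keeping the ID in a separate field means it is neither counted nor translated, and the list of locked placeholders can be checked again after translation.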

Mark-up formats: HTML and XML

Texts in formats such as HTML and XML are characterized by <tags>. HTML is a standardized file format that is supported today by all common translation tools. The HTML tags give the translator information about what kind of text is being translated.

<h1> = heading 1, <p> = paragraph, <a> = hyperlink, etc.
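To illustrate how tags give each text segment its context, here is a minimal sketch that uses Python's standard `html.parser` module. The class name `SegmentCollector`, the helper `collect_segments` and the sample markup are assumptions made for this example. It is not how any particular translation tool works internally.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they never enclose text
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class SegmentCollector(HTMLParser):
    """Collects (enclosing tag, text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self._stack = []     # tags that are currently open
        self.segments = []   # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.segments.append((self._stack[-1], text))

def collect_segments(html):
    parser = SegmentCollector()
    parser.feed(html)
    parser.close()
    return parser.segments

sample = '<h1>Manual</h1><p>Read the <a href="/faq">FAQ</a> first.</p>'
for tag, text in collect_segments(sample):
    print(tag, "->", text)
```

The translator sees that "Manual" is a heading and "FAQ" is link text. The sketch also shows why inline tags matter: the sentence in the paragraph is split around the link, so a real tool keeps such inline tags inside one segment instead.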

While the mark-up of HTML documents is based on the document structure or the later output, the mark-up of XML documents is based on the type of content or its intended use, for example to mark a version number:

<version>1.2</version>

The author is free to choose the elements of an XML document, so there is an unlimited number of XML formats. Just like HTML, XML is supported by all common translation tools. However, for each XML format it must be specified which parts of the document have to be translated and which do not, and this is often difficult for translation service providers to decide. The decision can be made easier by marking non-translatable content with elements, such as <notranslate>, or with attributes, such as <version translate="no">. XML can also carry metadata for translation, such as length restrictions, comments for translators, etc. This is why XML is a popular format for translating software texts and content from database-driven CMS, PIM, catalog systems, etc. Stylesheets can be used to display a preview of the document in its original format while the translator is working, so that the visual context can also be taken into account. Thanks to its structure, XML is therefore well suited for translation.
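The marking of non-translatable content can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document, the function name `translatable_texts` and the rule that `translate="no"` is inherited by child elements are assumptions made for this example. A real project would define these rules for each XML format.

```python
import xml.etree.ElementTree as ET

def translatable_texts(xml_string):
    """Return (path, text) pairs for element text that should be translated."""
    root = ET.fromstring(xml_string)
    results = []

    def walk(element, path, inherited):
        # Assumption: translate="no"/"yes" applies to children unless overridden
        flag = element.get("translate")
        translate = inherited if flag is None else (flag != "no")
        if element.tag == "notranslate":
            translate = False
        here = f"{path}/{element.tag}"
        text = (element.text or "").strip()
        if translate and text:
            results.append((here, text))
        for child in element:
            walk(child, here, translate)

    walk(root, "", True)
    return results

# Hypothetical product record
sample = """<product>
  <name>Garden chair</name>
  <version translate="no">1.2</version>
  <notranslate>SKU-4711</notranslate>
  <description>Weatherproof and stackable.</description>
</product>"""

for path, text in translatable_texts(sample):
    print(path, "->", text)
```

Only the name and the description are extracted. The version number and the SKU are excluded, so they are neither counted nor changed. For brevity, the sketch ignores text that follows a child element (the `tail` in ElementTree terms).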

 

Example of Magento XML with markups for non-translatable content

XLIFF: a standardized exchange format for translation

XLIFF (XML Localization Interchange File Format) was developed in response to the ever-increasing number of partly proprietary document formats used for exchange with translation tools. It is a flexible, extensible standard and the internal working format of most translation tools today. XLIFF is bilingual and therefore allows a direct comparison between the source text and the translation. Ideally, an XLIFF document does not require any preparation and can be edited immediately in the translation tool. Nowadays, many manufacturers of software, CMS, etc. offer XLIFF as a format for translation. However, the XLIFF standard is not always implemented correctly: a file labeled XLIFF is not necessarily valid XLIFF.
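The bilingual structure can be illustrated with a small XLIFF 1.2 fragment read with Python's standard library. The sample content and the function name `read_units` are made up for this sketch, and it covers only the basic `trans-unit`, `source` and `target` elements of version 1.2. XLIFF 2.x uses a different structure and namespace.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def read_units(xliff_string):
    """Return (id, source text, target text or None) for each trans-unit."""
    root = ET.fromstring(xliff_string)
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        units.append((
            unit.get("id"),
            "".join(source.itertext()) if source is not None else "",
            "".join(target.itertext()) if target is not None else None,
        ))
    return units

# Hypothetical file with one translated and one untranslated unit
sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source><target>Speichern</target></trans-unit>
      <trans-unit id="2"><source>Cancel</source></trans-unit>
    </body>
  </file>
</xliff>"""

for unit_id, source, target in read_units(sample):
    print(unit_id, source, "->", target)
```

Each unit carries the source text and, once translated, the target text side by side. A unit without a target, like the second one here, is simply still open.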

Savings potential: medium to high

Method: Define open files as part of the scope of delivery and send them for translation, convert files where necessary, and use standardized exchange formats for the translation. Talk to your language service provider before you start working out the hows and ifs of your project. Your agency often already has a solution that has proven itself in other projects.

Effort: low

Now you know how important it is to choose the right format when creating texts for translation. In the next part of our blog series, you will learn how to reduce translation costs by using the right translation technology.

Links