A translator’s guide to website translation
Since the publication of this post, I have released Resx Editor, a free visual resource editor dedicated to translation work.
In this post, I give a short introduction to website translation. The target audience is non-technical translators. I will focus on the particular case of website translation relying on Microsoft XML Resource files.
The big picture
Dynamic websites include many things besides pure textual content (programming source code, images, stylesheets, …). In order to simplify the job of the translators, all the textual content can be isolated into resource files. The main idea behind resource files is to replace every textual item of the website with a resource identifier. Intuitively, instead of having a webpage containing the text Hello World!, you have a reference HelloWorld and multiple resource files. The English resource file contains HelloWorld="Hello World!", the French resource file contains HelloWorld="Bonjour tout le monde!", etc. By choosing the right resource file, the website appears in the corresponding language.
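For the technically curious, the lookup described above can be sketched in a few lines of Python. This is only an illustration: the RESOURCES table, the get_text function and the fallback to English are assumptions made for the example, not the way any particular webserver is implemented.

```python
# Hypothetical in-memory "resource files": one dictionary per language.
RESOURCES = {
    "en": {"HelloWorld": "Hello World!"},
    "fr": {"HelloWorld": "Bonjour tout le monde!"},
}


def get_text(identifier, language, fallback="en"):
    """Return the resource matching an identifier in the requested language."""
    localized = RESOURCES.get(language, {})
    if identifier in localized:
        return localized[identifier]
    # Assumption: use the fallback language when no translation exists.
    return RESOURCES[fallback][identifier]


print(get_text("HelloWorld", "fr"))
```

The webpage only ever mentions HelloWorld; which text the visitor sees depends entirely on the resource file that gets selected.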
Basic concepts
- identifier: a unique key associated with a textual item.
- (localized) resource: the expression (the content) of a textual item in a particular language.
- (localized) resource file: a file containing a list of identifier/resource pairs.
Microsoft XML Resource Files
There are many resource file formats, but I am going to discuss the Microsoft XML Resource file format (RESX for short). A RESX file is an XML file. Without digging into the XML standard, it simply means that the content of the file looks like
<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWord">
    <value>Hello World!</value>
  </data>
</root>
As you can see, the identifier is specified through an XML attribute (that’s the terminology for the syntax somekey="MyKeyHere"). The resource is specified with <value>My resource here</value>. Resource files are much more structured than classical, human-readable documents. Indeed, the webserver needs to be able to match identifiers exactly with their associated resources. Therefore, as a translator, you have to be very careful when editing a resource file. You should not touch the XML markup, otherwise the webserver will no longer be able to read the resource file. The only part that you can modify is what lies between the <value> and </value> tags.
A more complete sample of a RESX file:
<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWord">
    <value>Hello World!</value>
  </data>
  <data name="GoodBye">
    <value>Goodbye!</value>
  </data>
  <data name="Thanks">
    <value>Thank you very much for reading this post!</value>
  </data>
</root>
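To give an idea of why the markup matters, here is a minimal sketch of how a program might read the identifier/resource pairs from a well-formed RESX file, using Python’s standard xml.etree.ElementTree module. The function name read_resources and the shortened sample are assumptions made for the illustration; a real webserver relies on its own machinery.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample used only for this illustration.
RESX_SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWord">
    <value>Hello World!</value>
  </data>
  <data name="GoodBye">
    <value>Goodbye!</value>
  </data>
</root>"""


def read_resources(resx_text):
    """Return a dictionary mapping each identifier to its resource."""
    root = ET.fromstring(resx_text.encode("utf-8"))
    resources = {}
    for data in root.findall("data"):
        value = data.find("value")
        # An absent or empty <value> is treated as an empty resource.
        text = value.text if value is not None and value.text else ""
        resources[data.get("name")] = text
    return resources


print(read_resources(RESX_SAMPLE))
```

The program finds each resource only through the name attribute and the value tags, which is why those must stay exactly as they are.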
A bit of help from the web designers
Translating a website usually involves translating many small keywords like to, at, by, new, view. Those short English words are quite ambiguous. In order to simplify the translator’s life, a good website designer will include some additional indications within the resource file to facilitate the translation work. For this purpose, the RESX format includes an optional <comment> tag. The first XML sample can be modified in order to include a comment.
<?xml version="1.0" encoding="utf-8"?>
<root>
  <data name="HelloWord">
    <value>Hello World!</value>
    <comment>Don't forget to include the punctuation.</comment>
  </data>
</root>
Do not translate those comments; you would be wasting your time. They have been included only to make your life easier. The webserver ignores them entirely, and their content will never appear on the website.
A bit of help from Notepad++
XML files are just plain text files (as opposed to rich text files such as Microsoft Word documents), yet due to the very sensitive nature of the XML markup (deleting a single > breaks the XML structure), you had better rely on dedicated tools to edit RESX files. My personal suggestion is to use Notepad++, a very robust text editor that can handle XML files. Notepad++ is open source (you can download it and use it for free, even for commercial purposes).
Tip: Notepad++ does not immediately recognize RESX files as XML files. When you open a RESX file with Notepad++, go to Language → XML to select XML as the file language. You will benefit from a much cleaner view of the RESX file.
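Whatever editor is used, a translated file can also be checked mechanically before it is sent back. The sketch below, again in Python with the standard xml.etree.ElementTree module, only tests whether the XML structure is still intact; the function name is_well_formed and the two sample strings are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(resx_text):
    """Return True if the text can still be parsed as XML, False otherwise."""
    try:
        ET.fromstring(resx_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


good = '<root><data name="Thanks"><value>Thank you!</value></data></root>'
# Same content, but the ">" closing the </value> tag has been deleted.
broken = '<root><data name="Thanks"><value>Thank you!</value</data></root>'

print(is_well_formed(good))
print(is_well_formed(broken))
```

Such a check catches damaged markup only; it says nothing about the quality of the translation itself.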
Top translation mistakes
Website translation is precision work. Listed below are a few likely errors that an unwary website translator might make.
- Spacing: "bonjour" is not the same as " bonjour" (notice the initial space).
- Capitalization: "Delete" is not the same as "delete".
- Punctuation: "Terminated." is not the same as "Terminated" (notice the final dot).
- HTML markup (caution, tricky): a RESX file can contain HTML markup, but the symbols < and > are going to be encoded. The sign < (resp. >) will appear encoded as &lt; (resp. &gt;). Do not touch the encoded HTML markup.
- Weird symbols (tricky again): typically, if you encounter something like Dear M. {0}, the {0} is a substitute (in the present case, it’s certainly a substitute for a user name). Do not touch any substitute.
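The last two mistakes in the list above lend themselves to an automatic check. The following Python sketch compares an original resource with its translation, as both appear in the raw RESX text, and verifies that the substitutes and the encoded HTML markup are the same on both sides. The function names and the example sentences are assumptions made for the illustration, and the check assumes substitutes always have the numbered form {0}, {1}, and so on.

```python
import re

# Numbered substitutes such as {0} or {1}.
PLACEHOLDER = re.compile(r"\{\d+\}")
# Encoded HTML tags as they appear in the raw RESX text, e.g. &lt;b&gt;.
ENCODED_TAG = re.compile(r"&lt;.*?&gt;")


def same_placeholders(original, translation):
    """True if both texts contain exactly the same substitutes."""
    return sorted(PLACEHOLDER.findall(original)) == sorted(PLACEHOLDER.findall(translation))


def same_encoded_markup(original, translation):
    """True if both texts contain exactly the same encoded HTML tags."""
    return sorted(ENCODED_TAG.findall(original)) == sorted(ENCODED_TAG.findall(translation))


print(same_placeholders("Dear M. {0}", "Cher M. {0}"))
print(same_encoded_markup("Click &lt;b&gt;here&lt;/b&gt;", "Cliquez &lt;b&gt;ici&lt;/b&gt;"))
```

Sorting before comparing means a translation may move a substitute to a different place in the sentence, which is often necessary, as long as none is lost or altered.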
Reader Comments (11)
Good short intro to the problems of translating websites. Of course you know there are a host of other and could become a nightmare, especially if you mess a tag unknowingly. There is a better solution for translators that make the job of website translation much easier. Would be interested in seeing your reaction about it at the above address. We still have a few missing pieces, but for the most part, it is a complete online solution with NO TECHNOLOGY HASSLES for translators! Thanks in advance for your comments! Zai
November 24, 2006 | Zai Sarkar
I just had a 10min look at BabelX.com. Your design approach, compared to PeopleWords.com, is basically almost the total opposite; and I am really not convinced. After 10min, I still do not have any clear view of what you are delivering. Tiny pieces of information seem to be lost among marketing superlatives (ex: +1000% ROI). Remember: I am an IT specialist, so what exactly are you expecting from the average web user? Avoiding technology hassles starts with avoiding information pollution. Do you think that something as simple as translation services requires a front web page with almost 100 links? In comparison, the PeopleWords front page has only 15 links (not counting language links, and it’s already almost too much).
November 24, 2006 | joannes
In our projects it’s always a big hassle when we have to contact general language translators to translate some RESX files for us. We need to spend a lot of time to bring it all to a working condition. There are some tools out there which are intended to make this terrible job easier, but they are very far from perfect. Language translation in the software business unfortunately remains a big hassle :(
May 29, 2008 | Transliteration
