January 22, 2015
Machine Translation: Friend or Foe?
Machine Translation (MT) is a major force to be reckoned with in the field of technical communication today. Whereas the human translator works for a salary and at their own steady pace, Machine Translation gets the job done for next to nothing and at unprecedented speed. It’s no wonder so many are looking to Machine Translation to cut costs. However, these savings come with a trade-off in the quality of the end product. For documentation departments, or other purchasers of technical translations, some insight into Machine Translation may be valuable. What follows is a simple primer on the basics.
As Machine Translation has migrated out of science labs and into commercial use, the reaction from translation agencies has been mixed. Some have adopted it in an effort to offer customers attractive and competitive fees. Others have resisted, arguing that Machine Translation can never replace human translation, and correctly so. A machine-translated text is only an approximation of a translation. In most cases, post-editing by a human is required in order to produce a text of commercially acceptable quality. Nevertheless, Machine Translation can still be used as a complementary tool in the translation process, and even as a sort of “gist-ing engine” to gather the gist of a text in a foreign language.
The Statistical MT Headache
In our field of technical communication, the most widely adopted approach to MT is Statistical Machine Translation. In Statistical Machine Translation, statistical probabilities are used to generate translations. First, a bilingual corpus (a collection of texts in two languages) is aligned, usually at the sentence or paragraph level. The alignment results are usually evaluated automatically against a “gold standard”: a pre-defined, manually created reference alignment representing the desired result.
If it is good enough, the result of the alignment is used to train a language model with probabilities, which estimates the most likely translation of a sentence from source language X to target language Y. When a text is translated with MT software, this language model selects “the best” translation. Note that one language model has to be created for each language pair: Swedish -> English, English -> French, and so on.
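To make the idea of “selecting the best translation” more concrete, here is a minimal sketch in Python. It is a deliberate simplification, not how a commercial MT engine works: it counts whole-sentence translations in a tiny, made-up Swedish -> English aligned corpus and picks the most frequent one, whereas real systems work on words and phrases and combine several statistical models. The corpus and the function names `train` and `translate` are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny aligned corpus: (Swedish source, English target) pairs.
# A real corpus would contain many thousands of aligned sentences.
aligned_pairs = [
    ("tryck på knappen", "press the button"),
    ("tryck på knappen", "press the button"),
    ("tryck på knappen", "push the button"),
    ("stäng luckan", "close the hatch"),
]


def train(pairs):
    """Estimate P(target | source) by relative frequency in the aligned corpus."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    model = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        model[source] = {t: n / total for t, n in targets.items()}
    return model


def translate(model, source):
    """Return the most probable translation, or None for an unseen sentence."""
    candidates = model.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# One model per language pair; this one only covers Swedish -> English.
model = train(aligned_pairs)
best = translate(model, "tryck på knappen")
print(best)
```

The sketch also hints at why the training corpus matters so much: the model can only choose among translations it has already seen, and a sentence that never occurred in the corpus gets no translation at all.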
However, there are several problems with Statistical Machine Translation. Errors may be introduced via statistical anomalies. A single sentence in the source language might translate as several sentences in the target language. The translation is a literal approximation, so idioms and figures of speech are hard to handle. Perhaps the biggest problem is semantic ambiguity, where the meaning of a word depends on the context in which it is used (most of us have likely laughed at poorly machine-translated texts where “the wrong” synonym was chosen).
Your MT Quality Control Checklist
In light of such risks, it would be wise to take stock of the situation. Does your translation supplier use machine translation? If so, it might be a good idea to get more involved and discuss best practices: what is acceptable quality, and how can the best possible quality be achieved? Here are some key aspects to consider:
Quality of source language. Is your company’s technical language sufficiently controlled? In other words: is it “boring,” with little variation and many occurrences of the same terms and phrases? Concise, unambiguous language is a prerequisite for high-quality MT results. This can be achieved with terminology work, synonym reduction, and writing rules, which means you have to spend time and money and allocate resources. If you put the work in, the benefits go beyond better MT results and include more readable texts and lower long-term translation costs.
Domain. The language model should be trained on a corpus that is representative of your technical documentation in order to reduce semantic ambiguity (as discussed above). To build a language model for automotive texts, for example, the corpus should ideally be derived from the same domain. Furthermore, it must be large enough to produce reliable statistics.
Post-editing effort. Various metrics are used to evaluate machine translation performance. One is the “edit distance”: the number of manual edits required to make the output correct, fluent, and close enough to the gold standard. Based on the edit distance, a decision can be made as to whether the result is good enough to be used commercially. The effort of post-editing and correcting machine translations in order to produce an acceptable translation should be minor compared to translating from scratch; otherwise you’re barking up the wrong tree.
Priority. If your company uses a translation memory (as in Skribenta 4, for example), the perfect or “fuzzy” matches it produces are likely to be of better quality than any MT result, and should be prioritized before you venture into machine translation.
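The “edit distance” mentioned under post-editing effort can be illustrated with a short Python sketch. It assumes a word-level Levenshtein distance (insertions, deletions, and substitutions of whole words), which is one common way to define the metric; the evaluation tools your supplier uses may count edits differently. The function names, the sample sentences, and the 0.3 threshold are invented for illustration.

```python
def edit_distance(mt_words, ref_words):
    """Word-level Levenshtein distance between MT output and a reference."""
    # previous[j] holds the distance between the MT words processed so far
    # and the first j reference words.
    previous = list(range(len(ref_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_words, start=1):
            cost = 0 if mt_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete a word from the MT output
                current[j - 1] + 1,      # insert a missing reference word
                previous[j - 1] + cost,  # keep or substitute a word
            ))
        previous = current
    return previous[-1]


def post_edit_rate(mt_output, reference):
    """Edits per reference word; lower means less post-editing effort."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference translation must not be empty")
    return edit_distance(mt_output.split(), ref_words) / len(ref_words)


# Hypothetical threshold: what counts as "good enough" is a business decision.
MAX_ACCEPTABLE_RATE = 0.3

rate = post_edit_rate("press on the button", "press the button")
print(f"edits per reference word: {rate:.2f}")
print("worth post-editing" if rate <= MAX_ACCEPTABLE_RATE else "translate from scratch")
```

Dividing by the length of the reference makes short and long sentences comparable, so the same threshold can be applied across a whole document.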
So, is Machine Translation your friend or foe? If you tick all the boxes in the list above, MT can be extremely friendly, particularly if applied as a complementary tool. Hopefully this little summary has given you some insight into the world of Machine Translation. At the very least, you can now laugh knowingly instead of ignorantly the next time someone jokes about a particularly bad MT result.
Good read? There's more where that came from! Browse our blog archives for more articles by Peter and our other technical communication experts.