June 23, 2011

Google Translate supports five new Indian languages

In a major step towards reaching local communities, search engine giant Google has announced translation support for five Indian languages: Bengali, Gujarati, Kannada, Tamil and Telugu. With these additions, the total number of languages supported by Google Translate has risen to 63. Google says the new languages are currently in an experimental phase, noting that Indian languages differ considerably from English. It also highlights that the languages newly supported by its online translation service are spoken by over 500 million people in India and Bangladesh.

Google research scientist Ashish Venugopal writes on the Google Blog, “Indic languages differ from English in many ways, presenting several exciting challenges when developing their respective translation systems. Indian languages often use the Subject Object Verb (SOV) ordering to form sentences, unlike English, which uses Subject Verb Object (SVO) ordering. This difference in sentence structure makes it harder to produce fluent translations; the more words that need to be reordered, the more chance there is to make mistakes when moving them.”

“Tamil, Telugu and Kannada are also highly agglutinative, meaning a single word often includes affixes that represent additional meaning, like tense or number. Fortunately, our research to improve Japanese (an SOV language) translation helped us with the word order challenge, while our work translating languages like German, Turkish and Russian provided insight into the agglutination problem,” he adds.

Mr Venugopal also points out that translations in these new alpha languages are likely to be less fluent, and to contain more untranslated words, than those in languages such as Spanish and Chinese, which he says have much more of the web content that powers Google's statistical machine translation approach. He concludes his post by hoping that the new alpha languages will help users better understand the Indic web and encourage the publication of new content in Indic languages.
