Supporting Digital Content in Indian Languages

Language is a means of communication, and wider language support means wider audience reach. India has 22 officially recognized languages, yet English (in which 80% of the Indian population is not fluent) dominates the Indian online space. Users are expected to learn English to use digital platforms, instead of being offered content in a language they are comfortable with. From deep technical discussions in Tamil at a software company in Chennai to a microwave oven with all its knobs labeled in Finnish in Finland, I have experienced and come to appreciate the immense benefits of localizing content for user comfort. More than a decade ago, I worked on the i18n* of a software product. This software then underwent l10n* for Japanese and was tested rigorously to ensure quality for its Japanese users.

With regard to localization support here in India, we are a bit late to the game. Even today, only a few players provide localized content. But some organizations are now realizing the economic benefits of localizing content as they eye markets in tier-2 and tier-3 cities. Amazon recently launched a Hindi version of its offerings.

* So what exactly are i18n and l10n?

We hear these two terms frequently in connection with local language support. i18n is a numeronym for internationalization: the number 18 is the count of letters between the first letter (‘i’) and the last letter (‘n’). Likewise, l10n is a numeronym for localization, where 10 is the count of letters between the first letter (‘l’) and the last letter (‘n’).
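The numeronym rule is simple enough to express in a few lines. The following is a minimal Python sketch; the function name `numeronym` is my own choice for illustration.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 4:
        # Too short to shorten meaningfully, so return it unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```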

i18n is the process of designing and developing software in such a way that it can be translated or localized easily for any other language.

l10n is the actual adaptation of internationalized software for a specific language and target audience, chiefly via translation.

At first it may sound confusing, but the following figure, based on a chart from the LISA website, depicts the difference between the internationalization and localization processes a little more clearly.

Figure: Localisation and Internationalisation

The W3C website explains the distinction in more detail.
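To make the distinction concrete, here is a small Python sketch. The i18n part is the code: it never hard-codes user-facing text and looks messages up by key. The l10n part is the data: each language is added to the catalog without touching the code. The catalog contents, the keys and the function name are hypothetical; real projects would more likely use an established mechanism such as gettext.

```python
# Hypothetical message catalog. Adding a language (l10n) means adding data here.
MESSAGES = {
    "en": {"greeting": "Hello, {name}!", "cart_empty": "Your cart is empty."},
    "hi": {"greeting": "नमस्ते, {name}!"},  # only partially localized, on purpose
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale: str, **params: str) -> str:
    """Look up a message by key (i18n), falling back to the default locale."""
    catalog = MESSAGES.get(locale, {})
    template = catalog.get(key) or MESSAGES[DEFAULT_LOCALE][key]
    return template.format(**params)


print(translate("greeting", "hi", name="Asha"))  # localized message
print(translate("cart_empty", "hi"))             # falls back to English
```

The fallback matters in practice: a partially translated product should still show something readable rather than fail on a missing message.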

Translation

There are two ways to translate content: manual translation and automated machine translation.

Manual translation, as the name implies, is translation done by humans.
Let us look at how machines perform translation, taking one of the most popular machine translation providers, Google Translate, as an example. Google Translate initially used a phrase-based translation model, which broke a sentence into words and short phrases and translated them largely independently of one another. The newer model is based on Neural Machine Translation (NMT), which uses deep learning: an artificial neural network learns to translate from large volumes of example translations. In simpler words, it translates a sentence as a whole instead of piece by piece, which makes the translation more context-aware and hence more accurate and meaningful.

Pros of Machine Translation

a. Super Quick
Human translators can never compete with the speed and processing power of machines, which translate big blocks of data within a few seconds.

b. Cost Effective
The Google Translate website and app are free to use, which saves you the fees charged by professional translators.

Cons of Machine Translation

a. Dynamic Translation via APIs
If your app or program calls the Google translation APIs to translate content dynamically, the usage charges add up and become expensive over time.

b. Literal translation
Certain words, such as proper nouns, should not be translated. In numerous instances, I found that names of places were translated as well.

c. Double meaning
Certain words have multiple meanings, and such words are often translated incorrectly.

d. Spelling mistakes
In my experience, a lot of words were spelt incorrectly in local Indian languages. The mistakes were not mere typos; the output differed significantly from the original content. Running a reverse translation on the output led to utter chaos.

e. Complex sentences
While simple sentences were correctly interpreted, lengthier sentences were reordered and translated incorrectly.

f. Indian Language support
The Google translation service is much better for western languages than for Indian languages. This assessment is based on my personal experience using the translation service for Hindi, Kannada and Marathi, where the end result needed a lot of correction.
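Two of the drawbacks listed above lend themselves to small engineering workarounds: API cost (a) and translated proper nouns (b). The sketch below caches results so that the same text is sent to the translation backend only once, and swaps do-not-translate terms for placeholder tokens before the call. Everything here is an assumption for illustration: `translate_fn` stands for whatever translation API you use, the demo backend merely upper-cases text, and whether a real engine leaves such placeholders untouched has to be verified for that engine.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical signature for any translation backend: (text, target_lang) -> text
TranslateFn = Callable[[str, str], str]


def protect_terms(text: str, terms: Sequence[str]) -> Tuple[str, Dict[str, str]]:
    """Replace do-not-translate terms with opaque placeholder tokens."""
    mapping: Dict[str, str] = {}
    for index, term in enumerate(terms):
        if term in text:
            token = f"__TERM{index}__"
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore_terms(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


class CachedTranslator:
    """Caches results so each (text, language) pair reaches the backend once."""

    def __init__(self, translate_fn: TranslateFn,
                 protected_terms: Sequence[str] = ()) -> None:
        self._translate_fn = translate_fn
        self._protected_terms = tuple(protected_terms)
        self._cache: Dict[Tuple[str, str], str] = {}
        self.api_calls = 0  # number of times the backend was actually called

    def translate(self, text: str, target_lang: str) -> str:
        key = (text, target_lang)
        if key not in self._cache:
            masked, mapping = protect_terms(text, self._protected_terms)
            self.api_calls += 1
            translated = self._translate_fn(masked, target_lang)
            self._cache[key] = restore_terms(translated, mapping)
        return self._cache[key]


def fake_translate(text: str, target_lang: str) -> str:
    """Stand-in backend for the demo: it only upper-cases the text."""
    return text.upper()


translator = CachedTranslator(fake_translate, protected_terms=["Silk Board"])
for _ in range(3):
    result = translator.translate("Welcome to Silk Board", "hi")

print(result)                # WELCOME TO Silk Board
print(translator.api_calls)  # 1
```

The cache here lives in memory only. For static content such as menus and labels, storing translations once in a file or database serves the same purpose and avoids dynamic calls altogether.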

A study that evaluated the use of Google Translate in medical communication reached the following conclusion:
Google Translate has only 57.7% accuracy when used for medical phrase translations and should not be trusted for important medical communications.

There are skeptics who claim that automated translation will never be perfect, as machines lack understanding, emotion and imagination. Weighing the pros and cons of machine translation, the approach I would recommend is a hybrid one: use automated translation as a baseline and correct its mistakes manually. Some people predict that today's human translators will take on the role of quality control for machine translations until automated translation reaches human quality.
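One way to organize such a hybrid workflow is to let the machine produce a draft for every segment and route only the suspicious ones to a human reviewer. The sketch below uses reverse translation as the signal: if translating the draft back gives text that differs a lot from the source, the segment is flagged. This is a crude heuristic of my own choosing, not an established quality metric; the threshold of 0.8, the `Draft` type and the `translate_fn` signature are all assumptions, and the demo backend is a toy that deliberately garbles long sentences.

```python
from difflib import SequenceMatcher
from typing import Callable, List, NamedTuple, Sequence

# Hypothetical backend signature: (text, source_lang, target_lang) -> text
TranslateFn = Callable[[str, str, str], str]


class Draft(NamedTuple):
    source: str
    machine_output: str
    needs_review: bool


def draft_translations(segments: Sequence[str], translate_fn: TranslateFn,
                       source_lang: str, target_lang: str,
                       threshold: float = 0.8) -> List[Draft]:
    """Machine-translate each segment and flag poor round trips for review."""
    drafts: List[Draft] = []
    for segment in segments:
        forward = translate_fn(segment, source_lang, target_lang)
        back = translate_fn(forward, target_lang, source_lang)
        # Character-level similarity between the source and its round trip.
        similarity = SequenceMatcher(None, segment.lower(), back.lower()).ratio()
        drafts.append(Draft(segment, forward, similarity < threshold))
    return drafts


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Toy backend: reverses text, and garbles long sentences on the way back."""
    if target_lang != "en":
        return text[::-1]
    restored = text[::-1]
    return restored if len(restored) < 30 else "garbled output"


segments = [
    "Add to cart",
    "Your order will be delivered within three working days",
]
for draft in draft_translations(segments, fake_translate, "en", "hi"):
    print(draft.needs_review, "|", draft.source)
```

A segment that passes this check can still be wrong, so the flag is best used to prioritize the reviewer's time rather than to skip review.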

Digital Trends in India

A study done by KPMG states that ‘Digital’s future lies in the Indian language internet users’.

The following figure from the KPMG study shows the range of factors driving the adoption of Indian languages on digital platforms:

Figure: Digital Trends in India

FICCI-ILIA

Since providing content in Indian languages is a complex exercise, it needs collaboration between various stakeholders, including government, educational institutions and private organizations. FICCI (Federation of Indian Chambers of Commerce and Industry), in association with the Ministry of Electronics & Information Technology (MeitY), Government of India, launched the ‘FICCI-Indian Language Internet Alliance (FICCI-ILIA)’. The objective of the alliance is to boost internet penetration in the country by enabling greater access to regional language content on the internet.

The Road Ahead

The content above discusses translation in terms of text only. Creating localized digital content is important, but it is not enough: we also need to support Indian languages through language-friendly keyboards. The next step from there would be to explore translation in the area of voice and speech. There is software that provides speech-to-text and text-to-speech services. While many big vendors provide proprietary solutions, standardization and open source solutions will drive innovation and hence help users immensely.
