IIT Madras develops easy OCR system for reading Bharati script

Date: 29 April 2019 Tags: IT, Mobile & Computers

Researchers from IIT Madras have developed easy multilingual optical character recognition (OCR) schemes for reading documents in Bharati script. They have also developed a universal finger-spelling language (method) for the nine Indian languages. The finger-spelling method was developed in collaboration with TCS, Mumbai, and can be used to generate sign language for hearing-impaired persons.

About Bharati script

  • It is a unified script for nine Indian languages, proposed as a common script for India to bring down communication barriers.
  • The nine languages are Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil. English and Urdu have not yet been integrated as they have a very different phonetic organisation.
  • It was developed by a team of researchers from IIT Madras headed by Professor V. Srinivasa Chakravarthy.

OCR schemes for Bharati script

The process involves first separating (or segmenting) the document into text and non-text. The text is then segmented into paragraphs, sentences, words and letters. Each letter is then recognised and encoded as a character in a machine-readable format such as Unicode or ASCII. A letter has various components, such as a basic consonant, consonant modifiers and vowels.
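
The segmentation step described above can be illustrated with a minimal sketch. The article does not say which algorithm the IIT Madras team uses, so this is not their method: it assumes a classical projection-profile approach, in which blank rows separate lines of text and blank columns separate glyphs. The toy bitmap and the function names (runs, segment_lines, segment_glyphs) are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: a page is a binary bitmap where 1 = ink and 0 = background.
Image = List[List[int]]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for each run of non-zero values; end is exclusive."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at a blank entry
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run reaches the edge
    return spans


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the ink count of each row."""
    row_profile = [sum(row) for row in image]
    return runs(row_profile)


def segment_glyphs(image: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Split one text line (rows top..bottom-1) into glyphs using column ink counts."""
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(col_profile)


# Toy page: two lines of text, the first holding two glyphs.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
glyphs = [segment_glyphs(page, top, bottom) for top, bottom in lines]
print(lines)   # row ranges of the text lines
print(glyphs)  # column ranges of the glyphs within each line
```

In a full OCR system, each glyph region found this way would then be passed to a recogniser that maps it to a character code such as a Unicode code point. That recognition step is not shown here.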

