Preetpal Kaur Buttar and Jaswinder Kaur
Sant Longowal Institute of Engineering and Technology, India
A large amount of the digitally stored data in Indian languages is in ASCII-based font formats. ASCII defines only 128 characters and therefore cannot represent all the characters needed for the variety of scripts in use worldwide. Moreover, unlike English-language fonts, which follow the standard ASCII mapping, the ASCII-based fonts for a particular Indian script do not share a single standard mapping between character codes and individual characters. Consequently, the fonts used for a particular script must be available on the system for data in that script to be represented accurately. Converting data from one font to another is also difficult, and the non-standard ASCII-based fonts make it hard to search Indian-language texts available on the web. There are 25 official languages in India, and the amount of digital text available in ASCII-based fonts is much larger than the text available in the standard ISCII (Indian Script Code for Information Interchange) or Unicode formats. This paper discusses the work done on font detection (identifying the font of a given text) and on font converters (which convert ASCII-format text into the corresponding Unicode text).
Language Detection, Font Detection, Font Conversion, Unicode