099 NSenthilraja Kanimozi Unicode Final
Abstract
1. Introduction
TACE16 is a 16-bit character encoding in which every Tamil character is represented by a
single code. No Tamil compiler available today fulfils all of these requirements. We have
therefore set out to assign unique codes to all the Tamil characters and to design a
compiler for Tamil that will be used for executing Tamil programs. The Kanimozhi design has
five phases: Lexical Analyzer, Syntax Analyzer, Semantic Analyzer, Code Generator and
Optimizer. As part of the Tamil compiler design, we have completed the design of the Lexical
Analyzer and achieved good results.
Several solutions have been proposed earlier for programming languages in Tamil. We review
two of them: Swaram and Ezhil.
Swaram, a statically typed Tamil programming language, was the first to be introduced, in
2003. Its feature set is related to that of the C programming language. Swaram allows mixed
English and Tamil identifiers so that programs can access external libraries written in
English, while keywords are in Tamil. Swaram is not publicly available, which severely limits
language development, system use, community support and improvement.
Ezhil, also a statically typed Tamil programming language, was introduced in 2008. Ezhil
incorporates most of the concepts from Swaram.
Both of these programming languages use Unicode Tamil. Unicode assigns Tamil a block of only
128 code positions (U+0B80 to U+0BFF), so the most frequently used characters require
multiple code points. The limitations of Unicode Tamil include doubled storage size,
security vulnerabilities, ambiguous combinations (which require normalization), and
inefficient counting, sorting and searching.
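The multiple code point and normalization issues can be illustrated with a short Python
sketch using the standard unicodedata module. It is only an illustration of standard Unicode
behaviour, not part of Kanimozhi; the names KO_COMPOSED, KO_SPLIT, WORD,
same_after_normalization and count_letters are assumptions introduced here, and
count_letters is a simplification that treats every non-mark code point as one written
letter.

```python
import unicodedata

# Two different code point sequences that render as the same syllable "கொ".
KO_COMPOSED = "\u0b95\u0bca"      # KA + VOWEL SIGN O
KO_SPLIT = "\u0b95\u0bc6\u0bbe"   # KA + VOWEL SIGN E + VOWEL SIGN AA

# The word "தமிழ்": three written letters stored as five code points.
WORD = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def same_after_normalization(a, b):
    """Compare two strings after bringing both to NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def count_letters(text):
    """Count written letters by skipping combining marks (categories Mn, Mc)."""
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Mc"))


if __name__ == "__main__":
    print("raw comparison equal:", KO_COMPOSED == KO_SPLIT)
    print("equal after NFC:", same_after_normalization(KO_COMPOSED, KO_SPLIT))
    print("code points:", len(WORD))
    print("written letters:", count_letters(WORD))
    print("UTF-8 bytes:", len(WORD.encode("utf-8")))
```

A plain string comparison treats the two spellings of the syllable as different, so
searching and sorting need a normalization pass first, and counting letters needs a scan
over character categories rather than a simple length lookup.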
3. TACE16 Encoding
Tamil All Character Encoding (TACE16) is a 16-bit, Unicode-based character encoding scheme
for the Tamil language. TACE16 not only overcomes the issues with the present Unicode
encoding standard for Tamil mentioned above, but also offers major improvements in both
processing time and storage space, which are the main factors affecting the efficient and
speedy execution of any computer program. It uses the Tamil99 and Tamil Typewriter keyboard
layouts, which are approved by the Tamil Nadu Government, and maps the input keystrokes to
the corresponding characters of the TACE16 scheme.
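The idea of one 16-bit code per Tamil character can be sketched as a longest-match lookup
from Unicode code point sequences to single codes. This is a minimal sketch under stated
assumptions: the table HYPOTHETICAL_TABLE and the function to_single_codes are introduced
here for illustration, and the 16-bit values are placeholders, not the real TACE16
assignments.

```python
# Placeholder table: these 16-bit values are NOT real TACE16 code points.
# Keys are Unicode Tamil sequences, values are one code per written letter.
HYPOTHETICAL_TABLE = {
    "\u0b95": 0xE000,         # க
    "\u0b95\u0bbe": 0xE001,   # கா
    "\u0b95\u0bcd": 0xE002,   # க்
    "\u0bae": 0xE003,         # ம
    "\u0bae\u0bcd": 0xE004,   # ம்
}
MAX_KEY_LEN = max(len(key) for key in HYPOTHETICAL_TABLE)


def to_single_codes(text):
    """Convert a Unicode Tamil string to a list of single 16-bit codes.

    At each position the longest sequence present in the table is consumed,
    so a consonant followed by a vowel sign becomes one code.
    """
    codes = []
    i = 0
    while i < len(text):
        for n in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            code = HYPOTHETICAL_TABLE.get(text[i:i + n])
            if code is not None:
                codes.append(code)
                i += n
                break
        else:
            raise KeyError(f"no table entry for {text[i]!r} at offset {i}")
    return codes


if __name__ == "__main__":
    word = "\u0b95\u0bbe\u0b95\u0bae\u0bcd"  # காகம், five Unicode code points
    print([hex(c) for c in to_single_codes(word)])
```

With such a table the five code points of the sample word collapse to three codes, one per
written letter, so the length of the encoded list is the letter count.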
A Kanimozhi program is written entirely in the Tamil language. All the keywords are
represented in Tamil, and all the operators found in current programming languages are
retained. Some of the keywords and their Tamil equivalents are shown in the table
below.
The phase diagram of Kanimozhi is as follows:
i) Scanner
The Lexical Analyzer is the first phase of Kanimozhi. It begins the analysis of the
source program by reading the input, character by character, and grouping
characters into individual words and symbols (tokens). We have completed the
design of the Lexical Analyzer.
For example, consider the following statement.
When the above statement is passed to the Lexical Analyzer, it splits the statement
into tokens.
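The token-splitting step can be sketched with a small regular-expression scanner in Python.
This is not the Kanimozhi Lexical Analyzer itself, and it works on Unicode Tamil rather than
TACE16 input. The keyword INT_KEYWORD, the token names and the function tokenize are
hypothetical, chosen only for illustration. The Tamil block is listed explicitly in the
identifier pattern because Python's \w does not match Tamil vowel signs or the pulli.

```python
import re

# Hypothetical keyword standing in for an integer type; not taken from the paper.
INT_KEYWORD = "முழுஎண்"
KEYWORDS = {INT_KEYWORD}

TAMIL = "\u0b80-\u0bff"  # Unicode Tamil block, including vowel signs and pulli

TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", rf"[A-Za-z_{TAMIL}][A-Za-z0-9_{TAMIL}]*"),
    ("OP", r"[+\-*/=<>]"),
    ("PUNCT", r"[;(){},]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Read the source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are recognised after scanning
        if kind != "SKIP":    # whitespace separates tokens but is not emitted
            tokens.append((kind, text))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    # Hypothetical statement: declare a variable and assign a value.
    for token in tokenize(INT_KEYWORD + " எண் = 10;"):
        print(token)
```

Scanning identifiers first and checking them against the keyword set afterwards keeps the
pattern list short; a character that matches no pattern is reported with its offset instead
of being skipped silently.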
ii) Parser
v) Code Generator
Conclusion
Kanimozhi will be useful to people who are proficient in Tamil, including rural Tamil
speakers, but who cannot make full use of computers because of the language gap. They need a
programming language that lets them use the computer easily and efficiently. This research
aims to design and develop Kanimozhi as a full-fledged Tamil compiler for Tamil speakers.
Our goal is to bring computing to all the people who are not tied to the English language,
which in turn will help to develop the Tamil language.
References
[1] S.G. Ganesh, G.R. Prakash and K.K. Ravi Kumar, (2003) An overview of 'Swaram': A
programming language in Tamil, Proceedings of Tamil Internet Conference.
[2] Mutthaiya Annamalai, (2008) Ezhil Project,
http://students.uta.edu/mx/mxa6471/download.html
[3] Laurel Peterson, (1999) Principles of Compiler design, Pearson Publication.
[4] Dr. M. Ponnavaikko, Mani M. Manivannan and Manoj Annadurai, Review of Tamil
Unicode, http://www.unicode.org/L2/L2007/07193-tamil-pres-2.pdf