Re: compiler for Chinese development language

"Abhishek Choudhary" <choudhary@indicybers.net>
12 Jan 2006 12:14:52 -0500

          From comp.compilers


From: "Abhishek Choudhary" <choudhary@indicybers.net>
Newsgroups: comp.compilers
Date: 12 Jan 2006 12:14:52 -0500
Organization: Compilers Central
References: 05-10-085
Keywords: i18n
Posted-Date: 12 Jan 2006 12:14:52 EST

Hello,


I have successfully developed equivalents of C, C++, lex, yacc, Java, BASIC
etc. for Indian languages. I am attaching a few messages where these have
been discussed.


Regards,
Abhishek
--------------------------------------------------------


======== Mail requesting support for vernacular programming languages
==========
Hi,


I am writing this message to seek help regarding an initiative for the true
and total empowerment of vernacular medium students in India in the field
of Information Technology. This relates to the development of vernacular
free open source OS (DOS in Hindi, Bangla, Gujarati etc.) and programming
languages (C, C++, lex, yacc, assembly, Java, Prolog etc. in Hindi, Bangla,
Gujarati etc.). This is the first time ever that such software has been
successfully developed, even though there have been many research groups
working on it for decades now. What is more important is that all this
software has been released as 'free' for the benefit of the vernacular
medium students. I have no financial interests from it. I have attached the
original announcement with this message, along with another message
discussing some important topics regarding feasibility issues concerning
such effort.
Group http://groups.yahoo.com/group/hindawi/


The kind of support I expect may be in any form (coding, awareness,
sponsorship, advertisements etc.). There is one specific manner in which
one can contribute greatly, with or without involving me as outlined below.
1) Adopt the vernacular IT initiative for a school or college in one's own
hometown.
2) Arrange for a few computers (even old second-hand ones can support the
software I have developed, so you may donate old ones). This software "does
not" require any add on hardware or commercial software for supporting
Indian vernacular display, OS and programming languages. You will have to
contact the school or some concerned organisation for the actual transfer.


As for me, I am a 24-year-old IT professional and the winner of the
Computer Society of India's Eastern Regional Young IT Professional Award
2005 (national rounds are yet to be held, but I'm sure I'll win.)


I would finally like to apologise if you find this email unsolicited, and
thank you for reading this far. This has been sent to you individually and
your name has not been included in any mailing list. This is the only mail
you shall receive, unless you wish to further communicate regarding this
matter.


Regards,
Abhishek Choudhary
K-6, Tollygunge Police Lines,
Kolkata -- 700 033
West Bengal
India


Mobile: +91-9831369549
Fax: +91-33-24221175
email: choudhary@indicybers.net


======== Mail announcing the availability of FOSS system software in
vernaculars (Hindi/Bangla C, C++, assembly, BASIC, LOGO etc.) ==========


Respected Sir/Madam,


This is to inform you of the availability of the first ever "complete"
suite of open-source programming languages for Indian vernaculars. It
includes equivalents of C, C++, lex, yacc, assembly, Java etc. in Hindi,
Bangla and other vernaculars. Along with this I have also released Hindi
and Bangla DOS, including BASIC and Logo for vernaculars. The downloads are
available at http://www.indicybers.com


These projects have won Computer Society of India's Young IT Professional
Award (Eastern region) 2005 (Winner) and 2004 (Special mention), two years
in a row. I shall be competing at the 2005 national level now.


Some of the innovations of this project include a system for displaying
Indic scripts in "true" text-mode. This is done without using any additional
hardware. At no point has any graphical (rasterising) method been used for
this. All the required glyphs have been accommodated in the extended ASCII
code page, leaving 7-bit ASCII unaltered. This method is applicable to all
Brahmi-derived composite syllabic Indian scripts. Hindi, Bangla, Assamese
and Gujarati scripts have been implemented. Oriya and Punjabi are under
development. There are strong suggestions that this may be applicable for
South Indian scripts as well. This has made it possible to have BIOS/POST
in Indic. Besides, this system being free, it does not add to the
procurement cost as compared to commercial products.
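
The code-page constraint described above can be illustrated with a minimal
sketch. The actual Hindawi glyph table is not given in this message, so the
glyph names below are placeholders and the helper name build_code_page is
hypothetical. The sketch only shows the layout rule: bytes 0x00-0x7F stay
plain ASCII, and at most 128 script glyphs fit in 0x80-0xFF.

```python
ASCII_LIMIT = 0x80      # first byte value above 7-bit ASCII
CODE_PAGE_SIZE = 0x100  # an 8-bit code page has 256 slots


def build_code_page(indic_glyphs):
    """Return a 256-entry list: ASCII in 0x00-0x7F, glyph names from 0x80 up.

    indic_glyphs is a list of placeholder glyph names (an assumption for
    illustration, not Hindawi's real glyph set).
    """
    free_slots = CODE_PAGE_SIZE - ASCII_LIMIT
    if len(indic_glyphs) > free_slots:
        raise ValueError(
            f"{len(indic_glyphs)} glyphs do not fit in {free_slots} slots"
        )
    page = [chr(i) for i in range(ASCII_LIMIT)]       # 7-bit ASCII unaltered
    page += list(indic_glyphs)                        # glyphs start at 0x80
    page += [None] * (CODE_PAGE_SIZE - len(page))     # unused upper slots
    return page
```

The hard limit of 128 upper slots is why composite syllabic scripts need a
carefully chosen set of glyph pieces rather than one slot per conjunct.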


Another contribution of this project includes a "case and diacritic
independent, compiler acceptable" transliteration system. This is
completely invertible and is applicable to all Indian languages. This has
direct mapping to the IPA and, hence, may be used to develop programming
languages in "any" human language. It also has bearings on web technology,
as it can allow Indic URLs in IPv4 as well. It may be used to encode even
static web-pages, such that if someone does not have the required fonts
then one may see the Indic web-page in Roman script transliteration,
instead of "boxes" (Unicode) or garbage (other encodings), from the
same "static" html.
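
To make "case independent" and "invertible" concrete, here is a minimal
sketch. It is not the Romenagri scheme itself, whose tables are not given in
this message. The five-entry table is a toy assumption: each Devanagari
character maps to a lowercase ASCII token, and the token set is prefix-free,
so the Roman text can be decoded back unambiguously whatever its letter case.

```python
# Toy table (an assumption for illustration, not the real Romenagri mapping).
TOY_TABLE = {
    "\u0915": "k",   # DEVANAGARI LETTER KA
    "\u092e": "m",   # DEVANAGARI LETTER MA
    "\u0932": "l",   # DEVANAGARI LETTER LA
    "\u093e": "aa",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
}
REVERSE = {token: ch for ch, token in TOY_TABLE.items()}


def _is_prefix_free(tokens):
    """True if no token is a proper prefix of another token."""
    return not any(a != b and b.startswith(a) for a in tokens for b in tokens)


# Prefix-freeness is what makes left-to-right decoding unambiguous.
assert _is_prefix_free(list(REVERSE))


def to_roman(text):
    """Encode Devanagari text as plain lowercase ASCII (no diacritics)."""
    return "".join(TOY_TABLE[ch] for ch in text)


def from_roman(roman):
    """Decode ASCII back to Devanagari; letter case is ignored."""
    roman = roman.lower()
    out = []
    pos = 0
    while pos < len(roman):
        for token, ch in REVERSE.items():
            if roman.startswith(token, pos):
                out.append(ch)
                pos += len(token)
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return "".join(out)
```

Because the encoded form uses only ASCII letters, it passes through
compilers, URLs and case-folding systems unchanged, which is the property the
paragraph above relies on.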


Finally, the task of Indic programming language design has not been trivial
either. I have also included support for HP printers. The system uses GCC
as back-end and is highly portable. There is both ISCII and UNICODE support
for all languages, including Hindi/Bangla DOS and the IDE. Necessary
filters have been provided for conversions between ISCII, Romenagri,
UNICODE, APCISR, HP-PCL etc. The languages have been developed
synchronically and, hence, there is a certain level of homogeneity in
keyword selection across paradigms. The programs written in Indic
programming languages are readily converted to their English equivalents
and hence may be delivered internationally. There is also support for
translation of variable names and rudimentary literate programming.
Unreleased languages include Lisp, Prolog, Ada, Pascal, Fortran etc. in
Indian vernaculars. They shall be released soon, after the initial testing
and verification of license issues. However, the availability of lex and
yacc makes the issues of targeting specific languages quite trivial, and
these are already available for download along with C, C++, assembly,
BASIC, Logo, and Java in Hindi and Bangla.


Technologically, Hindi/Bangla C/C++/assembly has been used for robotics and
cluster super-computers. Along with this system, I have also released in
public domain the design of a natural-interfaced autonomous robot. The
languages have also been used to successfully implement a Beowulf cluster.
Effort is now being made towards porting Linux kernel sources to
Hindi/Bangla C, asm etc. This is aided by the fact that I have also included
English-programming-language to Hindi/Bangla-programming-language
translators and vice-versa.


Sir/Madam, I have released all this software as free open source for the
greater benefit of the vernacular literate population of our country. I do
not have any financial interests from it. However, I shall appreciate
support in any form (including coding, awareness, maintenance, and
financial). I am also looking for a suitable job. My experience includes
systems programming (compiler design, device drivers, Linux kernel),
computational linguistics, embedded systems, C/C++/assembly/Java, medical
informatics, artificial intelligence, and technical writing.


Link to my resume: http://indicybers.com/abhishek/


Link to my degree thesis: http://indicybers.com/hindawi/ANGELBot.pdf


Link for downloading Indic programming languages: http://www.indicybers.com


Link for downloading Indic programming languages paper:
http://indicybers.com/hindawi/Hindawi.pdf


Link for downloading Indic programming languages presentation:
http://indicybers.com/hindawi/Hindawi.ppt


Regards,
Abhishek Choudhary
K-6, Tollygunge Police Lines,
Kolkata -- 700 033
West Bengal
India


Mobile: +91-9831369549
Fax: +91-33-24221175
email: choudhary@indicybers.net


================ Mail discussing important topics (1) ======================


Hello everyone,


Thank you for your interest regarding vernacular programming languages. I
would like to begin this discussion with a few very valid points raised by
Vivek as they provided a good basis for starting a conversation, and along
these lines I shall also reflect upon some of my short and long term
objectives. The latter, though, deserve a separate discussion and I would
request you to refer to the documents http://indicybers.com/vision2020.html
and http://indicybers.com/swatantra.html






Topic 1 : Bridging the gap between vernacular and English developers


Vivek wrote:
>I have followed your project for sometime now. Your project stands
>between two things. The already ready english based programming
>community and the possibly new and developing vernacular developers.
>This gap might again be challenge to bridge.


Point-> Hindawi (Hindi/Bangla C, C++ etc.) does 'not' create any further
divide or gap between vernacular developers and traditional (English)
developers. Hindi / Bangla C, C++, Java, assembly etc. get 'readily'
translated into their English equivalent. There is also provision for
translation of variable names, which certainly is not a trivial task and
formed the basis of my interaction with Swami Sarvottamanand, Dean of
Research and HOD Comp. Sci. at Belur Math college, who was one of the
distinguished judges during the YIPTA event. The reverse, that is,
translation of English programming languages and variable names to
vernacular programming languages and variable names also takes
place 'readily'; (here readily implies without any extra programming
effort). So there isn't any new gap that develops between traditional and
vernacular developers; only the old ones are bridged. I am working on a
system for complete machine translation of documentation as well (a limited
scale version should be ready in a month or two.)
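
The two-way keyword translation described in this point can be sketched as
follows. This is a minimal illustration, not Hindawi's translator: the Hindi
keywords in the table are hypothetical stand-ins, and the sketch ignores
string literals, comments and variable-name translation. It shows only that a
one-to-one keyword table makes the conversion mechanical in both directions.

```python
import re

# Hypothetical keyword table (an assumption; not Hindawi's actual keywords).
HI_TO_EN = {
    "अगर": "if",
    "वरना": "else",
    "लौटाओ": "return",
    "पूर्णांक": "int",
}
EN_TO_HI = {en: hi for hi, en in HI_TO_EN.items()}

# A "word" is a run of ASCII identifier characters or Devanagari code points.
# The Devanagari block is listed explicitly because \w does not match
# combining vowel signs, which would split Hindi words in the middle.
WORD = re.compile(r"[A-Za-z0-9_\u0900-\u097F]+")


def translate(source, table):
    """Replace every whole word found in table; leave everything else as is."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), source)


english = "int f(int x) { if (x) return 1; else return 0; }"
hindi = translate(english, EN_TO_HI)       # English C to vernacular C
round_trip = translate(hindi, HI_TO_EN)    # and back again
```

Since the table is a bijection and identifiers such as f and x are left
untouched, round_trip reproduces the English source.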


Explanation-> This has been a very basic consideration during the course of
development of Hindawi. This also relates to another concern - that of the
utility of vernacular programming languages. Vernacular BASIC and LOGO are
fine as paedagogical aids (teaching tools), supposed to be used in the
classroom setting or by hobby programmers, (though I must point out that
Hindi/Bangla BASIC allows EXE files to be created and C code to be
embedded). However, the moment one talks of vernacular C, C++ , lex, yacc
or say vernacular Java, one has to consider the fact that learning these
languages shall involve a substantial effort, though it is comparatively
simpler for a vernacular-literate person than learning their English
manifestations. This effort shall prove meaningless if the skills acquired
cannot be used professionally, that is, if they are not marketable enough.
And "enough" certainly would include international markets. As mentioned
earlier, the ready conversion of programs written in vernacular languages
to their English counterparts, and vice-versa, solves this problem. For
instance, let us consider a possible scenario set in the not-so-distant
future.


We are in 2012 AD (six years from now), the first milestone year
in achieving the target of making India an ICT superpower. (By then I'm
sure ICT shall stand for Indian Cottage Technology, for indeed, that is
what I envision - to make Information and Communication Technology a
literal 'cottage' industry in India which shall cater solutions globally,
and provide at least some major relief to the unemployment situation. That
would be akin to the electronics industry in China today.) A person, say,
in the USA needs a piece of software for his new startup. He logs onto the
Indian "Software Exchange" (***a new social concept***) website, posts his
requirement in the standard format available there, and pays the prescribed
fees online (say an advance, with balance to be cleared as and when the
final settlements are done). This requirements document is in a
restricted-grammar format and can be translated into a vernacular even with 2005
technology. The vernacular requirements document is then provided to a
vernacular developer as per turn, who may accept or refuse the task; in the
latter case, it is handed over to the next developer in queue. The
vernacular developer then proceeds with the standard software engineering
steps of analysis, coding, testing, etc. He certainly codes in vernacular
programming languages and also does the analysis and testing in vernacular.
Where input from the end user is needed in the development process, the
developer provides the English source code, which as I have pointed out is
readily generated by the Hindawi compiler system (which by then shall
certainly have improved a lot). As for the English text messages
contained in the program, they are converted back into English by the
translation process at the Software Exchange. (*** Machine translation is
the piece of technology I am focussing my development efforts on now, but
for restricted-grammar this has been achieved ***.) The end product is
finally provided to the person in the USA, with complete source, variable
names, documentation etc. in English. Further work may be handled by either
a traditional (English) or vernacular developer. As an aside, I have only
skimmed through the description of this scenario. I already have worked on
aspects such as what changes to the currently practised software metrics
may be needed. Besides, a few new positions may need to be created, and this
person-centric scenario should be viewed as a team-centric one.


Summary-> There can be complete synergy between vernacular and traditional
(English) developer communities.






Topic 2: Making source code available online and project management site


Vivek wrote:
>firstly, Please make your source available online. Please use a project
>management site savannah.gnu.org, sourceforge.ner or
>developer.berlios.de, gforge to name a few.


It seems you have tried the demo version. The source code, certainly, is
available online and was also carried on CHIP Sept 2005 CD (which carried
Hindawi Release 1, current version is Release2). It is included along with
the binary distribution (complete package - 40MB and not in the demo or DOS
floppy). Separate distribution for source and binary is desirable, however,
many parts of Hindawi are written in Hindawi itself (bootstrapped) and this
makes it a more involved task as it would be necessary to provide
information regarding which file goes where and does what, and an
explanation for the code itself; but again the Hindi/Bangla sources are
converted to English equivalent, hence this is not a problem. I shall make
the sources available separately soon. (*** Support needed ***) You may
download the following file and install Hindawi from it for the complete
distribution. (Installation instructions are given later in this mail.)
http://www.indicybers.com/HindawiR2CD.zip


As for an online project management site (sourceforge etc.), this is a
very important point, and I am considering the viable options (***Support
needed***). The plans have been there since the inception of the project,
but there have been delays. Personal obligations have further contributed
to the delay, but I have tried to keep Hindawi as much on the
predetermined schedule as possible. Considering the fact that,
technologically, the project is complete, it is only the lack of community
involvement that is keeping it down, and yes a portal for Hindawi on
sourceforge, savannah.gnu etc. is urgently needed. (***Support needed***)






Topic 3: Community involvement


Vivek wrote
>secondly, have been able to bring in any kind of developers( hobby, part
>time) to be involved in this process?


This is my primary objective today. Hindawi has already reached a point
where I'm finding it difficult to manage the project alone. As new
languages and technologies are added, the project will require more and
more community involvement. The immediate requirement is for documentation
(***Support needed***). Lack of proper documentation is proving to be a bit
of a hindrance. For instance, many people have complained that Hindawi does
not start up on their computers. This is mainly because of poor
documentation in Release 1, which has been improved in Release 2. Further,
every system needs to evolve and that is the very essence of an open-source
system. This requires more developers to join in. At the recently concluded
Infocom 2005, Hindawi was sponsored by the National Council of Science
Museums under a technology scholarship scheme (the greatest 'de-facto'
financial help received by Hindawi/BangaBhasha till date). This has
provided Hindawi with a wonderful platform and many people have expressed
their intent to collaborate. However, proper collaboration would initially
require me to devote considerable time, and I have been busy with some
unavoidable engagements recently (including searching for a somewhat decent
job, which I'm yet to find) (***Support needed***). I am also looking at
the viability of writing a book (GNU FDL'd) explaining the internals of
Hindawi. But again financial constraints are the major hurdle. I would
really appreciate some suggestion regarding this matter (***Support
needed***).






Topic 4: Problems in startup


Vivek wrote:
>On the sidelines. I have not yet been successful in running it on
>windows 2000.


This stems from the fact that one normally expects Hindawi to be a GUI
based application, but one of the technological achievements of Hindawi has
been the ability to display Indic scripts in text-mode, without using any
commercial solution such as a GIST card. The feature of Hindawi which allows
it to display Indic scripts in text mode, also necessitates that it be
started in text mode. In Release1 one had to switch to text mode manually
(by pressing Alt-Enter in a DOS box, or booting up in DOS), 'before'
starting Hindawi. Even the TDIL people had problems regarding this, because
they were switching to text-mode 'after' starting Hindawi. This has been
done away with in Release 2. Now running Hindawi.bat automatically switches
to text-mode. (The exact details of running Hindawi are given later. Users
may still be required to press Alt-Enter if Hindawi starts up in a window
instead of full-screen, but the sequence is inconsequential.) I considered
the option of having a separate GUI-based interface for Hindawi, but
at the current stage of the project too many different interface
implementations would hinder the pace of technological development. A
GUI-based interface (like Dev-C++ for MinGW) is highly desirable, and I shall
devote some time to it soon, possibly after the national rounds of the
Young IT Professional Award.






Topic 5: Vernacular programming languages


Vivek wrote
> I would be interested to look at vernacularized
>programming languages .. it allows for a lot of development in many
>other fields.


Jeebesh wrote
> Your contribution to free software looks amazing.


Thank you for the appreciation. I hope you will be able to install and try
out Hindawi/BangaBhasha with the following instructions. I shall be posting
more detailed instructions on the Hindawi group page
http://groups.yahoo.com/group/hindawi


A) For Hindawi / BangaBhasha (suite of programming languages)
1) Download the necessary zip files from http://indicybers.com
   For Hindi: http://www.indicybers.com/HindawiR2CD.zip
   For Bangla: http://www.indicybers.com/BangaBhashaR1CD.zip
2) Unzip the files to a directory on your computer
3) The package has technical write-ups as PDF files and a voice narrated
presentation (PPT) file
4) Go to DOS prompt or DOS box under Windows and switch to the directory
where you unzipped the downloaded zip file
5) Run setup.exe
6) Follow on screen instructions (in English and vernaculars)
7) I would suggest you accept the default locations, unless you have some
constraints. The total installed foot-print is around 50 MB
8) Setup creates a batch file "hindawi.bat" with the necessary startup
script to set up the required environment variables.
9) After installation is completed, start Hindawi by running "hindawi.bat"
from the directory of installation. If Hindawi opens up in a window, which
it normally should not, press Alt-Enter to switch to full screen mode
10) This will take you to Aadesh - the Hindi command shell.
11) To type in Hindi/Bangla turn Scroll Lock on (this is indicated in the
lower right-hand corner of the screen)
12) Hindawi uses INSCRIPT keyboard layout. JPEG file is available online
http://www.indicybers.com/devanagari.jpg
http://www.indicybers.com/bangla.jpg
13) To start the IDE type "Lekhak" in English or vernacular
14) Once lekhak has started up, you are taken to the welcome page of the
online help system
15) To close the help-box press Esc
16) To activate a menu press Alt+Red_lettered_key or use a function key
shortcut
17) Press F3 to open the load file dialog
18) Press Tab to go to the file-list section
19) Navigate down with the arrow keys till "samples/" is highlighted and
press enter.
20) Similarly select the directory for the language for which you wish to
see a program (later you may try writing your own vernacular programs)
robot - logo
prathmik - BASIC
guru - C
shraeni - C++
shabda - lex
vyaaka - yacc
kritrim - Java
yantrik - x86 assembly
21) Load the vernacular named files by highlighting their names as above
and pressing Enter
22) Follow the following shortcuts to compile and run
F5 - Compile + execute
F6 - execute a previously compiled program (only in Hindi)
F7 - prepare a compiled program for deployment
F9 - compile only
NB: Please navigate down the source files by pressing PageDown as there is
a lot of copyright information at the beginning of each file
23) To exit lekhak press Alt-X or choose nikaas(exit) from khaata(file) menu
24) To exit Aadesh type exit or nikaas (in vernacular)


B) For Bangla / Hindi DOS
i) Download and unzip the Hindi/Bangla DOS zip file
http://www.indicybers.com/HINDIDOS.ZIP or
http://www.indicybers.com/BANG_DOS.ZIP
ii) Run mkdisk.bat from the DOS prompt (DOS box under Windows)
iii) Follow on-screen prompts (in English) to create a boot disk for
Hindi / Bangla DOS
iv) Boot up your computer with this.
v) The DOS disk has the necessary files for hard-disk installation, but you
must be familiar with fdisk etc. Hindi/Bangla DOS setup is not yet ready.






Topic 6: Starting a conversation


Jeebesh wrote:
> We at Sarai are deeply invested in free software and localisation
> issues and would really like to start a conversation with you.


Thank you for the invitation. Thanks to Vivek for breaking open the topics.






Topic 7: Support needed


Jeebesh wrote:
> What kind of support are you looking for and what kind of developments
> you will like to pursue.


Sir, I am looking for support in terms of financial grants, besides
developers who would like to join into the effort. A lot of support is
required in terms of awareness generation. I would appreciate any form of
financial support such as financial grant, fellowship, sponsorship,
advertisements, co-branding offers, CD distribution offers, book
publication offers etc. (Personally I am also in the urgent need of a
suitable job, and would appreciate any referrals coming my way.)


Regards,
Abhishek


=========== A Bangla specific mail but it applies to all other vernaculars
==================
Hello friends,


Let us think of our brothers and sisters who have not been privileged
enough to go to English medium schools. Even they have a right to benefit
from the ICT (Information and Communication Technology) revolution.
Software is certainly being written for them, but I would liken that to a
Mercedes car without a steering wheel. What I mean is that the Bangla
software available today is of wonderful quality, like a Mercedes in the
world of automobiles, but it only allows a user to perform a predetermined
function, hence no steering wheel. Say the user wants to do something of
his own desire, how does he do it? Is DTP the only access we want to
provide our vernacular literate brothers and sisters with?


Hae Banga bhandaarae taba bibhida ratan,
Ta sabe (abodh aami) abahela kori!!


The answer lies in providing them with a programming language in the mother
tongue. Yes, and that too, one in which even the highest of technical
programs may be written, besides of course the simple ones. BangaBhasha is
just that. It offers Bangla LOGO and Bangla BASIC for the children and
beginners, and Bangla C, Bangla C++, Bangla assembly, Bangla lex, Bangla
yacc, Bangla Java etc. for the advanced and professionally inclined. With
this they can do any kind of programming, including even robotics and
supercomputing. This software does not require any special hardware, and can
also run on an old Pentium I or PII.


Friends, help us get it to the people who need it. You need not pay us
anything for this, yes it is truly free. If you know a Bangla medium
school, which you certainly do, tell them about it. If you have an old
machine donate it to some school, college or social group where it can be
used to teach Bangla programming languages. You may even teach someone
Bangla programming yourself.


You may download Bangla DOS, Bangla C, Bangla C++, Bangla BASIC, Bangla
assembly, Bangla lex, Bangla yacc and many other systems tools in Bangla
from http://www.indicybers.com/ben_index.html or from the page
http://www.indicybers.com


If you are concerned about how good it really is, then may I humbly inform
you that it has won the Young IT Professional Award (E) 2005 from Computer
Society of India.


Please remember "Matribhasha rupi khoni, purna mani jaale"!


Regards,
Abhishek Choudhary


email : choudhary@indicybers.net
phone : +91-9831369549(Kolkata mobile)


================ Mail discussing important topics (2) ======================
Hi,


Here are some clarifications to issues pointed out by a member of a Hindawi
related discussion group regarding comment #8, with respect to the last
paragraph: (these are my replies to his queries and some which you could
possibly have.)


The final para of comment #8 was:
> Finally, as about Hindawi being *complete* - well, I say
> *complete* because I have *implemented* or *originally* (*not*
> forked) localised even *lex* and *yacc*. So we can now have
> *any* programming language written in Indian languages (Hindi,
> Bangla, Gujrati, Tamil, Kannada, etc.) This method can also be
> extended to *every* other human language, but my *personal*
> focus is on the India languages *for now*.


First, the reference to *lex* here is a reference to the standard family of
*lexical analyser generation tools* or more appropriately the standard
*pattern action language lex*, which automate the task of construction of
lexical analysers. This includes *GNU flex* and for all purposes the
compiler for the *lex language*, as mentioned here, is *flex* and not the
original *lex*, which (probably) is a proprietary program originally
written by Eric Schmidt and Mike Lesk, as per the wiki page
http://en.wikipedia.org/wiki/Lex
Please note that Aho, Sethi and Ullman (authors of the "Dragon" book -
Compilers: Principles, Techniques and Tools) refer to the tool or program
as "the Lex compiler", and to its input specification as "the Lex language"
(pg 105, sec 3.5, 13th Indian reprint). I choose to adhere to their
definition and refer to the *lex language* by the term *lex*.


Similarly, the reference to *yacc* is a reference to the standard family of
*parser generation tools*, which include *GNU bison*. Original *AT&T yacc*,
developed by Stephen C. Johnson, is *not* proprietary any more as an open
source version of the *original* AT&T yacc is now available with the
standard distributions of Plan 9 and OpenSolaris, as per the wiki page
http://en.wikipedia.org/wiki/Yacc However, for all purposes here, by yacc I
refer to the *program* gnu bison which accepts *the input specification*
for yacc. Link to the original yacc source code distributed under Common
Development and Distribution License
http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/sgs/yacc/


Secondly, since I am referring to *flex*, *yacc* and *bison* programs here,
I need to clarify another point, which possibly relates to *forking*. I
have *not* performed any source code modification on these programs. If
(and whenever) I modify the source for these programs I shall be very happy
to submit the diffs to the *original* authors for inclusion in the original
sources.


Then what have I done and why do I say that I have *implemented* or
*originally* (*not* forked) localised even *lex* and *yacc*?


I have *originally* localised *the language lex* and *the input
specification for yacc* to Indian languages (and similarly for at least one
language belonging to each of the definitive programming paradigms, but we
are only discussing lex and yacc here, as there may be some confusion
regarding their proprietary nature or concerning the issue of forking). Lex
and yacc languages are called "Shaili Shabda" (the language lex in Hindi,
Bangla, Gujrati etc.; Shabda means word, which is used here to
imply "token") and "Shaili Vyaaka" (the input specification for yacc in
Hindi, Bangla, Gujrati etc.; it is also called Shaili Vyaakaran in full;
Vyaakaran means grammar). These are *original* because, obviously, these
are *not* copied or forked from any other programming system, and *no*
other similar system exists, to the best of my knowledge.


What I mean by *implemented* is that I have implemented the languages
Shaili Shabda and Shaili Vyaaka, as described above, along with other
vernacular languages. The current implementation of the tools for these
languages is as a front-end compiler. This is similar to the way in which
C++ was first implemented as a front-end compiler to C, called Cfront.
These front-end compilers generate intermediate code which can be accepted
by various back-ends. For my distribution of Hindawi, I have chosen GCC as
back-end. Someone else may choose some other back-end, as GPL allows them
to do so. The benefits of having separate front-ends and back-ends for
compilers certainly do not need to be over-emphasised! (With issues such as
optimisation, a new optimising compiler would not be worth the effort when
we already have GCC as a state-of-the-art optimising back-end.)
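
The front-end/back-end split can be sketched as a small driver. This is only
an illustration under stated assumptions: the function names (emit_c,
backend_command, build) and the file name intermediate.c are hypothetical,
the front-end is passed in as any callable that turns vernacular source into
C, and the back-end is invoked with the ordinary "gcc -O2 -o" command line.
Hindawi's actual driver is not shown in this message.

```python
import subprocess
from pathlib import Path


def emit_c(vernacular_source, front_end, out_dir):
    """Run the front-end and write its C output as the intermediate file."""
    c_path = Path(out_dir) / "intermediate.c"
    c_path.write_text(front_end(vernacular_source), encoding="utf-8")
    return c_path


def backend_command(c_path, exe_path, compiler="gcc"):
    """Build the back-end command line; any C compiler name can be swapped in."""
    return [compiler, "-O2", "-o", str(exe_path), str(c_path)]


def build(vernacular_source, front_end, out_dir, compiler="gcc"):
    """Front-end to C, then hand the result to the chosen back-end."""
    c_path = emit_c(vernacular_source, front_end, out_dir)
    exe_path = Path(out_dir) / "program"
    # check=True raises CalledProcessError if the back-end reports an error.
    subprocess.run(backend_command(c_path, exe_path, compiler), check=True)
    return exe_path
```

Because the front-end only has to produce valid C, optimisation and code
generation stay entirely in the back-end, and replacing GCC means changing
one argument.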


Third, there is an issue that flex and bison could themselves be localised
to support Indian languages. Well, yes, but how much, without a major
rewrite? Let me try to explain this, though I am not sure that I can
recollect every practical problem I faced. We can certainly write a lexer
with flex which accepts 8-bit Indic code, but for even a moderate sized
programming language, this 8-bit lexer would be tremendously huge if
optimised for speed, and very slow if optimised for size. I tried this with
flex initially, but when it kept crashing for a moderate set of tokens for
Indic programming languages, I decided to adopt a new approach. I could
have continued with modifying flex, but the effort required would be orders
of magnitude greater, requiring changes to internal structures and much
more. Along with this, consider the fact that Shabda and Vyaak are not
intended as replacements for lex and yacc. These are intended to support
Indic programming languages. Hence, even the action statements and other
stuff such as buffer funtions, error handlers, startup-code etc. are to be
written in Indic programming languages (in this case Shaili Guru, which is
the C programming language localised to Indian languages), hence a new
language was inevitable. Similar reasons can be stated for yacc as well.
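
The size-versus-speed trade-off can be made concrete with a rough
calculation. The sketch below (Python) computes the size of an
uncompressed DFA transition table, which holds one entry per (state, input
symbol) pair. The state count and the 2-byte entry size are assumptions
chosen for illustration, not measurements of any real scanner.

```python
def full_table_bytes(states: int, alphabet: int, entry_bytes: int = 2) -> int:
    """Size of an uncompressed transition table: one entry per (state, symbol)."""
    return states * alphabet * entry_bytes


if __name__ == "__main__":
    states = 2000  # assumed DFA size for a moderate token set (not measured)
    for alphabet in (128, 256):  # 7-bit versus 8-bit input alphabet
        print(alphabet, full_table_bytes(states, alphabet))
    # prints:
    # 128 512000
    # 256 1024000
```

Going from a 7-bit to an 8-bit alphabet doubles the uncompressed table for
the same number of states. In flex, full tables are requested with `-Cf`,
while the default `-Cem` setting compresses the tables at some cost in
scanning speed.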


Regards,
Abhishek Choudhary


================ Mail discussing important topics (3) ======================
Hello Sylvain,


Nice to hear from you. I am open to all questions related to the Hindawi
project. Please feel free to question, advise or criticise, as that gives
me an idea about how others perceive my work. I want this project to be
based on as solid a foundation as possible from the very beginning. I am
working towards bridging a digital divide (deeper than the Grand Canyon),
carved by the rivers where unknown languages flow, and I want this bridge
to be as stable and as secure as possible, so that it does not break
midway. Besides, if it does not seem secure enough, who would risk
travelling by it? What I mean in simple words is that as and when there are
vernacular developers (who write programs in programming languages in their
mother tongue), they should not find themselves disadvantaged for the lack
of a solid design of the system they work on, or up against a glass ceiling
because there is no upward technological support. I intend to provide them
a tool to carve out a living in this age of money and ICT synonymy.


Now about your queries.


You wrote:
> It would be enlightening for us if you could explain the
> difference between your approach and Unicode support (your
> mention of 8-bit charset is a bit surprising).


First, one needs to understand that Indic scripts include scripts used
throughout the Indian subcontinent and not just in India alone. Prior to
the advent of UNICODE the only available encoding "standard" for these
scripts was the Indian Standard Code for Information Interchange (ISCII),
which replicated 7-bit ASCII and coded the Indic scripts in the extended
(8-bit) half. I have put the word standard in quotes owing to the fact that
numerous proprietary coding methods were developed, but ISCII is the only
identifiable standard which came into common use. The reference to 8-bit
*support* (in my last message: "We can certainly write a lexer with flex
which accepts 8-bit Indic code") implies being able to use a lexical
analysis tool which can analyse (tokenise) ISCII-based source code, where
even keywords and variable names are in ISCII, and not just string literals
or comments (as in a system developed by IIT-Madras where you can put only
ISCII comments in a C program but the keywords and variable names are in
Latin, or Roman, script).
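
What "8-bit support" means for a lexer can be sketched as follows (Python).
The scanner works on raw bytes and accepts any byte in the extended half
(0x80 and above) as an identifier character alongside the usual ASCII
letters, digits and underscore. This is an assumption-laden illustration:
it treats the whole upper half as letters, ignores numeric literals,
strings and comments, and the high bytes in the example are arbitrary
placeholders rather than a meaningful ISCII word.

```python
def is_ident_byte(b: int, first: bool) -> bool:
    """True if byte b may appear in an identifier (digits not in first position)."""
    if b >= 0x80:  # extended 8-bit half: assumed here to carry Indic letters
        return True
    c = chr(b)
    return c == "_" or c.isalpha() or (c.isdigit() and not first)


def identifiers(source: bytes) -> list:
    """Return every identifier-like byte run found in the source, in order."""
    out, i, n = [], 0, len(source)
    while i < n:
        if is_ident_byte(source[i], first=True):
            j = i + 1
            while j < n and is_ident_byte(source[j], first=False):
                j += 1
            out.append(source[i:j])
            i = j
        else:
            i += 1
    return out


if __name__ == "__main__":
    # Arbitrary high bytes stand in for an Indic variable name.
    print(identifiers(b"\xb3\xda\xcc = \xb3\xda\xcc + count1;"))
```

A scanner limited to 7-bit input would instead have to reject or skip the
high bytes, which is why such a tool can carry vernacular text only in
comments or string literals.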


These vernacular programming languages that the Hindawi project has
developed are a *new entity* altogether, if I might say so. They are not
merely libraries or functions providing I/O support for Indic scripts. I
call them Hindi C or Hindi assembly or Bangla C or Bangla assembly etc.
only for the sake of common understanding. They even have different names;
such as, the equivalent programming language of C in Indian languages is
Shaili Guru, vernacular C++ is Shaili Shraeni, vernacular yacc is Shaili
Vyaakaran and so on. However, these languages are syntax-compatible with
their traditional English counterparts and can, therefore, utilise the
existing libraries such as glibc etc., in a manner similar to the way C++
is syntax-compatible with C and hence can use most of the C libraries.


Nevertheless, the significance of an 8-bit coding standard also needs to be
understood from a very practical viewpoint, keeping in mind that the
Hindawi project aims at *localising* programming languages and not just at
providing Indic support for I/O alone. If the concern was to provide a
method by which Indic scripts could be read (input) or written (output),
then UNICODE support, possibly using the wchar_t data type and related
functions, could suffice. The goal or target is to create *compilers* for
*programming languages* in Indian vernaculars. The data type wchar_t and
related facilities use a 32-bit or 16-bit implementation (platform
dependent). In my last message I had already pointed out the problems in
compiler design when the keywords and variables, or in general the tokens,
of a programming language are based on an 8-bit encoding. Obviously, using
a 32-bit or 16-bit encoding is not feasible. UNICODE support is fine as
long as we wish to do Indic I/O in the regular English-derived compilers,
where only string literals are in UNICODE, but UNICODE-based programming
languages would, as of now, be inefficient. This is because, besides the
lexer overload and possible breakdown, they would also suffer from the
fact that the major existing libraries have their symbols (language
tokens, keywords, variables) encoded in 7-bit ASCII and not UNICODE; not
even 8-bit ASCII, since all letters and numerals are encoded between codes
0 and 127. I hope you appreciate this fact.
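
The width argument can be checked with a small sketch (Python) that
measures how many bytes one identifier occupies in each Unicode encoding
form. The sample identifier is an arbitrary three-character Devanagari
string chosen for illustration. Python ships no ISCII codec, so the 8-bit
figure is not computed here; under a one-byte-per-character code the same
identifier would be assumed to take 3 bytes.

```python
def encoded_sizes(identifier: str) -> dict:
    """Bytes needed for the identifier in each Unicode encoding form."""
    return {codec: len(identifier.encode(codec))
            for codec in ("utf-8", "utf-16-le", "utf-32-le")}


if __name__ == "__main__":
    sample = "\u0928\u093e\u092e"   # three Devanagari code points (arbitrary sample)
    print(encoded_sizes(sample))    # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
    print(encoded_sizes("printf"))  # {'utf-8': 6, 'utf-16-le': 12, 'utf-32-le': 24}
```

The second line of output shows the other half of the point: an existing
library symbol such as printf is pure 7-bit ASCII, so any wider
representation has to be narrowed again before it can be matched against
the library's symbol names.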


Another point at hand is that the Hindawi project aims at having Indic
support at all levels of a computer system, right from the BIOS/POST to
programming languages. The internal representation of text needs to be
8-bit, unless we have UNICODE hardware support. The Hindawi project has
made it possible to even have the BIOS/POST in Indic scripts. Along with
this, the assembly language used for coding the BIOS can also be a
vernacular equivalent (Hindawi Shaili Yantrik - the Hindi equivalent of
traditional assembly language; currently supporting IA-32 (x86) processor
assembly).








You wrote:
> we would like to better see the goal of your project and
> understand where support for other languages is added


Now if you are wondering whether such vernacular support is required then
consider this scenario, which elucidates the goal, though only to an extent:


Presume that you are educated in your mother tongue, which is not English,
and you know nothing about English other than yes, no and a few greetings.
Also presume that you live in a developing nation, and have meagre family
resources. There are families, with graduate members remaining unemployed,
which barely manage a *single* meal a day, if at all. Presume that you are
such a vernacular-literate (with schooling and/or graduation in a
vernacular medium), and yet unemployed. Someone suggests, "Why don't you
join IT? There is a lot of money in this field." Nice advice, but what next?
Well, you go ahead and join a course in programming, where the programming
language is based on English, so first you *struggle* to pick up the
rudiments of English and then try to acquire programming skills. Presuming
you get through this phase and want to delve into deeper topics, again
English is a hindrance as all advanced technology requires English. On the
other hand, if you join IT without learning English, you are restricted to
only some document creation software or applications. You cannot decide
your own destiny and are dependent upon people who know English and will
produce programs for you with a vernacular interface. The vernacular
applications being produced by such people are very good indeed, and I
often compare them to a BMW or Mercedes; but without a steering wheel! They
will only take you along a fixed road. How do you then choose your own
destination and your own destiny? Are we looking at another colonial
future? Remember we are talking FreeSoftware - free in its very essence.


The solution lies in developing vernacular programming languages, where the
lack of knowledge of English will not prove to be a hurdle. The day you
start learning programming, you will start off with a topic in data
structures, or algorithms etc., instead of a dictionary! Hindawi is such a
solution, such an empowering tool.


I have already answered, in the previous message, possible questions about
why Hindawi is a separate project and not a fork of some existing one.
Please understand that these are separate programming languages and not a
separate I/O library for an existing language. This has been explained
earlier. Hindawi compilers are "currently" implemented as *front-end*
compilers to existing compilers for traditional English-based programming
languages. This implies that a program written in Hindawi is "compiled"
into a program in a traditional programming language, and then compiled
using an existing compiler for that language. This is the standard first
step in "bootstrapping" the compiler for almost any programming language.
It also has the benefit that Hindawi programs become internationally
marketable. This is explained by the following topic discussed in another
email conversation.


-----------------------Pasted from another email conversation---------
Topic 1 : Bridging the gap between vernacular and English developers


Vivek wrote:
>I have followed your project for sometime now. Your project stands
>between two things. The already ready english based programming
>community and the possibly new and developing vernacular developers.
>This gap might again be challenge to bridge.


Point-> Hindawi (Hindi/Bangla C, C++ etc.) does 'not' create any further
divide or gap between vernacular developers and traditional (English)
developers. Hindi / Bangla C, C++, Java, assembly etc. get 'readily'
translated into their English equivalent. There is also provision for
translation of variable names, which certainly is not a trivial task and
formed the basis of my interaction with Swami Sarvottamanand, Dean of
Research and HOD Comp. Sci. at Belur Math college, who was one of the
distinguished judges during the YIPTA event. The reverse, that is,
translation of English programming languages and variable names to
vernacular programming languages and variable names also takes
place 'readily'; (here readily implies without any extra programming
effort). So there isn't any new gap that develops between traditional and
vernacular developers; only the old ones are bridged. I am working on a
system for complete machine translation of documentation as well (a limited
scale version should be ready in a month or two.)


Explanation-> This has been a very basic consideration during the course
of development of Hindawi. This also relates to another concern - that of
the utility of vernacular programming languages. Vernacular BASIC and LOGO
are fine as paedagogical aids (teaching tools), supposed to be used in the
classroom setting or by hobby programmers, (though I must point out that
Hindi/Bangla BASIC allows EXE files to be created and C code to be
embedded). However, the moment one talks of vernacular C, C++, lex, yacc
or say vernacular Java, one has to consider the fact that learning these
languages shall involve a substantial effort, though it is comparatively
simpler for a vernacular-literate person than learning their English
manifestations. This effort shall prove meaningless if the skills acquired
cannot be used professionally, that is, if they are not marketable enough.
And "enough" certainly would include international markets. As mentioned
earlier, the ready conversion of programs written in vernacular languages
to their English counterparts, and vice-versa, solves this problem. For
instance, let us consider a possible scenario set in the not-so-distant
future.


We are in 2012 AD (six years from now), the first milestone year
in achieving the target of making India an ICT superpower. (By then I'm
sure ICT shall stand for Indian Cottage Technology, for indeed, that is
what I envision - to make Information and Communication Technology a
literal 'cottage' industry in India which shall cater solutions globally,
and provide at least some major relief to the unemployment situation. That
would be akin to the electronics industry in China today.) A person, say,
in the USA needs a piece of software for his new startup. He logs onto
the Indian "Software Exchange" (***a new social concept***) website,
posts his requirement in the standard format available there, and pays the
prescribed fees online (say an advance, with balance to be cleared as and
when the final settlements are done). This requirements' document is in a
restricted-grammar format and can be translated into a vernacular even with
2005 technology. The vernacular requirements' document is then provided to a
vernacular developer as per turn, who may accept or refuse the task; in
the latter case, it is handed over to the next developer in queue. The
vernacular developer then proceeds with the standard software engineering
steps of analysis, coding, testing, etc. He certainly codes in vernacular
programming languages and also does the analysis and testing in vernacular.
Where input from the end user is needed in the development process, the
developer provides the English source code, which as I have pointed out
is readily generated by the Hindawi compiler system (which by then shall
certainly have improved a lot). As for the English text messages
contained in the program, they are converted back into English by the
translation process at the Software Exchange. (*** Machine translation
is the piece of technology I am focussing my development efforts on now, but
for restricted grammars this has already been achieved ***.) The end
product is finally provided to the person in the USA, with complete source,
variable names, documentation etc. in English. Further work may be handled
by either a traditional (English) or vernacular developer. As an aside, I
have only skimmed through the description of this scenario. I have already
worked on aspects such as what changes to the currently practised software
metrics may be needed. Besides, a few new positions may need to be created,
and this person-centric scenario should be viewed as a team-centric one.


Summary-> There can be complete synergy between vernacular and traditional
(English) developer communities.
--------------------End of pasted block---------------------------------








You wrote:
> understand where support for other languages is added:
> is it in keyboard input methods, in programs input functions, in program
> identifiers, in programs output functions. If I understand well, your C
> programs won't be compilable with a classical gcc package, which seems a
> bit surprising as well


These are *new* programming languages, so how could they be compiled with
a classical GCC package? And if they could be, then what was the need for
project Hindawi? OK, now understand this - Hindawi is a *suite* of
*programming languages* (new ones!), i.e. a compiler collection just as
gcc is, for programming languages which use Indian languages as a basis,
instead of being based on English. Currently I'm using gcc (and yes,
*classical* gcc) as the back-end compiler, but any other back-end could be
used; that is to say, Hindawi and all its sub-projects (Shaili Guru etc.)
could use any other compiler system instead of gcc as a back-end. Hindawi
is based on the *completely new* and independently developed technologies
of Romenagri and APCISR, which I could have patented, but decided to make
FreeSoftware. But again, what do you mean by "won't be compilable with a
classical gcc package"? Can a Fortran program be compiled with a C
compiler, or a C program with a Fortran compiler? Hindawi is quite like
P2C, the Pascal to C converter, for now. Remember I told you about how C++
was initially made available as the CFront compiler, which you must be
aware of. Keyboard input, console output, and everything else is
supported, but more than that, the ability to compile source code with
keywords and variable names in Indic scripts while maintaining
compatibility with existing libraries is its uniqueness. Hope it's clear
now.


The capabilities you mention are provided by related projects Romenagri and
APCISR.




You wrote:
> explaining what's the goal/principle of "porting Linux kernel sources to
> Hindi/Bangla C, asm etc." would be interesting too


Again I have to draw your attention to the glass ceiling factor. What about
a developer who has mastered Shaili Guru (Hindi C) and wishes to study the
internals of an OS like GNU/Linux? Porting Linux kernel sources helps with
this. Besides, just as Hindawi compilers can convert programs written in
vernacular programming languages to traditional (English) programming
languages, Hindawi also contains a set of reverse compilers which do the
opposite, i.e. convert programs written in traditional (English)
programming languages to vernacular programming languages. These reverse
compilers can be used by a person to convert the Linux kernel sources to
the desired vernacular language. That is the principle, and making Linux
kernel sources accessible to vernacular programming language based
developers is the goal.
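
The forward and reverse directions can be sketched with a single keyword
table and its inverse (Python). This is a deliberately simplified
illustration, not how the Hindawi reverse compilers work: the romanised
keywords are hypothetical placeholders, the substitution is word-based and
does not protect string literals or comments, and the round trip only
holds while the table is one-to-one and no identifier collides with a
keyword on either side.

```python
import re

# Hypothetical romanised keywords (illustrative only) and their C equivalents.
FORWARD = {"purnank": "int", "yadi": "if", "wapas": "return"}
# The reverse table is the inverse mapping; this requires FORWARD to be one-to-one.
REVERSE = {c_kw: v_kw for v_kw, c_kw in FORWARD.items()}

WORD = re.compile(r"[A-Za-z_]\w*")


def translate(source: str, table: dict) -> str:
    """Replace every whole word found in the table; leave other text unchanged."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), source)


if __name__ == "__main__":
    vernacular = "purnank f() { yadi (x) wapas 1; }"
    english = translate(vernacular, FORWARD)
    print(english)                      # int f() { if (x) return 1; }
    print(translate(english, REVERSE))  # purnank f() { yadi (x) wapas 1; }
```

Translating variable names as well would need a second, per-program table
alongside the fixed keyword table.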


Suitable filters and shell scripts shall be submitted to the Linux kernel
maintainers when developed, for inclusion with Linux kernel sources.
However, to prevent further confusion let me emphasize that we are not
aiming at modifying the Linux kernel sources in any manner under the
project Hindawi. I must admit here that the target is to make Hindawi
capable of handling a program as complex as the Linux kernel; the intention
is not to have a parallel kernel written in vernacular languages. Besides,
that becomes trivial as explained already, since the necessary translators
will exist. (I do not rule out such distributions by people interested in
paedagogical purposes, though. Linux kernel sources do not have comments in
Indian vernaculars, and adding them shall not be done under project
Hindawi.)






You wrote:
> We also need to check that your project can be run on a free operating
> system.
> Your project appears to rely on FreeDOS, but I also see screenshots taken
> using MS Windows in your website


Certainly Hindawi runs on *vanilla* FreeDOS, and runs under Windows only
in DOS mode or in a text-mode DOS box. Hindawi is intended to use Linux
and FreeDOS as its primary basis. Why don't you try out the floppy
distribution of Swaadheen DOS, which is a bootable floppy (FreeDOS based)
and has the Hindawi shell (aadesh) as a replacement for command.com,
resulting in a vernacular DOS? If you wish to try out the vernacular
languages, follow the instructions below.


A) For Hindawi / BangaBhasha (suite of programming languages) under FreeDOS
1) Download the necessary zip files from http://indicybers.com
For Hindi
http://www.indicybers.com/HindawiR2CD.zip
For Bangla
http://www.indicybers.com/BangaBhashaR1CD.zip
2) Unzip the files to a directory on your computer
3) The package has technical write-ups as PDF files and a voice-narrated
presentation (PPT) file
4) At the DOS prompt switch to the directory
where you unzipped the downloaded zip file
5) Run setup.exe
6) Follow on screen instructions (in English and vernaculars)
7) I would suggest you accept the default locations, unless you have some
constraints. The total installed foot-print is around 50 MB
8) Setup creates a batch file "hindawi.bat" with the necessary startup
script to set up the required environment variables.
9) After installation is completed, start Hindawi by running "hindawi.bat"
from the directory of installation.
10) This will take you to Aadesh - the Hindi command shell.
11) To type in Hindi / Bangla turn Scroll Lock on (this is indicated in the
lower right-hand corner of the screen)
12) Hindawi uses the INSCRIPT keyboard layout. JPEG images are available
online:
http://www.indicybers.com/devanagari.jpg
http://www.indicybers.com/bangla.jpg
13) To start the IDE type "Lekhak" in English or vernacular
14) Once lekhak has started up, you are taken to the welcome page of the
online help system
15) To close the help-box press Esc
16) To activate a menu press Alt+Red_lettered_key or use a function key
shortcut
17) Press F3 to open the load file dialog
18) Press Tab to go to the file-list section
19) Navigate down with the arrow keys till "samples/" is highlighted and
press Enter.
20) Similarly select the directory for the language for which you wish to
see a program (later you may try writing your own vernacular programs)
robot - LOGO
prathmik - BASIC
guru - C
shraeni - C++
shabda - lex
vyaaka - yacc
kritrim - Java
yantrik - x86 assembly
21) Load the vernacular named files by highlighting their names as above
and pressing Enter
22) Use the following shortcuts to compile and run
F5 - compile + execute
F6 - execute a previously compiled program (only in Hindi)
F7 - prepare a compiled program for deployment
F9 - compile only
NB: Please navigate down the source files by pressing PageDown, as there is
a lot of copyright information at the beginning of each file
23) To exit lekhak press Alt-X or choose nikaas (exit) from the khaata
(file) menu
24) To exit Aadesh type exit or nikaas (in vernacular)






B) For Bangla / Hindi DOS
i) Download and unzip the Hindi/Bangla DOS zip file
http://www.indicybers.com/HINDIDOS.ZIP
or
http://www.indicybers.com/BANG_DOS.ZIP
ii) Run mkdisk.bat from the DOS prompt
iii) Follow on-screen prompts (in English) to create a boot disk for
Hindi / Bangla DOS
iv) Boot up your computer with this.
v) The DOS disk has the necessary files for hard-disk installation, but you
must be familiar with fdisk etc. Automated Hindi/Bangla DOS setup is not
yet ready.






You wrote:
> Can all parts of your project be run using a stock version of FreeDOS,
> or may it require proprietary components?




Yes, Hindawi runs on *stock* versions of FreeDOS and does not require any
proprietary components. This is perhaps the *only* Indic software which
does so, let alone any Indic programming language. (There are *no* other
*successful* implementations of Indic programming languages besides
Hindawi, as yet!)




Finally, I would like to bring to your notice that Hindawi has also been
selected for LinuxAsia 2006.




Regards,
Abhishek Choudhary

