|Compiler tools for languages involving full ISO 10646-1 character set? email@example.com (Peter Wilson) (1998-02-14)|
|Re: Compiler tools for languages involving full ISO 10646-1 character s firstname.lastname@example.org (Joachim Durchholz) (1998-02-18)|
|From:||Peter Wilson <email@example.com>|
|Date:||14 Feb 1998 14:36:03 -0500|
|Organization:||The Boeing Company|
|Keywords:||i18n, lex, yacc|
In the past I have written interpreters for the EXPRESS language
family (ISO 10303-11:1994 and ISO/TR 10303-12:1997). I used flex and a
recursive descent parser for these, plus some code implementing
the regular expression elements of the EXPRESS language. The character
set for EXPRESS was limited to ASCII.
A new edition of EXPRESS is in preparation and I might have to
write a new interpreter. This edition allows the full ISO 10646-1
character set to be used in string literals, and also as symbols
(e.g. instead of a square root function, a square root operator whose
glyph is the square root sign, ISO 10646-1 code 0000221A). Identifiers
may also end up drawing on the full 10646-1 character set rather than
just ASCII letters and digits.
Are there any compiler building tools that could be used for this? How
about public domain code for regular expression matching against
the 10646 character set? Any pointers, hints, or literature references
would be much appreciated.
[This topic comes up from time to time. Most of the regex code would
be hard to apply directly because it uses the character code as an
index into a table. Several people have pointed out that the input
characters invariably fall into a moderate number of equivalence
classes, so an ad-hoc lookup to turn each character into a class
number followed by normal regexp on the class numbers should be
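
The equivalence-class idea in the note above can be sketched roughly as
follows. This is only an illustration, not code from any existing tool:
the class names, the toy token set (identifier, number, square root
operator), and the functions char_class and tokenize are all made up
for the example, and Python's unicodedata categories stand in for
whatever classification the new EXPRESS edition would actually specify.
The point is that the DFA table has one column per class, so its size
does not depend on the size of the character set.

```python
import unicodedata

# Hypothetical equivalence classes for a toy token grammar.
LETTER, DIGIT, SQRT, SPACE, OTHER = range(5)

def char_class(ch):
    """Ad-hoc lookup: map one character to a small class number."""
    if ch == "\u221a":                  # SQUARE ROOT sign, code 221A
        return SQRT
    if ch.isspace():
        return SPACE
    cat = unicodedata.category(ch)
    if cat.startswith("L"):             # any letter, not just ASCII
        return LETTER
    if cat == "Nd":                     # any decimal digit
        return DIGIT
    return OTHER

# DFA states; the table is indexed as TRANSITIONS[state][class].
START, IN_IDENT, IN_NUMBER, IN_SQRT, DEAD = range(5)
TRANSITIONS = [
    # LETTER   DIGIT      SQRT     SPACE OTHER
    [IN_IDENT, IN_NUMBER, IN_SQRT, DEAD, DEAD],  # START
    [IN_IDENT, IN_IDENT,  DEAD,    DEAD, DEAD],  # IN_IDENT
    [DEAD,     IN_NUMBER, DEAD,    DEAD, DEAD],  # IN_NUMBER
    [DEAD,     DEAD,      DEAD,    DEAD, DEAD],  # IN_SQRT
    [DEAD,     DEAD,      DEAD,    DEAD, DEAD],  # DEAD
]
ACCEPTING = {IN_IDENT: "IDENT", IN_NUMBER: "NUMBER", IN_SQRT: "SQRT_OP"}

def tokenize(text):
    """Longest-match scanner that only ever looks at class numbers."""
    tokens = []
    i = 0
    while i < len(text):
        if char_class(text[i]) == SPACE:
            i += 1
            continue
        state, j, last_accept = START, i, None
        while j < len(text):
            state = TRANSITIONS[state][char_class(text[j])]
            if state == DEAD:
                break
            j += 1
            if state in ACCEPTING:
                last_accept = (ACCEPTING[state], j)
        if last_accept is None:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
        kind, end = last_accept
        tokens.append((kind, text[i:end]))
        i = end
    return tokens
```

For instance, tokenize("\u221a\u03b1\u0662 42") splits the input into a
square root operator, an identifier made of a Greek letter followed by
an Arabic-Indic digit, and an ASCII number. A production scanner would
presumably replace char_class with a precomputed range table, but the
DFA itself would look the same.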