Re: Syntax Highlighting and Lexical Analysis

Marat Boshernitsan <maratb@CS.Berkeley.EDU>
20 Sep 1999 11:58:16 -0400

          From comp.compilers

From: Marat Boshernitsan <maratb@CS.Berkeley.EDU>
Newsgroups: comp.compilers
Date: 20 Sep 1999 11:58:16 -0400
Organization: University of California at Berkeley
References: 99-09-041
Keywords: tools, lex

"Dominic Tootell" <Dominic@tootedom.freeserve.co.uk> writes:


> I'm trying to build my own editor.
>
> I have never before done any lexical analysis type work, and I was wondering
> if anyone could point me in the correct direction. I know that you read the
> file in and produce a parse tree built on tokens (using a variation of the
> red-black tree). The problem is how do I go about parsing the file,
> especially when commands can span lines, e.g. curly brackets and the like.
>
> If anyone can help me, or provide me with any information I will be most
> grateful. The kind of syntax highlighting I am looking for is the type that
> is done in emacs for C code or Java. I know that emacs uses an internal
> Lisp engine to read the code depending on a .el configuration file, but the
> thought of having to program a Lisp engine is scary, and I have never
> before had any interaction with Lisp.


The "really right" (and really general) way to do this is to use an
incremental lexer and relex at each keystroke. This lets you maintain
precise lexical information at all times and handle all possible cases
without having to craft complicated regexes[1].


One way to build an incremental lexer, by simply driving a (possibly
flex-generated) batch lexer, is described in one of the chapters of Tim
Wagner's thesis:


    Tim A. Wagner. Practical Algorithms for Incremental Software Development
    Environments. Ph.D. Dissertation, Report No. UCB//CSD-97-946
    http://sunsite.berkeley.edu:80/Dienst/UI/2.0/Describe/ncstrl.ucb/CSD-97-946


(The thesis also describes how to build incremental LALR(1) and GLR
parsers.)


This will handle any language you can ever describe with something
like flex (and it makes it easy to support many languages in one
editor); however, if your language's lexical structure is simple, then
this is probably overkill and regexes are the way to go.
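
To give a rough feel for "driving a batch lexer incrementally", here is a
minimal Python sketch. It is not the algorithm from the thesis: the token
rules, the names (Token, lex_from, relex) and the resynchronization test
are all assumptions made for illustration. The idea is to restart the
batch lexer at the first token that touches the edit and stop as soon as
it produces a token that the old token list already contained (shifted by
the size of the edit).

```python
import re
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


# Hypothetical rules for a tiny C-like language (an assumption for this sketch).
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/)     # complete block comment, may span lines
  | (?P<open_comment>/\*.*)    # unterminated comment runs to end of buffer
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<space>\s+)
  | (?P<punct>.)               # any other single character
""", re.VERBOSE | re.DOTALL)


def lex_from(text: str, pos: int) -> Iterator[Token]:
    """The 'batch' lexer: yield tokens from offset pos to the end of text."""
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)  # always matches: 'punct' accepts anything
        yield Token(m.lastgroup, m.start(), m.end())
        pos = m.end()


def lex_all(text: str) -> List[Token]:
    return list(lex_from(text, 0))


def relex(old_tokens: List[Token], new_text: str,
          edit_pos: int, removed: int, inserted: int) -> List[Token]:
    """Update old_tokens after `removed` characters at edit_pos were replaced
    by `inserted` characters, giving new_text."""
    delta = inserted - removed
    # Tokens ending strictly before the edit are kept. This relies on the
    # rules above looking at most one character past the end of a token.
    first = 0
    while first < len(old_tokens) and old_tokens[first].end < edit_pos:
        first += 1
    start = old_tokens[first].start if first < len(old_tokens) else 0

    # Old tokens lying entirely after the edited region, in new coordinates.
    old_end = edit_pos + removed
    shifted = {}
    for i in range(first, len(old_tokens)):
        t = old_tokens[i]
        if t.start >= old_end:
            shifted[(t.kind, t.start + delta, t.end + delta)] = i

    fresh: List[Token] = []
    for tok in lex_from(new_text, start):
        j = shifted.get(tuple(tok))
        if j is not None:
            # Resynchronized: the rest of the old list is still valid,
            # it only needs its offsets shifted.
            tail = [Token(t.kind, t.start + delta, t.end + delta)
                    for t in old_tokens[j:]]
            return old_tokens[:first] + fresh + tail
        fresh.append(tok)
    return old_tokens[:first] + fresh  # relexed all the way to the end
```

For example, after inserting "yz" at offset 5 of "int x; /* c */ int y;",
relex(lex_all(old), new, 5, 0, 2) rescans only "xyz" and the following ";"
before it resynchronizes. Deleting the "*/" of a comment, on the other
hand, forces a relex to the end of the buffer, which is exactly the case a
highlighter has to get right.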


HTH,


Marat.


Footnotes:
[1] You can imagine that correctly highlighting something
like this would be rather difficult with regexes:


int
foo(
        int x,
        /* a funky comment
        int y */
        );
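
To make the difficulty concrete, here is a small Python sketch (the
function names are made up for illustration) that runs a line-at-a-time
regex over the fragment above and compares it with a scanner that carries
an "inside a comment" flag from one line to the next. It is deliberately
simplified: it ignores string literals and // comments.

```python
import re

SAMPLE = """int
foo(
        int x,
        /* a funky comment
        int y */
        );
"""

# Hypothetical per-line rule: it only sees comments that open and close
# on the same line.
LINE_COMMENT_RE = re.compile(r"/\*.*?\*/")


def comment_lines_per_line_regex(text):
    """1-based numbers of the lines a per-line regex marks as comment."""
    return [n for n, line in enumerate(text.splitlines(), 1)
            if LINE_COMMENT_RE.search(line)]


def comment_lines_with_state(text):
    """Same question, but the comment state survives across line breaks."""
    marked, in_comment = [], False
    for n, line in enumerate(text.splitlines(), 1):
        pos, has_comment = 0, in_comment
        while pos < len(line):
            if in_comment:
                close = line.find("*/", pos)
                if close < 0:
                    break                      # comment continues on next line
                in_comment, pos = False, close + 2
            else:
                open_ = line.find("/*", pos)
                if open_ < 0:
                    break                      # no more comments on this line
                in_comment, has_comment, pos = True, True, open_ + 2
        if has_comment:
            marked.append(n)
    return marked
```

On SAMPLE the per-line regex marks no lines at all, so "int y" would be
colored as a parameter declaration, while the stateful scanner marks lines
4 and 5. Carrying that state forward (and updating it cheaply after an
edit) is the part an incremental lexer takes care of.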

