Re: Need help lexing string literal

kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
4 Aug 1996 00:32:30 -0400

          From comp.compilers


From: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
Newsgroups: comp.compilers
Date: 4 Aug 1996 00:32:30 -0400
Organization: GABI Software, Sarl.
References: 96-07-213 96-08-009
Keywords: lex

Ruud Mulder RAF448 85606 <mulder@dce.philips.nl> writes:


|> The original LEX manual by M. E. Lesk and E. Schmidt gives the following advice for
|> using plain LEX:


|> > Example: Consider a language which defines a string as a set of characters between
|> > quotation (") marks, and provides that to include a " in a string it must be
|> > preceded by a \. The regular expression which matches that is somewhat confusing,
|> > so that it might be preferable to write
|> >
|> > \"[^"]*   {
|> >           if (yytext[yyleng-1] == '\\')
|> >               yymore();
|> >           else
|> >               ... normal user processing
|> >           }


|> I hope this helps.


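Have you tried it on the (literal) string "\\"? The rule first matches
the three characters "\\, which end in a backslash, so it calls yymore()
even though that backslash is itself escaped; the closing quote is then
taken as part of the string.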


The MKS lex offered a few special subroutines to handle cases like these
(typical, but not easily expressed as a regular expression). Failing
that, it shouldn't be too difficult to write your own. Most of the
lex-based scanners I've seen have simply recognized the initial quote
character and scanned the rest of the string with hand-coded C.


The following works (even if it looks like line noise):


\"(\\.|[^\"\\])*\"


but it leaves the escaping '\' in the buffer. More generally, you can
do the job with start states:


%S INCOMMENT LNCOMMENT SQUOTE DQUOTE
%%
<INITIAL>\" { BEGIN DQUOTE ; enterDQuote() ; }
<DQUOTE>\" { BEGIN INITIAL ; processDQuote() ; }
<SQUOTE,DQUOTE>\\. { addToQuote( yytext[ 1 ] ) ; }
<SQUOTE,DQUOTE>. { addToQuote( yytext[ 0 ] ) ; }


You can add special cases ("\n", etc.) as you like. As you might guess
from the extract, this technique also handles comments. Just don't
expect the results to be anywhere near as fast as if you coded it by
hand. (On the other hand, I've generally found such scanners fast enough
for whatever I was doing at the time.)
--
James Kanze Tel.: (+33) 88 14 49 00 email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
--

