Re: yytext in POSIX lex

schrod@iti.informatik.th-darmstadt.de (Joachim Schrod)
Thu, 4 Feb 1993 11:32:23 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: schrod@iti.informatik.th-darmstadt.de (Joachim Schrod)
Keywords: lex, comment
Organization: TH Darmstadt, FG Systemprogrammierung
References: 93-02-035
Date: Thu, 4 Feb 1993 11:32:23 GMT

The moderator writes:
> [Of the two major implementations of lex, AT&T lex makes yytext an array
> and flex makes it a pointer. Some poorly written lexers depend on it
> being one or the other, and %array and %pointer make it easier to port
> such code. You have to try fairly hard to tell the difference, e.g. use
> sizeof(yytext). -John]


Hmm, I see this a bit differently.


If the element type of yytext is "unsigned char" and your default char
attribute is "signed char", an ANSI C compiler will complain if you call
strcpy (even worse, a C++ compiler will signal an error). So, a common
technique is to define a variable s_yytext (here `s' stands for `string',
not for `signed') which happens to have the same value as yytext but with
an element type of "char" (i.e., it's "char *").
        If yytext is an array, this variable may be initialized once (and may
even be "const"); if yytext is a pointer, it must be initialized anew
before each action. (In the latter case one needs something like the
YY_USER_ACTION of flex.)
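
To make the two cases concrete, here is a minimal sketch in plain C. It is
not taken from any generated scanner: yytext_array, yytext_ptr,
s_yytext_fixed, s_yytext_moving, REFRESH_S_YYTEXT, match_array and
match_pointer are all names made up for illustration, and the two match_*
functions only imitate what a scanner does when it recognizes a token.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two flavours of yytext. */
unsigned char yytext_array[64];   /* array model: storage stays put        */
unsigned char *yytext_ptr;        /* pointer model: re-aimed at each match */

/* Array model: one initialization is enough, and it may be const. */
char *const s_yytext_fixed = (char *) yytext_array;

/* Pointer model: the alias has to be refreshed before every action. */
char *s_yytext_moving;
#define REFRESH_S_YYTEXT() (s_yytext_moving = (char *) yytext_ptr)

/* Imitate a match in the array model: the token is copied into yytext. */
void match_array(const char *tok)
{
    assert(tok != NULL && strlen(tok) < sizeof yytext_array);
    strcpy((char *) yytext_array, tok);
}

/* Imitate a match in the pointer model: yytext points into the input. */
void match_pointer(unsigned char *tok_in_buffer)
{
    assert(tok_in_buffer != NULL);
    yytext_ptr = tok_in_buffer;
    REFRESH_S_YYTEXT();   /* the job a YY_USER_ACTION-style hook would do */
}
```

In the array model s_yytext_fixed stays valid after every match_array call.
In the pointer model s_yytext_moving is only right because match_pointer
refreshes it each time; in a real flex scanner that refresh would presumably
be placed in YY_USER_ACTION, which flex runs before each matched rule's
action.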


I want to emphasize that (for me) this has nothing to do with %array and
%pointer in the first place. It's a note on the opinion above that the
difference between array and pointer implementations is only noticeable
in ``bad'' lexers. It's visible in lexers which have C++ actions.


Btw, I've checked POSIX.2 about the `official element type' of yytext. The
phrase ``yytext is either an external character array or a pointer to a
character string'' might be open to interpretation. :-(
        While I'm at it, I want to mention another problem with lex for
POSIX.1 based software (since this belongs more to the subject of this
thread :-). Many systems (e.g., AIX and HP-UX) support more than one
`standard', and select the appropriate one by some cpp macros when the
first system include file is read in. Now POSIX demands that I add a
"#define _POSIX_SOURCE" to my source (at the very front, of course), and
promises me that I will receive a POSIX.1 environment then. Nice, but
this does not work in lex sources. Here I have to give -D_POSIX_SOURCE
(or -D_XOPEN_SOURCE or whatever I want) on the command line of the C
compiler. (Which is normally never called by me, but by some
make/imake/or-similar tool...) If I forget this I will have a plain ANSI C
environment, without any further function or type declarations.
<Grmpfh>.


--
Joachim Schrod Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
[I'd #define s_yytext as ((char *) yytext) if that were a problem. -John]