Re: Simple question on lex/yacc specifications

Eric Fowler <eric.fowler@gmail.com>
Sun, 15 Mar 2009 16:30:26 -0700

          From comp.compilers

From: Eric Fowler <eric.fowler@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 15 Mar 2009 16:30:26 -0700
Organization: Compilers Central
References: 09-03-058 09-03-063
Keywords: lex
Posted-Date: 15 Mar 2009 21:56:24 EDT

Thanks.


I am aware that using lex for this project is overkill, but (a) I have
a lot of different sentence types to scan, and I want a consistent and
bulletproof way to do it (the specification I am working from defines
about 100-200 "sentences" that all look a little like this), (b) some
of the fields themselves can be complicated and I want to tackle them
with a parser anyway, and (c) it's an excuse to get back into learning
lex and yacc with a simple problem set.


It seems most of my issues revolve around not knowing where I should
be doing error checking on the input. For instance, if I am expecting
a number less than 100 in a particular place, e.g., "...,50,...", at
what point should I be weeding out empty tokens, e.g., "...,,..." (in
other places I will have numeric fields that can be blank)?


Intuitively I think you want to get them early in the process but that
means the tokenizer just tells you if you have an empty field or a
NUMBER token. So I am defining tokens for NUMBER and for COMMA
(overkill again) and leaving it to the parser to figure it out ...
which is, as far as I can see now, the Right Way[tm] to do it.
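
A rough sketch of that division of labor, written as plain C rather
than as an actual lex/yacc specification so that it stands alone. All
names here (next_token, parse_number_field, the TOK_* and FIELD_*
constants) are hypothetical and not taken from any real grammar. The
scanner only reports NUMBER or COMMA; the parser-level routine decides
whether a field is empty, and the caller decides whether an empty
field is acceptable at that position.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; a lex spec would return these from its rules. */
enum token_kind { TOK_NUMBER, TOK_COMMA, TOK_END, TOK_ERROR };

struct token {
    enum token_kind kind;
    long value;                 /* meaningful only for TOK_NUMBER */
};

/* Scanner: reads one token at *cursor and advances past it.
   It has no opinion on whether a field may be empty. */
static struct token next_token(const char **cursor)
{
    struct token tok = { TOK_END, 0 };
    const char *p = *cursor;

    if (*p == '\0') {
        tok.kind = TOK_END;
    } else if (*p == ',') {
        tok.kind = TOK_COMMA;
        p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) {
            /* Stop accumulating once the value is large, to avoid
               overflow; it stays large enough to fail range checks. */
            if (tok.value < 100000)
                tok.value = tok.value * 10 + (*p - '0');
            p++;
        }
    } else {
        tok.kind = TOK_ERROR;
        p++;
    }
    *cursor = p;
    return tok;
}

enum field_status { FIELD_OK, FIELD_EMPTY, FIELD_BAD };

/* Parser-level check for one numeric field that must be below `limit`.
   An empty field is reported, not rejected; the separator that follows
   is left in place for the caller to consume. */
static enum field_status parse_number_field(const char **cursor,
                                            long limit, long *out)
{
    const char *saved = *cursor;
    struct token tok = next_token(cursor);

    if (tok.kind == TOK_COMMA || tok.kind == TOK_END) {
        *cursor = saved;        /* nothing before the separator */
        return FIELD_EMPTY;
    }
    if (tok.kind != TOK_NUMBER || tok.value >= limit)
        return FIELD_BAD;
    *out = tok.value;
    return FIELD_OK;
}
```

With this split, a position where the number is mandatory treats
FIELD_EMPTY as an error, while a position where the field can be blank
simply accepts it. In a yacc grammar the same choice would presumably
show up as whether the rule for that field has an empty alternative.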


Yes, I could be doing it all with strtok(). But I like doing things
the hard way.


Eric


PS. strtok() actually is not your best friend here because when you
get delimiters side-by-side with nothing intervening, strtok() treats
the whole run as a single separator. For example, tokenizing
",,,FOO,,," with "," as the delimiter returns the single token "FOO"
on its first call and NULL thereafter. So you have to tokenize another
way. Not that it's real hard.
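
One possible "other way", sketched in C under the assumption that the
input sits in a modifiable buffer (strtok() also writes into its
argument, so neither approach should be handed a string literal). The
helper names count_strtok_fields and next_field are made up for this
illustration; next_field uses the standard strcspn() to find the next
delimiter and returns empty fields instead of skipping them.

```c
#include <string.h>

/* Counts the tokens strtok() yields from buf. Runs of delimiters are
   skipped as a unit, so empty fields never appear. */
static int count_strtok_fields(char *buf, const char *delims)
{
    int n = 0;
    for (char *t = strtok(buf, delims); t != NULL; t = strtok(NULL, delims))
        n++;
    return n;
}

/* Hypothetical helper: returns the next field starting at *cursor,
   including empty ones, and advances *cursor past the delimiter.
   Returns NULL once every field has been handed out. */
static char *next_field(char **cursor, const char *delims)
{
    char *start = *cursor;
    if (start == NULL)
        return NULL;

    size_t len = strcspn(start, delims);   /* length up to next delimiter */
    if (start[len] != '\0') {
        start[len] = '\0';                 /* terminate this field */
        *cursor = start + len + 1;
    } else {
        *cursor = NULL;                    /* that was the last field */
    }
    return start;
}
```

On ",,,FOO,,," with "," as the delimiter, the strtok() loop sees one
token, while next_field() hands back seven fields, with "FOO" as the
fourth and the rest empty.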


Thanks again


Eric


