From: | Eric Fowler <eric.fowler@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 15 Mar 2009 16:30:26 -0700 |
Organization: | Compilers Central |
References: | 09-03-058 09-03-063 |
Keywords: | lex |
Posted-Date: | 15 Mar 2009 21:56:24 EDT |
Thanks.
I am aware that using lex for this project is overkill, but (a) I have
a lot of different sentence types to scan, and I want a consistent and
bulletproof way to do it (the specification I am working from defines
about 100-200 "sentences" that all look a little like this), (b) some
of the fields themselves can be complicated and I want to tackle them
with a parser anyway, and (c) it's an excuse to get back into learning
lex and yacc with a simple problem set.
It seems most of my issues revolve around not knowing where I should
be doing error checking on the input. For instance, if I am expecting
a number less than 100 in a particular place, e.g., "...,50,...", at
what point should I be weeding out empty tokens, e.g., "...,,..." (in
other places I will have numeric fields that can be blank)?
Intuitively I think you want to catch them early in the process, but
the tokenizer can only tell you whether it saw an empty field or a
NUMBER token; it does not know whether a blank is legal in that
position. So I am defining tokens for NUMBER and for COMMA (overkill
again) and leaving it to the parser to figure it out ... which is, as
far as I can see now, the Right Way[tm] to do it.
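To make that split concrete, here is a rough hand-written C sketch of
the same division of labor. It is not lex/yacc output and not the code
from my project; the names (next_token, parse_field, blank_ok, limit)
are made up for illustration. The scanner only reports NUMBER, COMMA,
or end of input, and the "parser" side decides whether a blank field
or an out-of-range value is an error in that position.

```c
#include <ctype.h>
#include <stdlib.h>

/* Token kinds the scanner can report (hypothetical names). */
enum tok { TOK_NUMBER, TOK_COMMA, TOK_END, TOK_ERROR };

/* Scanner state: current position and the value of the last NUMBER. */
struct lexer { const char *p; long value; };

/* Scanner: knows nothing about which fields may be blank. */
enum tok next_token(struct lexer *lx)
{
    if (*lx->p == '\0') return TOK_END;
    if (*lx->p == ',') { lx->p++; return TOK_COMMA; }
    if (isdigit((unsigned char)*lx->p)) {
        char *end;
        lx->value = strtol(lx->p, &end, 10);
        lx->p = end;
        return TOK_NUMBER;
    }
    return TOK_ERROR;               /* anything else is unexpected */
}

enum field_status { FIELD_OK, FIELD_BLANK, FIELD_BAD };

/* Parser side: reads one field plus its terminating ',' (or end of
   input). blank_ok says whether an empty field is legal here; limit is
   the exclusive upper bound for the number. */
enum field_status parse_field(struct lexer *lx, int blank_ok,
                              long limit, long *out)
{
    enum tok t = next_token(lx);
    if (t == TOK_COMMA || t == TOK_END)      /* nothing before separator */
        return blank_ok ? FIELD_BLANK : FIELD_BAD;
    if (t != TOK_NUMBER || lx->value >= limit)
        return FIELD_BAD;
    *out = lx->value;
    t = next_token(lx);                      /* consume the separator */
    return (t == TOK_COMMA || t == TOK_END) ? FIELD_OK : FIELD_BAD;
}
```

With input "50,,7", calling parse_field three times (blank allowed only
on the second call) gives a number, a blank, and a number. In a yacc
grammar the same decision would live in the rules, e.g. one rule for a
required number and another for an optional one.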
Yes, I could be doing it all with strtok(). But I like doing things
the hard way.
Eric
PS. strtok() actually is not your best friend here because when you
get delimiters side-by-side with nothing intervening, strtok() skips
them all. For example, strtok(",,,FOO,,,", ",") will return the single
token "FOO" on its first call and nothing thereafter. So you have to
tokenize another way. Not that it's real hard.
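A minimal sketch of "another way", assuming fixed-size buffers are
acceptable: split_keep_empty and count_strtok_tokens are hypothetical
helper names, not library functions. The first walks the string with
strcspn() and keeps empty fields; the second just counts what strtok()
reports, working on a writable copy because strtok() modifies its
argument (so it must not be handed a string literal directly).

```c
#include <string.h>

#define FIELD_MAX 32   /* arbitrary per-field buffer size for the sketch */

/* Split s on delim, keeping empty fields. Each field is copied
   (truncated if needed) into fields[i]. Returns the number of fields
   stored, at most max_fields. */
int split_keep_empty(const char *s, char delim,
                     char fields[][FIELD_MAX], int max_fields)
{
    const char d[2] = { delim, '\0' };
    const size_t cap = FIELD_MAX - 1;
    int n = 0;

    for (;;) {
        size_t len = strcspn(s, d);      /* chars before next delimiter */
        if (n < max_fields) {
            size_t copy = len < cap ? len : cap;
            memcpy(fields[n], s, copy);
            fields[n][copy] = '\0';
            n++;
        }
        if (s[len] == '\0')              /* reached end of input */
            break;
        s += len + 1;                    /* step over the delimiter */
    }
    return n;
}

/* Count the tokens strtok() reports for the same input. */
int count_strtok_tokens(const char *s, const char *delims)
{
    char buf[128];
    int n = 0;

    strncpy(buf, s, sizeof buf - 1);     /* strtok needs a writable copy */
    buf[sizeof buf - 1] = '\0';
    for (char *t = strtok(buf, delims); t != NULL; t = strtok(NULL, delims))
        n++;
    return n;
}
```

For ",,,FOO,,," the strtok() count is 1, while split_keep_empty()
reports 7 fields, with "FOO" in the fourth and empty strings elsewhere.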
Thanks again
Eric