Grammar problems with two equal tokens

bibbs@sapo.pt (joao vaz)
12 Feb 2003 13:46:41 -0500

          From comp.compilers

Newsgroups: comp.compilers
Organization: http://groups.google.com/
Keywords: parse, question

The problem is the following:


I have a structure like this:


SegmentNumber|ItemNumberxxx|ItemNumberyyy|ItemNumberNNN


The segment number is a 3-digit field, the item number is also a 3-digit
field, and xxx and yyy are the descriptions of the item numbers.
So, for instance, I have these values in a text file:


000|001xpto|002xpto2
001|001foo|005garbage
002|001 12|006bar


The spaces are valid characters and cannot be stripped (every space
counts). A segment has n item numbers, and each line is a separate
record.
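Every field is delimited by '|' and the numbers have a fixed width, so the format described above can also be read without a grammar at all. The following is a minimal Python sketch. It assumes that '|' never occurs inside a description and that each line is passed in without its line terminator. The names `parse_record`, `Record` and `Item` are hypothetical.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    number: str       # 3-digit item number, kept as text ("001")
    description: str  # rest of the field, spaces preserved


class Record(NamedTuple):
    segment: str      # 3-digit segment number
    items: List[Item]


def is_three_digits(s: str) -> bool:
    return len(s) == 3 and all(c in "0123456789" for c in s)


def parse_record(line: str) -> Record:
    """Split one record line (without its line terminator) into fields."""
    segment, *fields = line.split("|")
    if not is_three_digits(segment):
        raise ValueError(f"bad segment number: {segment!r}")
    if not fields:
        raise ValueError("a record needs at least one item")
    items = []
    for field in fields:
        # Fixed width: the first three characters are the item number,
        # everything after them (spaces included) is the description.
        number, description = field[:3], field[3:]
        if not is_three_digits(number):
            raise ValueError(f"bad item number in field: {field!r}")
        items.append(Item(number, description))
    return Record(segment, items)


print(parse_record("002|001 12|006bar"))
```

Here position alone decides whether a 3-digit string is a segment or an item number: the first field is the segment, and the first three characters of every later field are an item number. The problem of two identical tokens therefore never arises.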


The problem is that the first token recognized is an item number (three
digits) instead of a segment number (also three digits), so the parser
fails with an error saying that it expected a segment number but got an
item number. How can I resolve this? How can I differentiate between
two tokens that have the same pattern, {DIGIT}{DIGIT}{DIGIT}? Sorry for
the stupid question, but I'm a newbie to parser technology and any kind
of pointer will be highly appreciated :-))))
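The following small Python sketch shows why two terminals with the same pattern cannot be told apart by the lexer alone. It is a longest-match lexer in the style of lex-derived tools: when two rules match equally long text, the rule listed first wins. GOLD's own tie-breaking is not modelled here; that behavior is an assumption of the sketch. The rule names mirror the grammar's terminals. `RULES`, `Text` (a simplified stand-in for the description) and `tokenize` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical rule table mirroring the grammar's terminals. The two 3-digit
# rules have identical patterns; "Text" is a simplified stand-in for the
# item description (in this sketch it may not start with a digit).
RULES = [
    ("IdentifierItem", re.compile(r"[0-9]{3}")),
    ("IdentifierSegment", re.compile(r"[0-9]{3}")),
    ("Separator", re.compile(r"\|")),
    ("Text", re.compile(r"[^|0-9\r\n][^|\r\n]*")),
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Longest-match lexer; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly greater: an equally long later match never replaces
            # an earlier one, so IdentifierSegment can never win.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("000|001xpto|002xpto2"))
```

Every 3-digit string, including the segment number at the start of the line, comes out as `IdentifierItem`. A parser that expects a segment number first then reports exactly the kind of error described above.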


Here is a stripped-down version of the grammar (written for GOLD Parser
Builder):


<Program> ::= <List Records>
IdentifierSegment = ^{DIGIT}{DIGIT}{DIGIT}
<List Records> ::= <List Record> {Separator} <List Records>
                  | <List Record>
<List Record> ::= <Record> NewLine
<Record> ::= IdentifierSegment {Separator} <List Items>
Identifier = [^|]{Alphanumeric}+
IdentifierItem = {DIGIT}{DIGIT}{DIGIT}
<List Items> ::= <List Item> {Separator} <List Items>
                  | <List Item>
<List Item> ::= IdentifierItem Identifier
{WS} = {Whitespace} - {CR} - {LF}
Whitespace = {WS}
NewLine = {CR}{LF}|{CR}
{Separator} = ['|']


Joao Vaz
[I'd use a lexical hack to return a different token type at the
beginning of a line. -John]
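The moderator's suggestion can be sketched in Python as a lexer that keeps one bit of state: whether it is at the beginning of a line. Based on that state it returns a different token type for the same three digits. This is a minimal, tool-independent sketch and does not show how to plug a custom lexer into GOLD. Several details are assumptions for illustration: the token names (`SEGMENT`, `ITEM`, `DESCRIPTION`, ...), the functions `lex` and `parse`, the rule that an item's description runs to the next '|' or line end, and the acceptance of a bare LF as a line end alongside the grammar's CR LF and CR.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (token type, lexeme)
DIGITS = "0123456789"


def lex(text: str) -> Iterator[Token]:
    """Position-dependent lexer: the same three digits become a SEGMENT at
    the start of a line and an ITEM anywhere else."""
    pos = 0
    at_line_start = True
    while pos < len(text):
        ch = text[pos]
        if ch in "\r\n":
            # CR LF, CR, or (an added assumption) a bare LF ends a line.
            end = pos + 2 if text.startswith("\r\n", pos) else pos + 1
            yield ("NEWLINE", text[pos:end])
            pos, at_line_start = end, True
        elif ch == "|":
            yield ("SEPARATOR", ch)
            pos += 1
        else:
            number = text[pos:pos + 3]
            if len(number) != 3 or any(c not in DIGITS for c in number):
                raise ValueError(f"expected three digits at offset {pos}")
            # The lexical hack: token type depends on where we are in the line.
            yield ("SEGMENT" if at_line_start else "ITEM", number)
            pos += 3
            if not at_line_start:
                # Assumed rule: the description runs to the next '|' or line
                # end, spaces included (it may be empty in this sketch).
                end = pos
                while end < len(text) and text[end] not in "|\r\n":
                    end += 1
                yield ("DESCRIPTION", text[pos:end])
                pos = end
            at_line_start = False


def parse(text: str) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Hand-written parser: each record is
    SEGMENT (SEPARATOR ITEM DESCRIPTION)+ NEWLINE."""
    tokens = list(lex(text)) + [("EOF", "")]
    pos = 0

    def expect(kind: str) -> str:
        nonlocal pos
        tok_kind, lexeme = tokens[pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} {lexeme!r}")
        pos += 1
        return lexeme

    records = []
    while tokens[pos][0] != "EOF":
        segment = expect("SEGMENT")
        items = []
        while True:
            expect("SEPARATOR")
            number = expect("ITEM")
            items.append((number, expect("DESCRIPTION")))
            if tokens[pos][0] != "SEPARATOR":
                break
        expect("NEWLINE")
        records.append((segment, items))
    return records


print(parse("000|001xpto|002xpto2\r\n"))
# -> [('000', [('001', 'xpto'), ('002', 'xpto2')])]
```

Once the lexer distinguishes SEGMENT from ITEM, the parser only has to check token types in order, which is what the `<Record>` and `<List Item>` productions already express.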


