Grammar problems with two equal tokens

bibbs@sapo.pt (joao vaz)
12 Feb 2003 13:46:41 -0500

From comp.compilers

From:	bibbs@sapo.pt (joao vaz)
Newsgroups:	comp.compilers
Date:	12 Feb 2003 13:46:41 -0500
Organization:	http://groups.google.com/
Keywords:	parse, question
Posted-Date:	12 Feb 2003 13:46:41 EST

The problem is the following:

I have a structure like this:

SegmentNumber|ItemNumberxxx|ItemNumberyyy|ItemNumberNNN

The segment number is a 3-digit field, the item number is also a 3-digit
field, and xxx and yyy are the descriptions for the item numbers.
So I have, for instance, these values in a text file:

000|001xpto|002xpto2
001|001foo|005garbage
002|001 12|006bar

The spaces are valid characters and cannot be stripped (every space
counts). A segment has n item numbers, and each line is a different
record.

The problem is that the first token recognized is the item number
(3 digits) instead of the segment number (also 3 digits), so the parser
reports an error: the expected token was a segment number, but it
extracted an item number. How can I resolve this? How can I
differentiate between two tokens that have the same definition,
{DIGIT}{DIGIT}{DIGIT}? Sorry for the stupid question, but I'm a
newbie to parser technology and any kind of pointer will be highly
appreciated :-))))

Here is a stripped-down version of the grammar (used with GOLD Parser
Builder):

<Program> ::= <List Records>
IdentifierSegment = ^{DIGIT}{DIGIT}{DIGIT}
<List Records> ::= <List Record> {Separator} <List Records>
| <List Record>
<List Record> ::= <Record> NewLine
<Record> ::= IdentifierSegment {Separator} <List Items>
Identifier = [^|]{Alphanumeric}+
IdentifierItem = {DIGIT}{DIGIT}{DIGIT}
<List Items> ::= <List Item> {Separator} <List Items>
| <List Item>
<List Item> ::= IdentifierItem Identifier
{WS} = {Whitespace} - {CR} - {LF}
Whitespace = {WS}
NewLine = {CR}{LF}|{CR}
{Separator} = ['|']

Joao Vaz
[I'd use a lexical hack to return a different token type at the
beginning of a line. -John]
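
One way the lexical hack suggested above might look, as a minimal sketch
in Python rather than in GOLD: a hand-written scanner keeps track of
where it is in the line and classifies a 3-digit field by position, so
the parser never sees two indistinguishable tokens. The function name
tokenize and the scanner itself are assumptions for illustration; only
the token names are taken from the grammar in the post.

```python
import re

_DIGITS3 = re.compile(r"[0-9]{3}")
_TEXT = re.compile(r"[^|\r\n]+")        # description: anything up to '|' or end of line
_NEWLINE = re.compile(r"\r\n|\r|\n")


def tokenize(text):
    """Return a list of (kind, value) tokens (hypothetical helper).

    A 3-digit field is an IdentifierSegment at the start of a line and
    an IdentifierItem right after a separator. Spaces are never skipped.
    """
    tokens = []
    pos = 0
    at_line_start = True      # next 3-digit field is a segment number
    after_separator = False   # next 3-digit field is an item number
    while pos < len(text):
        m = _NEWLINE.match(text, pos)
        if m:
            tokens.append(("NewLine", m.group()))
            at_line_start, after_separator = True, False
            pos = m.end()
            continue
        if text[pos] == "|":
            tokens.append(("Separator", "|"))
            at_line_start, after_separator = False, True
            pos += 1
            continue
        if at_line_start or after_separator:
            m = _DIGITS3.match(text, pos)
            if not m:
                raise ValueError(f"expected a 3-digit number at offset {pos}")
            kind = "IdentifierSegment" if at_line_start else "IdentifierItem"
            tokens.append((kind, m.group()))
            at_line_start = after_separator = False
            pos = m.end()
            continue
        # Everything else up to the next '|' or line end is the description.
        m = _TEXT.match(text, pos)
        tokens.append(("Identifier", m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("000|001xpto|002xpto2\n002|001 12|006bar\n"):
        print(token)
```

Because the decision depends only on position, a description that itself
starts with digits or spaces (as in "001 12") is still split correctly
into the item number and its description.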
