Re: syslog parser


          From comp.compilers

Related articles
syslog parser mballen@NOSPAM_erols.com (Michael B. Allen) (2000-02-19)
Re: syslog parser george@castro.dbnet.ece.ntua.gr (2000-02-21)
Re: syslog parser rkrayhawk@aol.com (2000-02-21)

From: rkrayhawk@aol.com (RKRayhawk)
Newsgroups: comp.compilers
Date: 21 Feb 2000 23:55:39 -0500
Organization: AOL http://www.aol.com
References: 00-02-101
Keywords: parse, yacc

If you want


five,six.*;


to match the rule




selector: word '.' WORD { printf( "found a selector\n" ); }
                                ;


then you would, _perhaps_, want the lexer to return the star
('*') as a WORD token.
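In flex terms that might look something like the following sketch (the yylval field, header name, and character classes are assumptions about your setup):

```lex
%{
#include <string.h>
#include "y.tab.h"   /* token definitions generated by yacc; name is an assumption */
%}
%%
[A-Za-z0-9]+    { yylval.str = strdup(yytext); return WORD; }
"*"             { yylval.str = strdup(yytext); return WORD; /* star comes back as WORD too */ }
[.,;]           { return yytext[0]; /* punctuation as single-character literals */ }
%%
```

With that, the star arrives at the parser wearing the same WORD badge as any other word.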


Instead you are trying to return it as a literal for the
lower-case-named rule word: to reduce.


You do not have a way for the selector: rule to see that. For example, you do
not have (and probably do not want) this rule; note the lower case carefully:




selector: word '.' word { printf( "found a selector\n" ); }
                                ;


There are a few ideas that might help. First, you probably want your
rule names and your token names to be rather more distinct; it is easier
to spot problems that way. For example, perhaps return WORD tokens but
give the alternative, somewhat recursive word-pattern rule a descriptive
name such as word_recurs: or words_and_members: .
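For example, something along these lines (the rule names are only illustrative, and this guesses at the shape of your comma list):

```yacc
selector        : word_recurs '.' WORD  { printf( "found a selector\n" ); }
                ;

word_recurs     : WORD
                | word_recurs ',' WORD
                ;
```

Here five,six.* reduces cleanly, provided the lexer hands the star back as a WORD token.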




You may need a more robust set of rules to deal with the wild cards,
such as star ('*'). It may not be a good idea to allow it to live in
the general rule for word patterns and words and member patterns, if
you do not intend to allow it in all positions of the word pattern.
However, maybe you do and you can just treat an isolated wildcard like
any other 'word' syntactically, and let semantics deal with it (in the
action code).
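Letting the action code do that semantic check might look like this sketch (it assumes a %union with a string member, so $1 is the matched text):

```yacc
word            : WORD          { if (strcmp($1, "*") == 0)
                                      printf( "wildcard in word position\n" );
                                  /* otherwise an ordinary word: nothing special */
                                }
                ;
```

The grammar stays simple; the star is only special in the C code, not in the syntax.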


Also, as an aside, the wild cards might not necessarily be entirely
isolated by periods: for example
    front*.txt
or
    front?back.ext


But I am going beyond your posted example. Yet my point is that at
the level you are parsing, it is not obvious that the wildcard is
syntactically distinct: maybe it is reasonable to let semantics tear
into the text string, and in lexing and syntax just glue the star (and
other wild cards) into the text string. Just an idea, you see, but if
you get into it you need to get way into it, and the isolated rule for
star seems tentative.
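Gluing the wild cards into the word pattern at the lexical level could be as simple as widening the character class (assuming alphanumerics plus '*' and '?' are the only characters you care about):

```lex
[A-Za-z0-9*?]+  { yylval.str = strdup(yytext); return WORD; /* wildcards ride along inside the word */ }
```

Then front*.txt arrives at the parser as WORD '.' WORD, and the action code decides what the embedded star means.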


Also for your early experiments, maybe put a printf display in your alternate
top selector_set: rules.
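If your top rule looks anything like the following (the shape here is a guess at your grammar), a printf per alternative shows which path actually fired:

```yacc
selector_set    : selector                  { printf( "selector_set: first selector\n" ); }
                | selector_set ';' selector { printf( "selector_set: another selector\n" ); }
                ;
```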


In your post you have numerous good questions. For example,


<< Similarly if I simply enter a single word like 'hello\n', I get
a parse error. Is the newline causing a problem?
>>


You have a lexer rule that reads exactly as follows
<<
\n { return 0; }
>>
Don't do that. Generally, do not return literally coded numeric values; you are
interfering with the table-based technology of the lexer / parser interface.
Return a named token instead. More specifically, the numeric value zero is
reserved for the handshaking between the lexer and parser to designate the end
of input <EOF>.


An alternative, when the presence of newline in the input has no
meaning, or is otherwise to be treated as transparent, is to _not_
return. The lexer should truck on in the input stream. Some
applications will merely blip the line counter upon recognizing \n,
and not return. Your requirement seems to be that the newline
should be returned as a unique token. But when you do not care about
a detected lexical pattern, do _not_ return zero; either return nothing
at all or return a token that the parser is designed to ignore or cope
with optionally.
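Concretely, the transparent treatment is just an action with no return at all (the lineno counter here is an assumption, not something from your posted lexer):

```lex
%{
int lineno = 1;   /* assumed line counter for error messages */
%}
%%
\n      { ++lineno; /* transparent newline: consume it and keep scanning */ }
%%
```

If the grammar really does need to see line ends, return a named token such as NEWLINE instead; never the bare 0.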




It is easy to find oneself in agreement with the moderator's
suggestion that you return whitespace as a token and transition into
(%x) exclusive states to scoop up the command line 'filename' off to
the right of the lines you present as examples. It seems like those will
potentially involve all manner of text that could masquerade as your
other patterns.
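A sketch of that approach, with an exclusive state that scoops up everything to the right of the selector (the state and token names are assumptions):

```lex
%x ACTION_PART
%%
[ \t]+                  { BEGIN(ACTION_PART); return WS; }
<ACTION_PART>[^\n]+     { yylval.str = strdup(yytext); return ACTION_TEXT; }
<ACTION_PART>\n         { BEGIN(INITIAL); return NEWLINE; /* back to normal scanning */ }
%%
```

Because ACTION_PART is exclusive (%x), none of your ordinary word patterns can fire on that right-hand text.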




You ask
<<
Should I have the lexer doing more of the work?
>>


Rearranging your comments for flow here ... you also mention, in regard to the
'filename' text off to the right of your samples:
<<
  The filename is actually an "action" meaning it could be a command with
hyphens and so fourth. It should be interpreted as a generic text
string.
>>




If you have a need to actually parse into the command-like matter in
that area, your requirements could conceivably lead beyond mere
oscillation between various (%x) exclusive states and the initial begin
state. In other words, if you plan to eventually interface a broad
category of config files to an XML API, you may need a design strategy
that permits stacking distinct parsers to deal with the command-like
surface complexity in those strings off to the right of the samples you
present. So assess your needs before you go
too far. If you are really just trying to reformat and hand stuff to
an XML processor, you should not have much trouble. But if you need to
expand the wild cards in the commands or intelligently handle the
parms and arguments for those commands, that will get hairy. If you
in any way need to be the command line processor per se, you will want
to be out of the confines of a table-driven, goal-oriented parser tool.


If the XML processor you are interfacing with is the command
processor (if its operating environment will be the command
processor), then you are really just the translator, and the classic
parser and lexer tools relate squarely to your needs.


I mention this because if that material is "action", in the sense that
you are to do things rather than the downstream XML processor, you
may need a two-phase process or otherwise break out of the linguistic
tools.


But you do have a good start. Harnessing XML to the config processes
is definitely frontline work these days. Let us know how it goes.


Best Wishes,


Robert Rayhawk
RKRayhawk@aol.com

