Re: Has anyone hand-written a scanner/parser module?

David Z Maze <dmaze@mit.edu>
Mon, 17 Nov 2008 10:26:23 -0500

From comp.compilers

From: David Z Maze <dmaze@mit.edu>
Newsgroups: comp.compilers
Date: Mon, 17 Nov 2008 10:26:23 -0500
Organization: Massachusetts Institute of Technology
References: 08-11-061
Keywords: parse, XML
Posted-Date: 17 Nov 2008 18:30:38 EST

"tuxisthebirdforme@gmail.com" <tuxisthebirdforme@gmail.com> writes:


> I know most people anymore use lex/yacc or some derivative of these
> tools to create scanner/parser modules for their compiler projects. I
> was wondering if anyone has developed a scanner or parser that they
> personally hand-wrote? If so, I would like to know what language you
> used and what type of grammar you parsed.


In my copious free time, I hand-wrote (sort of) a parser for XML 1.0
(non-validating, ignores character-set issues, rejects DTDs, does do
namespaces) a month or two ago. I wrote this in Haskell using the
Parsec support library, and generated a straightforward tree
representation of the XML. I say "sort of hand-wrote" in that Parsec
isn't really a parser generator in the same sense that yacc is; also, a
lot of its functionality could be better expressed in modern Haskell
extensions like arrows and the Control.Applicative module that post-date
Parsec.


At any rate, this is an LL(1) implementation (Parsec is predictive with
one character of lookahead unless told to backtrack), with appropriate
context checking for duplicate attributes and tag matching. Since
Haskell supports functions as first-class objects, I can turn a grammar
fragment like


document ::= xml-declaration?
             (whitespace | processing-instruction | comment)*
             element
             (whitespace | processing-instruction | comment)*


into (syntax approximate, ignoring many issues)


document :: Parser XMLDocument
document = do optional xmldeclaration
              pre <- many (whitespace <|> pi <|> comment)
              elt <- element
              post <- many (whitespace <|> pi <|> comment)
              return $ XMLDocument (pre ++ [elt] ++ post)


xmldeclaration :: Parser ()
xmldeclaration = do string "<?xml"
                    -- stuff
                    string "?>"
                    return ()  -- discard the matched text; the type is Parser ()
-- etc.


where all of the above is *code*, not a description that needs to be
preprocessed. The only tricky thing is refactoring the grammar into
LL(1) form: with the obvious construction of processing instructions
(<?name ... ?>), Parsec consumes the '<' character and then, instead of
trying the other alternatives, complains when it doesn't see the '?' in
a comment (<!--... -->) or an element (<name>).
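
The refactoring described above might look roughly like the following
sketch, which consumes the shared '<' once and lets the next character
pick the branch. The Node type and the names markup, procInstr, comment
and element are made up for illustration and are not the code from the
original parser; attributes, content and most of XML are left out. It
assumes the parsec package (Text.Parsec).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical node type, for illustration only.
data Node
  = PI String String   -- target and body of <?target body?>
  | Comment String     -- text of <!--text-->
  | Element String     -- name of <name>; attributes and content omitted
  deriving (Show, Eq)

-- Left-factored: the common '<' is consumed exactly once, so each
-- alternative below can be chosen by looking at one more character.
markup :: Parser Node
markup = char '<' >> (procInstr <|> comment <|> element)

-- Runs after '<' has been consumed: expects "?target body?>".
procInstr :: Parser Node
procInstr = do
  _ <- char '?'
  target <- many1 letter
  body <- manyTill anyChar (try (string "?>"))
  return (PI target (dropWhile (== ' ') body))

-- Runs after '<' has been consumed: expects "!--text-->".
comment :: Parser Node
comment = do
  _ <- string "!--"
  text <- manyTill anyChar (try (string "-->"))
  return (Comment text)

-- Runs after '<' has been consumed: expects "name>".
element :: Parser Node
element = do
  name <- many1 letter
  _ <- char '>'
  return (Element name)
```

For example, parse markup "" "<!--hi-->" should yield Right (Comment
"hi"). The other common way out is to wrap each alternative in Parsec's
try combinator, which backtracks on failure at the cost of re-reading
input; left-factoring avoids the backtracking.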


(Also there is some amount of wrapping your head around Haskell, of
course; a lot of deep magic is hidden in that "do".)
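
To take a little of the magic out of that "do": it is syntactic sugar
for chaining parsers with the monadic bind operator (>>=). The sketch
below writes the same toy parser both ways; the names elementDo and
elementBind are hypothetical, and the parser only handles
<name>text</name> with no attributes or nesting. It also shows the
tag-matching check, since the closing tag is parsed using the name
bound earlier. It assumes the parsec package (Text.Parsec).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- With do notation: each "x <- p" runs parser p and binds its result.
elementDo :: Parser (String, String)
elementDo = do
  name <- char '<' *> many1 letter <* char '>'
  body <- many (noneOf "<")
  -- The closing tag must repeat the name parsed above.
  _ <- string "</" *> string name <* char '>'
  return (name, body)

-- The same parser with the do block written out by hand using >>= and >>.
elementBind :: Parser (String, String)
elementBind =
  (char '<' *> many1 letter <* char '>') >>= \name ->
  many (noneOf "<") >>= \body ->
  (string "</" *> string name <* char '>') >>
  return (name, body)
```

Both accept "<a>hi</a>" and reject "<a>hi</b>". The closing-tag step
depends on a value produced by an earlier step, which is the kind of
context checking that needs the monadic interface rather than a purely
applicative one.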


HTH,


    --dzm


