Related articles |
---|
PDF grammar and PostScript grammar p11111@kr.onet.pl (psobkiew) (2002-01-30) |
Re: PDF grammar and PostScript grammar adamo@dblab.ece.ntua.gr (2002-02-06) |
Re: PDF grammar and PostScript grammar dmaze@mit.edu (David Z Maze) (2002-02-06) |
Re: PDF grammar and PostScript grammar derekn@foolabs.com (2002-02-16) |
Re: PDF grammar and PostScript grammar yves.deweerdt@softhome.net (2002-02-28) |

From: derekn@foolabs.com (Derek B. Noonburg)
Newsgroups: comp.compilers
Date: 16 Feb 2002 01:13:36 -0500
Organization: Prodigy Internet http://www.prodigy.com
References: 02-01-171 02-02-009
Keywords: parse
Posted-Date: 16 Feb 2002 01:13:36 EST
> [PDF is basically tarted up Postscript, and Postscript has a trivial
> token stack syntax like that of Forth. -John]
A PDF page content stream is simplified PostScript -- no control flow,
no real stack. It's a sequence of operations, where each operation is
zero or more operands followed by an operator, e.g., "10 20 m 100 200
l" means move to the point (10, 20), and then draw a line to (100,
200). Each operator completely consumes its operands and leaves
nothing on the stack (unlike Forth and PostScript).
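Because each operator takes every operand seen since the previous operator, a content stream can be read with a flat loop. You don't need a grammar or a persistent stack. Below is a minimal Python sketch. It assumes the stream holds only numbers, /Name operands and bare operators; strings, arrays, dictionaries and inline images are deliberately left out. The names `parse_content_stream` and `line_segments` are hypothetical.

```python
import re
from typing import List, Tuple, Union

Operand = Union[float, str]

# Assumption: operands are only plain numbers or /Name tokens.
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def parse_content_stream(data: str) -> List[Tuple[str, List[Operand]]]:
    """Split a (simplified) content stream into (operator, operands) pairs."""
    ops: List[Tuple[str, List[Operand]]] = []
    operands: List[Operand] = []
    for token in data.split():
        if _NUMBER.match(token):
            operands.append(float(token))
        elif token.startswith("/"):
            operands.append(token)  # a name operand such as /F1
        else:
            # Any other token is an operator. It takes all pending operands,
            # and nothing is carried over to the next operator.
            ops.append((token, operands))
            operands = []
    return ops


def line_segments(ops: List[Tuple[str, List[Operand]]]):
    """Interpret only 'm' (move to) and 'l' (line to) as a tiny example."""
    current = None
    segments = []
    for op, args in ops:
        if op == "m":
            current = (args[0], args[1])
        elif op == "l":
            end = (args[0], args[1])
            segments.append((current, end))
            current = end
    return segments
```

When you run this on the example from the post, `parse_content_stream("10 20 m 100 200 l")` yields `[("m", [10.0, 20.0]), ("l", [100.0, 200.0])]`. Passing that result to `line_segments` gives one segment, from (10, 20) to (100, 200).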
PDF files are more complex. A PDF file consists of a sequence of
numbered objects. Examples of objects are fonts, images, hyperlinks,
page content streams, and lots more. There's a cross-reference
("xref") table at the end of the file that maps each object number to
that object's position in the file (its byte offset from the beginning of the file).
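Here is a rough sketch of reading such a table. It assumes the classic plain-text xref format (the `xref` keyword, then subsections headed by "first-object count", then one `offset generation n|f` line per object). Compressed cross-reference streams from later PDF versions are not handled. It also assumes the file ends with `startxref`, the offset of the table, and `%%EOF`. The function names are hypothetical.

```python
from typing import Dict, Tuple


def find_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    pos = data.rindex(b"startxref")
    return int(data[pos + len(b"startxref"):].split()[0])


def parse_xref_section(text: str) -> Dict[int, Tuple[int, int]]:
    """Map object number -> (byte offset, generation) for in-use objects.

    Assumes `text` starts at the 'xref' keyword of a classic xref table.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "xref":
        raise ValueError("expected 'xref' keyword")
    table: Dict[int, Tuple[int, int]] = {}
    i = 1
    while i < len(lines):
        parts = lines[i].split()
        if not parts or parts[0] == "trailer":
            break  # end of the table
        first, count = int(parts[0]), int(parts[1])  # subsection header
        for n in range(count):
            offset, gen, kind = lines[i + 1 + n].split()
            if kind == "n":  # 'n' = object in use, 'f' = free entry
                table[first + n] = (int(offset), int(gen))
        i += 1 + count
    return table
```

A reader would seek to the `startxref` offset, parse the table found there, and then jump straight to each object's byte offset. That random access is what separates a PDF file from a token stream read from start to end.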
It's actually even messier - a file can be "updated": you tack some
more objects on the end, some of which can logically replace existing
objects, and then append a new xref table with offsets for the new
objects and a pointer to the previous xref table.
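One way to resolve an updated file is to start at the newest xref table and follow the "previous table" pointers back to the original. The first entry found for each object number wins. This Python sketch works on a hypothetical in-memory form (`XrefSection`, `resolve_objects`) in which each table has already been parsed. Generation numbers are ignored for brevity, and an object freed by an update is represented as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class XrefSection:
    # object number -> byte offset, or None if this section marks it free
    entries: Dict[int, Optional[int]]
    # byte offset of the previous xref table in the file, if any
    prev: Optional[int] = None


def resolve_objects(sections: Dict[int, XrefSection], startxref: int) -> Dict[int, int]:
    """Merge a chain of xref tables, newest first, into one offset map."""
    newest: Dict[int, Optional[int]] = {}
    visited = set()
    offset: Optional[int] = startxref
    while offset is not None:
        if offset in visited:  # guard against a malformed, looping chain
            raise ValueError(f"xref chain loops at offset {offset}")
        visited.add(offset)
        section = sections[offset]
        for num, pos in section.entries.items():
            # Walking newest to oldest, so the first entry seen is the current one.
            newest.setdefault(num, pos)
        offset = section.prev
    # Drop objects whose newest entry says they were freed by an update.
    return {num: pos for num, pos in newest.items() if pos is not None}
```

Suppose the original table at offset 500 lists objects 1 to 3, and an update at offset 900 relocates object 2 and frees object 3. Then `resolve_objects(sections, 900)` keeps object 1 at its old offset, points object 2 at its new offset, and drops object 3.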
PDF really isn't something you want to attack with lex and yacc.
- Derek