Re: [Q] Basic Terminology Questions

rkrayhawk@aol.com (RKRayhawk)
9 Jan 2001 23:23:24 -0500

          From comp.compilers

Related articles
[Q] Basic Terminology Questions mtp1032@home.com (Michael Peterson) (2001-01-05)
Re: [Q] Basic Terminology Questions tenkey@my-deja.com (2001-01-06)
Re: [Q] Basic Terminology Questions rkrayhawk@aol.com (2001-01-09)
| List of all articles for this month |

From: rkrayhawk@aol.com (RKRayhawk)
Newsgroups: comp.compilers
Date: 9 Jan 2001 23:23:24 -0500
Organization: AOL http://www.aol.com
References: 01-01-028
Keywords: parse
Posted-Date: 09 Jan 2001 23:23:24 EST

The sense of the word 'terminal' in compiler discussions, is that the
thing is at the 'end'.


Permit me to give a quick explanation, and then a slightly longer one.


There is a hierarchy of peices to a source file, like the branches of
a tree. Out at the end of the branches are the terminals (the
end-points) of the tree-hierarchy.


The theory of compiler tools is that a source file in which we might
be interested


  1) has a collection of pieces (since we start our conversation here,
it seems funny to call these end-points, or terminals ... but
eventually you learn that you can climb up trees and down trees
fluidly and calling something start or finish is just a name, as long
as we agree on names we could give it any name).


  2) the basic theory goes on to assume that the pieces in the source
file are arranged in some kind of a structure. This structure could
just be a sequence, or it could actually be hierarchical like a
corporate structure.


  3) the basic theory then makes the single most important assumption
of all. It is assumed that if we can detect the structure (or the
mere pattern) of the collection of peices in the source, from that we
can discern the meaning of the source file (we can determine the
semantics). As you progress in your studies and master the vocabulary
of this technology, you will have occassion to look back at this
assumption with some curiosity.


Some tools break this set of assumptions into two separate programs
. The first part tries to find the pieces. This discovery phase is
usually called Syntax analysis. As one other poster pointed out the
tools use a set of rules to define the pieces that should be found one
by one in the source.(These are the syntax rules).


These discovered pieces are called terminals, this will seem a little
more clear in a moment.


A second phase gathers the discovered pieces and tries to see if they
match specified patterns of pieces. Again as another poster pointed
out, that phase gets rules from you that are a separate type of
rule.(These are the grammar rules) In some tools this is handled by a
program that is distinct from the front-end, and this phase is called
parsing, or Semantic analysis.


Some tools make your job easy by merging the tool interfaces so that
you think you are only talking to one analysis program. It is easy to
get started with these tools, but as you are finding it can be a
little confusing.


The parser that is driven by the sequence of tokens from the
front-end, arranges these into a hierarchy. This is really the first
place that a tree shows up in the discussion. The parser records
pieces in a tree structure, and we say that the original source had
pieces 'in a certain hierarchical arrangement' that we
perceive. Acually all the source file has is characters


We impose the hierarchy with our minds and tools. The end-points of
this project tree structure (psychological projection of programmatic
recordation) are simply called 'terminals'. Those terminals can only
be found by the front-end syntax analysis that decides certain
anticipated character sequences consitute a token to be passed to the
parser.


Any rule given to the parser that asserts that one or more tokens
constitue a pattern, defines a non-terminal.


non-terminal-example: RAW-TOKEN


Of course, non-terminals can have two or more tokens.


non-term-1: TOKEN1 TOKEN2


These are definitions (inside the parser control file) of things which
the syntax engine is not smart enough to find: syntax (basically) can
only find one token at a time.


The simplistic non-terminals illustrated so far are at the first level
in from raw tokens (terminals) in a hierarchical structure. Any rule
(controlling the parser) that defines things at that level, or further
in towards the trunk of the tree, are non-terminals.


In most practical tools, the whole set of rules must be gathered
(hierarchically) into one final non-terminal. This grammar rule can
be called, END, or BEGIN, or if you like FRED-SHMED. It is just a
name.


In a certain sense, a tree has two end-point types. The trunk and the
numerous ends of the branches.


Downstream processes 'walk' the tree recorded by the parser,
efficient systems tend to walk in one direction, but you eventually
encounter complex requirements that force algorithms to either look
back in to other levels, or to have type data passed to routines as
they go further out on the limbs.


Once you see that it is a tree, and you can call the trunck begin or
end, you see that the multitude of end-points on the branches are as
well named 'terminal' as any other name.


It is important to see that the terminals are the entire list of
vocabulary shared between the syntax analysis and the semantic
analysis. The grammar can not resolve any detail more precise than the
list of tokens defined in the interface to syntax at the front-end.


Best Wishes,


Robert Rayhawk
rayhawk@alum.calberkeley.org


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.