Re: Beginner's Question...


Related articles
Beginner's Question... mihai@A-aod.resnet.ucsb.edu (Mihai Christodorescu) (1997-01-16)
Re: Beginner's Question... salomon@silver.cs.umanitoba.ca (1997-01-16)
Re: Beginner's Question... jlilley@empathy.com (John Lilley) (1997-01-16)
Re: Beginner's Question... edi-c@algonet.se (Kurt Svensson) (1997-01-17)
Re: Beginner's Question... will@ccs.neu.edu (William D Clinger) (1997-01-17)
Re: Beginner's Question... jlilley@empathy.com (John Lilley) (1997-01-19)
From: William D Clinger <will@ccs.neu.edu>
Newsgroups: comp.compilers
Date: 17 Jan 1997 23:29:44 -0500
Organization: Northeastern University
References: 97-01-122
Keywords: practice, performance

Mihai Christodorescu <mihai@A-aod.resnet.ucsb.edu> asked:


        Is [the time-efficiency of parsing and maybe the rest of the
        compiler] a real problem in compiler theory and compiler making?


Good question.


Programs usually contain 5 to 10 times as many characters as tokens,
so scanning usually takes more time than parsing. Small differences
in the efficiency of parsing are therefore swamped by time spent in
the scanner.
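
To make that concrete, here is a toy scanner loop (just a sketch, not
code from any real compiler) that examines every character of its
input but emits far fewer tokens; whatever the parser does, it only
ever sees the shorter token stream.

    /* Toy scanner sketch: every character is examined once, but far
       fewer tokens come out the other end, so per-character costs
       dominate per-token costs.  The token classes are simplified. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *src = "total := total + rate * 60;";
        const char *p = src;
        long tokens = 0;

        while (*p) {
            if (isspace((unsigned char)*p)) { p++; continue; }
            if (isalpha((unsigned char)*p))          /* identifier */
                while (isalnum((unsigned char)*p)) p++;
            else if (isdigit((unsigned char)*p))     /* number */
                while (isdigit((unsigned char)*p)) p++;
            else                                     /* one-character punctuation */
                p++;
            tokens++;
        }
        printf("%lu characters, %ld tokens\n",
               (unsigned long)strlen(src), tokens);
        return 0;
    }

On real programs the character-to-token ratio is the 5-to-1 or
10-to-1 quoted above, so a constant factor saved in a loop like this
is worth more than the same factor saved in the parser.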


This assumes you are using an O(n) parser such as recursive descent or
LR. The first parser I ever wrote required O(2^n) time. (It was
recursive descent with backup. Note that the name of one of the very
best parsing algorithms is quite similar to the name of one of the
very worst. Many people get them confused. Don't be one of those
people.) My first parser worked just fine on programs of one or two
lines. When I had tested all parts of it on small programs, I gave it
a program of 50 lines. It never finished. Later I estimated that it
would have completed the parse within a few centuries, but that was
well beyond the rated MTBF of the available hardware and personnel.
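
If you want to watch the blowup happen, here is a tiny recognizer (a
sketch of the pathology, not my original parser) for the grammar
S -> A S | empty, A -> a | aa.  Feed it a string of a's followed by a
single b: the b can never be matched, so the recognizer backs up
through every way of carving the a's into A's before giving up, and
the number of calls grows exponentially with the length of the input.

    /* Recursive descent with backup, done naively: try every
       alternative at every choice point and retreat on failure. */
    #include <stdio.h>
    #include <string.h>

    static const char *input;
    static long calls;

    static int parse_S(int i);

    /* A S, with A -> "a" | "aa" */
    static int parse_AS(int i)
    {
        if (input[i] == 'a' && parse_S(i + 1)) return 1;               /* A -> a  */
        if (input[i] == 'a' && input[i + 1] == 'a' && parse_S(i + 2))  /* A -> aa */
            return 1;
        return 0;
    }

    /* S -> A S | empty; succeeds only if S derives the whole suffix. */
    static int parse_S(int i)
    {
        calls++;
        if (parse_AS(i)) return 1;
        return input[i] == '\0';
    }

    int main(void)
    {
        char buf[40];
        for (int n = 5; n <= 30; n += 5) {
            memset(buf, 'a', (size_t)n);
            buf[n] = 'b';            /* unmatchable: forces full backup */
            buf[n + 1] = '\0';
            input = buf;
            calls = 0;
            printf("n=%2d  parsed=%d  calls=%ld\n", n, parse_S(0), calls);
        }
        return 0;
    }

Memoizing parse_S on its argument, or rewriting the grammar so that
no backup is ever needed, brings this back down to linear time.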


A friend of mine once cut compilation time by 30% by rewriting the
symbol table routines to use shallow binding (constant time) instead
of balanced trees (logarithmic time). This mattered because her
company was compiling million-line programs night and day.
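
For the curious, here is roughly what that looks like (my own sketch,
not my friend's routines).  The scanner interns each identifier into
a symbol record exactly once, and the record itself points at the
innermost binding, so lookup is a single load instead of a walk down
a balanced tree:

    /* Sketch of a shallow-binding symbol table.  Identifiers are
       assumed to be interned, so each Symbol record exists exactly
       once; the innermost binding hangs directly off the Symbol,
       giving O(1) lookup.  Scope exit restores what was shadowed. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Binding Binding;

    typedef struct Symbol {
        const char *name;
        Binding    *current;        /* innermost binding, or NULL */
    } Symbol;

    struct Binding {
        Symbol  *sym;
        int      info;              /* type, offset, whatever the compiler keeps */
        Binding *shadowed;          /* outer binding to restore on scope exit */
        Binding *next_in_scope;     /* all bindings created in this scope */
    };

    typedef struct Scope {
        Binding      *bindings;
        struct Scope *outer;
    } Scope;

    static Scope *current_scope;

    static void enter_scope(void)
    {
        Scope *s = calloc(1, sizeof *s);
        s->outer = current_scope;
        current_scope = s;
    }

    static void bind(Symbol *sym, int info)      /* O(1) */
    {
        Binding *b = malloc(sizeof *b);
        b->sym = sym;
        b->info = info;
        b->shadowed = sym->current;
        sym->current = b;
        b->next_in_scope = current_scope->bindings;
        current_scope->bindings = b;
    }

    static Binding *lookup(Symbol *sym)          /* O(1): no search at all */
    {
        return sym->current;
    }

    static void leave_scope(void)                /* O(bindings in the scope) */
    {
        for (Binding *b = current_scope->bindings; b != NULL; ) {
            Binding *next = b->next_in_scope;
            b->sym->current = b->shadowed;
            free(b);
            b = next;
        }
        Scope *dead = current_scope;
        current_scope = dead->outer;
        free(dead);
    }

    int main(void)
    {
        Symbol x = { "x", NULL };     /* normally produced by the scanner */
        enter_scope();  bind(&x, 1);
        enter_scope();  bind(&x, 2);
        printf("inner x = %d\n", lookup(&x)->info);   /* prints 2 */
        leave_scope();
        printf("outer x = %d\n", lookup(&x)->info);   /* prints 1 */
        leave_scope();
        return 0;
    }

The work moves from lookup to scope exit, which is a good trade when
lookups vastly outnumber declarations, as they do in million-line
programs.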


As for the rest of the compiler, many simple optimizations run in
linear time, at least in practice. Most control flow optimizations
require something like cubic time in the worst case, and are somewhat
worse than linear in practice. Most compilers run in close to linear
time at low optimization levels, and in low-degree polynomial time
with all optimizations enabled.
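
As an example of the simple, linear-time kind (again just a toy of
mine), constant folding visits each node of an expression tree
exactly once:

    /* Constant folding over an expression tree: one post-order walk,
       so the pass is O(n) in the size of the tree. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Expr {
        char op;                  /* '+', '*', or 'k' for a constant leaf */
        int  value;               /* meaningful only when op == 'k' */
        struct Expr *left, *right;
    } Expr;

    static Expr *leaf(int v)
    {
        Expr *e = calloc(1, sizeof *e);
        e->op = 'k'; e->value = v;
        return e;
    }

    static Expr *node(char op, Expr *l, Expr *r)
    {
        Expr *e = calloc(1, sizeof *e);
        e->op = op; e->left = l; e->right = r;
        return e;
    }

    /* Fold the children first, then this node if both are constants. */
    static Expr *fold(Expr *e)
    {
        if (e->op == 'k') return e;
        e->left  = fold(e->left);
        e->right = fold(e->right);
        if (e->left->op == 'k' && e->right->op == 'k') {
            int v = (e->op == '+') ? e->left->value + e->right->value
                                   : e->left->value * e->right->value;
            e->op = 'k'; e->value = v;
        }
        return e;
    }

    int main(void)
    {
        /* (2 + 3) * 4  ==>  20 */
        Expr *e = fold(node('*', node('+', leaf(2), leaf(3)), leaf(4)));
        printf("%c %d\n", e->op, e->value);
        return 0;
    }

The cubic worst cases come from things like iterative dataflow
analysis, where in the worst case you sweep O(n) blocks O(n) times
over facts of size O(n); in practice the sweeps converge after a
handful of iterations, which is why the observed behavior is only
somewhat worse than linear.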


Optimizations that require much more than cubic time are seldom
implemented. Most journals will reject papers that propose
non-polynomial-time optimizations because they are usually quite
impractical. There are a few exceptions to this rule, but not many.


In short: Constant factors don't matter much except in the scanner.
Asymptotic complexity matters a great deal, in practice as well as in
theory.


Will
--

