Re: Best language for implementing compilers?

Bart <bc@freeuk.com>
Sun, 10 Mar 2019 15:33:14 +0000

          From comp.compilers

Related articles
[15 earlier articles]
Re: Best language for implementing compilers? bc@freeuk.com (Bart) (2019-03-09)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-09)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-09)
Re: Best language for implementing compilers? 157-073-9834@kylheku.com (Kaz Kylheku) (2019-03-10)
Re: Best language for implementing compilers? 157-073-9834@kylheku.com (Kaz Kylheku) (2019-03-10)
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-10)
Re: Best language for implementing compilers? bc@freeuk.com (Bart) (2019-03-10)
Re: Best language for implementing compilers? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-03-10)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-10)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-10)
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-11)
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-11)
Re: Best language for implementing compilers? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-03-12)
[2 later articles]
| List of all articles for this month |

From: Bart <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Sun, 10 Mar 2019 15:33:14 +0000
Organization: virginmedia.com
References: 19-02-002 19-02-004 19-02-006 19-03-009
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="53992"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, parse
Posted-Date: 10 Mar 2019 21:07:39 EDT
In-Reply-To: 19-03-009
Content-Language: en-GB

On 10/03/2019 11:13, Christopher F Clark wrote:
> On Tuesday, February 12, 2019 at 9:43:46 AM UTC-5, Bart wrote:
>> On 08/02/2019 23:36, George Neuner wrote:
>>> On Fri, 8 Feb 2019 12:20:18 +0000, "Costello, Roger L."
>>> <costello@mitre.org> wrote:
>>>
>>>> What is it about ML that makes it such a good language for implementing
>>>> compilers?
>>>
>>> Compiling involves a lot of pattern matching, and pattern matching is
>>> a native feature of ML.
>>
>> You mean for tokenising and parsing? That would be a small part of
>> compilation (the easy bit, in my view), although it seems to be a
>> preoccupation of this group.
>
> This thread has gone down a rabbit hole. The pattern matching in ML is not
> used for tokenizing and parsing.
....


> If you want really fast tokenizing and parsing, you turn it into a DFA (PDA
> for a parser) and implement the engine for doing so in assembly
> language/machine code.


How fast are we talking about?


Using traditional methods on my not very fast PC, I can get up to 10
million lines per second for basic tokenising, 5Mlps with symbol table
lookups for identifiers (user names or keywords), and up to 2Mlps with
full parsing (building the symbol table and syntax tree on the way).


Which comes to an insignificant amount of the runtime of most compilers.


Parsing needs to recognise keywords in order to do its job; I can see
that tokenising and parsing could be combined into a single pass that
works a character at a time, but requiring machine generated parsing
code as it would otherwise be impractical. But what sort of speed-ups
could that give?


> All that said, the output of any decent
> C/C++ lexer and parser generator is often more than fast enough. That's
> despite lexing and parsing often taking upto a third of the compilation time.


On the same machine that I can parse sqlite3.c (a 210Kloc test +
windows.h) in about 0.14 seconds, a full 'gcc -O0 -c' compile takes 8.8
seconds. -O3 takes 54 seconds, nearly 400 times as long as parsing. (And
being C that includes dealing with macro expansion in the tokeniser,
include files, conditional code etc.)


[This test uses windows.h, and gcc uses a much more elaborate version
than my test. But a standalone compile of windows.h by gcc is 1.3
seconds with either -O0 or -O3, so is not too significant]


See the table here:


      https://github.com/sal55/qx/blob/master/compilertest.txt


This compares compilers when given 20K-2000K lines of the same input.
(Not realistic code, but still interesting to see how they managed.)


All you have to do is look at the difference between gcc 8.1.0
(unoptimised), and TinyC.


Both have exactly the same job to do in terms of parsing input, yet one
timing shows 16.5 seconds for gcc, and 0.2 seconds for Tiny C. That
means the job of parsing cannot have taken more than 0.2 seconds.


This is why I suggested it is not significant for most compilers. (It
might be for Tiny C, but getting that even faster can hardly be a
priority...)


(BTW my own compilers on that chart are 'bcc', 'MM', and 'QC', which are
mostly just behind Tiny C.)


>
> Notice, that I am not talking about the source code and lexing and parsing
> here.


That's why I asked the question.




> I'm talking about what you do afterwards. That's where the bulk of the
> work is done.


OK. Clearly my compilers work differently, or at least that aspect of it
is in a very crude form, as I don't recall needing to do any such
pattern matching.


Is this associated with optimising?


--
bart


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.