Re: Best language for implementing compilers?



From: Bart <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Wed, 13 Mar 2019 01:50:08 +0000
Organization: virginmedia.com
References: 19-02-002 19-02-004 19-02-006 19-03-009 19-03-010 19-03-015 19-03-016
Keywords: performance
Posted-Date: 13 Mar 2019 03:23:17 EDT
In-Reply-To: 19-03-016
Content-Language: en-GB

On 12/03/2019 05:54, Hans-Peter Diettrich wrote:
> Am 11.03.2019 um 18:49 schrieb Christopher F Clark:


>> I haven't measured in a long time, so I can't quote any numbers.
>> However, as I recall, you can lex a buffer in roughly the same time
>> you can access it via getc rather than reading with fread, if your
>> lexer code is tight. In fact, the fetching of the characters is often
>> a significant factor in the lexing time. The other significant
>> factors are the time spent in calls (to either the I/O library, or
>> passing a token back to the parser). So, really fast lexers actually
>> often concentrate on that, minimizing both (e.g. reading large
>> buffers and batching up a whole set of tokens to pass to the parser
>> rather than one at a time).
>
> In the age of multi-core processors and threads, some parallel work
> can reduce the overall processing time. Then the longest-running part
> of the compiler determines the total run time, not the sum of all
> times.
>
> With sufficiently large memory it's possible to read (or map) entire
> files into RAM, so that library function calls for reading characters
> are no longer required.


That's been the case for a very long time. My sqlite3.c test file, a
large one at 210K lines, is 8MB; my PC has 8000MB of RAM, so loading
such a large file occupies just 0.1% of memory.


Loading that file takes 9ms on my machine, though that's thanks to file
caching. (Without caching, it comes down to how efficiently the OS can
fetch files into memory, but that's outside the scope of the compiler,
and not something it can do much about.)
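
Reading, or memory-mapping, the whole file is only a few lines of code
anyway. A rough POSIX sketch (mmap; illustrative only, not taken from
my actual compilers):

        /* Sketch: map a whole source file read-only into memory (POSIX).
           A real lexer would also want a sentinel byte after the end. */
        #include <fcntl.h>
        #include <stddef.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        const char *map_source(const char *path, size_t *len) {
            int fd = open(path, O_RDONLY);
            if (fd < 0) return NULL;
            struct stat st;
            if (fstat(fd, &st) < 0) { close(fd); return NULL; }
            *len = (size_t)st.st_size;
            void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
            close(fd);          /* the mapping outlives the descriptor */
            return p == MAP_FAILED ? NULL : (const char *)p;
        }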


Anyway, once in memory, scanning the characters involves traversing the
source by incrementing a byte pointer. That part of my lexers usually
looks like this (this one is designed for C source):


        doswitch lxsptr++^ # (looping switch)
        when 'A'..'Z','a'..'z','$','_' then
              .... start of identifier
        when '1'..'9' then
              .... start of decimal number
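
In C terms the equivalent dispatch would be something like the sketch
below. (Rough and hypothetical; note that case ranges like 'A' ... 'Z'
are a gcc extension, so portable C would need the letters spelled out,
or a 256-entry classification table indexed by the byte.)

        /* Sketch of the same looping-switch dispatch in C. */
        void lex(const unsigned char *lxsptr) {
            for (;;) {
                switch (*lxsptr++) {
                case 'A' ... 'Z': case 'a' ... 'z': case '$': case '_':
                    /* ... start of identifier ... */
                    break;
                case '1' ... '9':
                    /* ... start of decimal number ... */
                    break;
                case 0:
                    return;     /* NUL sentinel: end of source */
                /* ... cases for every other byte value ... */
                }
            }
        }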


> With all the caches used by today's OSs it's hard to reproduce
> benchmark times. And that's not always really required or desirable!


I would dispute that file load times should be considered part of
compilation time, at least when comparing performance.


Imagine a compiler accessed via a library API, where you pass it a
string as the input source, and it returns the output as another
string. File I/O doesn't come into it at all. Or maybe the input was
synthesised, or generated by another program.
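
The shape of such an API might be something like this (hypothetical
names, purely for illustration):

        /* Hypothetical string-to-string compiler API: no file I/O. */
        #include <stddef.h>

        typedef struct {
            char *output;        /* generated code, or NULL on failure */
            char *diagnostics;   /* any error/warning text */
        } CompileResult;

        /* Compile a source string held entirely in memory. */
        CompileResult compile_string(const char *source, size_t length);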


> Imagine a fast compiler that is invoked after every single change to
> the source code,


Both of my own languages have whole-program compilers that /must/
process all modules on every change. However, my projects are small
enough (20-40K lines over a few dozen modules) that a full build might
take 0.2 to 0.3 seconds of total elapsed time. (There is some scope for
further improvement, but I don't need it at the moment.)


(Actually, lexing and parsing probably /are/ about 30% of my compile
times, but only because the compilers are generally quite fast. The
byte-code compiler has touched a million lines per second, on an
older, somewhat faster machine.


But at that level it becomes difficult to test compiler speed on real
programs because the actual timing gets lost in the noise; just printing
a few more lines of output might take as long!
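
The obvious harness, sketched below with a POSIX monotonic clock, only
gives you a handful of milliseconds of signal to work with:

        /* Sketch: time one whole-program compile (POSIX clock_gettime). */
        #include <stdio.h>
        #include <time.h>

        int main(void) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            /* ... invoke the compiler here ... */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (t1.tv_sec - t0.tv_sec)
                        + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("compile took %.3f seconds\n", secs);
            return 0;
        }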


One older compiler was written in a dynamic language which had to be
compiled to byte-code. The performance of that compiler was indifferent,
but the one used to generate the byte-code was blazing fast. In fact,
for a while it was set up so that every time I ran this compiler, it was
compiled from scratch (some 24Kloc).


I didn't notice, since it only took some tens of milliseconds. That's
not something you can attempt with gcc (rebuilding it every time it's
run).)


> which will benefit from OS caches, whereas a slow
> compiler invoked once per hour or day will suffer even more from the
> lack of cached files and directories. A clever IDE can do such caching
> itself, and can remember which *parts* of a source file have not been
> touched since the last compile, much better than the OS file
> modification date. And it can compile updates in the background, so
> that a final compilation of an entire project may run as fast as the
> compilation summary is presented to the user :-)


Yeah, but that would be misleading the user as to the real compiler
performance...

