Re: Best language for implementing compilers?

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Sun, 10 Mar 2019 04:13:47 -0700 (PDT)

          From comp.compilers


From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Sun, 10 Mar 2019 04:13:47 -0700 (PDT)
Organization: Compilers Central
References: 19-02-002 19-02-004 19-02-006
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="67223"; mail-complaints-to="abuse@iecc.com"
Keywords: parse
Posted-Date: 10 Mar 2019 09:50:58 EDT
In-Reply-To: 19-02-006

On Tuesday, February 12, 2019 at 9:43:46 AM UTC-5, Bart wrote:
> On 08/02/2019 23:36, George Neuner wrote:
> > On Fri, 8 Feb 2019 12:20:18 +0000, "Costello, Roger L."
> > <costello@mitre.org> wrote:
> >
> >> What is it about ML that makes it such a good language for implementing
> >> compilers?
> >
> > Compiling involves a lot of pattern matching, and pattern matching is
> > a native feature of ML.
>
> You mean for tokenising and parsing? That would be a small part of
> compilation (the easy bit, in my view), although it seems to be a
> preoccupation of this group.


This thread has gone down a rabbit hole. The pattern matching in ML is not
used for tokenizing and parsing. Unfortunately, the term "pattern matching" is
used for a wide variety of things, tokenizing and parsing being one of them.
It is also used in AI and machine-learning circles to mean something
different, and the network security folks have still a third use of the term.
However, in this context that's not what ML-style pattern matching is used for
(and it is yet a fourth variation on what pattern matching means). There are
probably many other uses of the term I'm not aware of, but I know those four
and can see how they overlap without being the same. Each form attacks a
different problem and uses a different meaning of the word "pattern".
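
For readers who have not used an ML-family language, here is a minimal OCaml
sketch of what ML-style pattern matching refers to (the type and the example
are my own, invented purely for illustration): you destructure values of an
algebraic data type by their shape and bind the pieces to names, which has
nothing to do with matching characters in an input stream.

    (* Invented example: the ML sense of "pattern matching". *)
    type shape =
      | Circle of float           (* radius *)
      | Rect of float * float    (* width, height *)

    let area s =
      match s with
      | Circle r    -> 4.0 *. atan 1.0 *. r *. r   (* pi * r^2 *)
      | Rect (w, h) -> w *. h

    let () =
      Printf.printf "%.2f %.2f\n" (area (Circle 1.0)) (area (Rect (2.0, 3.0)))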


If you want really fast tokenizing and parsing, you turn it into a DFA (a PDA
for a parser) and implement the engine for it in assembly language/machine
code. Tom Pennello wrote an excellent paper on how to do that. BTW, the code
for an LL parser is slightly faster than the code for an LR one, because it is
less general and doesn't push as much onto the stack. You pay for that speed
with a slightly less general grammar formalism, but in practice it doesn't
matter. All that said, the output of any decent C/C++ lexer and parser
generator is often more than fast enough, despite lexing and parsing often
taking up to a third of the compilation time. BTW, lexing (because it looks at
every character) is the dominant factor in that. (If you want to go faster,
you have to find a way of handling multiple characters at a time. There are
papers on how to do that, mostly in the context of network security, but for
lexing it is a challenge and I've never seen it used in general practice.)
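
To make the DFA point concrete, here is a rough, direct-coded OCaml sketch of
such a scanner loop (the three-state machine, token names, and function names
are all invented for this post; a real generator emits far larger tables or
code, and often in C or machine code rather than ML):

    (* States: 0 = start, 1 = in an identifier, 2 = in an integer; -1 = dead.
       The transition function stands in for the generated table. *)
    let transition state c =
      match state, c with
      | 0, ('a' .. 'z' | 'A' .. 'Z' | '_')              -> 1
      | 1, ('a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_') -> 1
      | 0, '0' .. '9'                                   -> 2
      | 2, '0' .. '9'                                   -> 2
      | _                                               -> -1

    let accepting = function 1 -> Some "IDENT" | 2 -> Some "INT" | _ -> None

    (* Longest-match scan of one token starting at [pos]; returns the token
       kind and the position just past it, or None if nothing matches. *)
    let scan_token src pos =
      let n = String.length src in
      let rec loop state i last =
        if i >= n then last
        else
          let state' = transition state src.[i] in
          if state' = -1 then last
          else
            let last' =
              match accepting state' with
              | Some kind -> Some (kind, i + 1)
              | None      -> last
            in
            loop state' (i + 1) last'
      in
      loop 0 pos None

    let () =
      match scan_token "count42 + 1" 0 with
      | Some (kind, stop) -> Printf.printf "%s ending at %d\n" kind stop
      | None              -> print_endline "no token"

A generated scanner does the same thing over a flat table or directly threaded
code; the state machine itself is identical.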


OK, having dispensed with that, let's talk about ML-style pattern matching:
where it is used and what it is used for.


Unless you have written a one-pass compiler, the output of your lexer/parser
combination is usually some form of syntax tree (either a parse tree or an
AST). That may or may not be a convenient intermediate representation (IR).
If it's not, you need to "rewrite" it into one, usually making one or more
passes over the tree to do so.


This is where ML style pattern matching shines. You have a tree (DAG, graph,
some sort of data structure that has links in it) and you want to modify it.
The ML pattern match allows you to specify the shape of the tree you want to
match and extract the relevant parts into convenient local variables. You can
then use those variables to construct a new tree that has the parts
reconnected (rewritten) into the shape you desire in your IR.
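
As a hedged sketch of what that looks like (the AST, the IR, and the
constant-folding rule below are invented for this post, not taken from any
real compiler), note how a single match arm both tests the shape of a subtree
and binds its pieces to local names, which the right-hand side then
reassembles into the new structure:

    (* What a parser might hand us. *)
    type ast =
      | Num of int
      | Var of string
      | Add of ast * ast
      | Mul of ast * ast

    (* A flatter IR: one binary-op node carrying an operator tag. *)
    type op = OAdd | OMul
    type ir =
      | IConst of int
      | ILoad of string
      | IBinop of op * ir * ir

    let rec lower (e : ast) : ir =
      match e with
      | Add (Num a, Num b) -> IConst (a + b)            (* fold constants *)
      | Num n              -> IConst n
      | Var x              -> ILoad x
      | Add (l, r)         -> IBinop (OAdd, lower l, lower r)
      | Mul (l, r)         -> IBinop (OMul, lower l, lower r)

    let () =
      match lower (Add (Num 1, Mul (Num 2, Var "x"))) with
      | IBinop (OAdd, IConst 1, IBinop (OMul, _, ILoad "x")) ->
          print_endline "rewritten as expected"
      | _ -> print_endline "something else"

A nice side benefit is that the compiler warns when a match does not cover
every constructor, so adding a new node kind to the AST points you at every
rewrite pass that needs updating.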


The technique is so good (i.e., easy to use and understand) that Terence Parr
built a whole tool (SORCERER) to do just that, to go along with his tool PCCTS
(aka ANTLR), so that you could do it in Java. I believe that in modern
versions he has merged both into one tool.


I think you see a similar approach in Ira Baxter's Semantic Designs tool, DMS.


Notice that I am not talking about the source code and lexing and parsing
here. I'm talking about what you do afterwards. That's where the bulk of the
work is done. You have structured data, but you want it in a different
structure. That's the problem that ML pattern matching helps you solve. It
isn't about lexing and parsing at all.

