Re: ML-style pattern matching in C-like languages

torbenm@app-2.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=)
15 Dec 2005 17:53:02 -0500

          From comp.compilers

Related articles
ML-style pattern matching in C-like languages rsc@swtch.com (Russ Cox) (2005-12-15)
Re: ML-style pattern matching in C-like languages just-for-news-frido@q-software-solutions.de (Friedrich Dominicus) (2005-12-15)
Re: ML-style pattern matching in C-like languages gtg983q@mail.gatech.edu (Ben Chambers) (2005-12-15)
Re: ML-style pattern matching in C-like languages torbenm@app-2.diku.dk (2005-12-15)
Re: ML-style pattern matching in C-like languages RLake@oxfam.org.uk (2005-12-15)
Re: ML-style pattern matching in C-like languages rvclayton@acm.org (2005-12-15)
Re: ML-style pattern matching in C-like languages haberg@math.su.se (2005-12-19)
Re: ML-style pattern matching in C-like languages Juergen.Kahrs@vr-web.de (=?ISO-8859-1?Q?J=FCrgen_Kahrs?=) (2005-12-19)
Re: ML-style pattern matching in C-like languages rsc@swtch.com (Russ Cox) (2005-12-23)
Re: ML-style pattern matching in C-like languages nr@eecs.harvard.edu (2005-12-29)
[1 later articles]
| List of all articles for this month |

From: torbenm@app-2.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=)
Newsgroups: comp.compilers
Date: 15 Dec 2005 17:53:02 -0500
Organization: Department of Computer Science, University of Copenhagen
References: 05-12-036
Keywords: C, design

Russ Cox <rsc@swtch.com> writes:


> Have there been any proposals or prototypes built
> for introducing some form of ML-style pattern matching
> in C-like languages?


I can't say, but it seems like a good idea.


> I am working on a C compiler written in a malleable
> dialect of C and have no qualms about adding new
> syntax to the language to make writing the compiler
> easier. I am toying with the idea of adding some form
> of pattern-matching syntax but am not sure exactly
> what it should look like.
>
> One issue is that C has no explicit support for
> discriminated unions, but I am willing to assume that there
> are hints of some form to tell the compiler how to interpret
> the patterns w.r.t. the structure elements.


You could make a union of structs that all start with a tag field that
you can switch on.


> Another issue is that my structures have much more
> information in them than I care to match on. For example,
> the generic abstract syntax node might look something like:
>
> struct Node
> {
> int op; /* OADD, OSUB, OMUL, ONEG, etc. */
> Node *left; /* left child in a.s. tree */
> Node *right; /* right child in a.s. tree */
> Line line; /* source line information */
> Type *type; /* C type assigned to expression */
> };
>
> If I want to pattern match on this, sometimes I might care
> about the type and sometimes not. I never care about the
> source line number. Sometimes I don't even care about the
> right child. In the real struct Node there are many more fields,
> any of which I only rarely care about.
>
> One might envision a syntax like:
>
> if(n ~ Node(OADD, Node(OMUL, ?x, ?y), ?z))
>
> or even
>
> if(n ~ OADD(OMUL(?x, ?y), ?z))
>
> to match (and bind subexpressions in) an expression of
> the form x*y+z. But maybe sometimes I care about the types:
>
> if(n ~ OADD(OMUL(?x, ?y, .type ~ TLONG), ?z))
>
> where the the multiplication has result type "long". And so on.


Given that it is C, I would probably want a struct-like syntax, but
which allows some fields to be omitted (and the order changed).
Something like:


    match (n) {
        {op == OADD, *left = {op == OMUL, left = x, right = y}, right = z} : e1
        {...} : e2
        ...
    }


The == operator indicates that the field value must be equal to the
value at the right of the operator, while the = operator indicates
that the field value must match the pattern on the right. The *
indicates that it is the value pointed to by the field rather than the
field itself that is matched. You don't have to mention all field
names in the pattern. Unmentioned fields are ignored when matching.


You can translate this into an if-statement:


    tmp1 = n;
    if (tmp1.op == OADD) {
        int z = tmp1.right;
        tmp2 = *left;
        if (tmp2.op == OMUL) {
            int x = tmp2.left;
            int y = tmp2.right;
            e1;
            goto end_of_match;
        }
    }
    (code for other cases)
    end_of_match:


> How do ML programmers who write real compilers deal
> with this issue (auxiliary data that never or rarely is
> interesting in the match)?


I mostly collect all such rarely used information (such as position in
the source code) into a single record or tuple that can be matched
using a single wildcard pattern (i.e., "_") if not used and matched
with a suitable pattern to extract the relevant part when I do need
it.


> Have there been pattern-matching syntaxes in other languages
> that deal with this problem?


In Standard ML, record patterns can use "..." to signify that the
remaining fields are ignored.


                Torben


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.