Re: Parser error handling

haberg@matematik.su.se (Hans Aberg)
4 Sep 2003 22:41:04 -0400

          From comp.compilers

Related articles
Parser error handling piotr.wyderskiREMOVE@hoga.pl (Piotr Wyderski) (2003-08-20)
Re: Parser error handling haberg@matematik.su.se (2003-08-23)
Re: Parser error handling piotr.wyderskiREMOVE@hoga.pl (Piotr Wyderski) (2003-08-30)
Re: Parser error handling haberg@matematik.su.se (2003-09-04)
| List of all articles for this month |
From: haberg@matematik.su.se (Hans Aberg)
Newsgroups: comp.compilers
Date: 4 Sep 2003 22:41:04 -0400
Organization: Mathematics
References: 03-08-059 03-08-074 03-08-092
Keywords: parse, errors
Posted-Date: 04 Sep 2003 22:41:04 EDT

"Piotr Wyderski" <piotr.wyderskiREMOVE@hoga.pl> wrote:


>The [Bison] parser should free all allocated blocks to prevent memory leaks
>and then return an error value to the caller.
...
>> I assume you are using C as output language.
>
>No, also C++.
>
>> I use C++, where I write user classes which the C++ language can clean up.
>
>But how? It's easy when the root class implements a kind of linked
>list and tracks all the allocated fragments, but this increases memory
>usage.


Later Bison versions use the macro program M4 for postprocessing in
order to produce the output file, which makes it easy to write own
skeleton files. I made one such for C++ output; it is posted in the
Bison Patches list, with some details on how to use it.


In the C++ code, I use a class object which maintains a polymorphic
(virtual) pointer to a class object_root, also using a reference count
to avoid unnecessary copying. It is then easy to derive new classes
from the object_root class. The reference count works well with a
parser building an object.


With this setup I then cannot use the Bison %union static type checking
feature, because it uses a C/C++ union type as implementation, and that
cannot under C++ be used with types having non-trivial constructors. But I
do not feel I need %union either. As for the extra overhead, that is a
point with C++ to admit such overhead for programming convenience.


I will, at some length, describe the C++ setup I am using, because it
seems to be a recurrent question:


The Bison parser semantic type I use in a compiler I am currently writing on is:


class semantic_type {
public:
    long number_;
    std::string text_;
    my::object object_;


    semantic_type() : number(0) {}
};


#define YYSTYPE semantic_type


as it turns out that numbers and strings are very common to be used side
by side with various objects one is creating. The code for the object and
object_root classes are (which I post, because it is tricky to write
them):


class object_root {
    mutable unsigned count_;
public:
    object_root() : count_(1) {}
    virtual ~object_root() {}


    object_root* copy() const { ++count_; return const_cast<object_root*>(this); }
    void shed() { if (--count_ == 0) delete this; }


    virtual object_root* clone() const = 0;


    // Example virtual function:
    virtual void write(std::ostream&, write_style) const = 0;
};


class object {
protected:
    object_root* data_;


public:
    object_root* copy() const { return (data_ == 0)? 0 : data_->copy(); }
    void shed() { if (data_ != 0) data_->shed(); }


    object() : data_(0) {}
    ~object() { shed(); }


    object(const object& x) : data_(x.copy()) {}
    object& operator=(const object& x) {
        if (data_ != x.data_) { shed(); data_ = x.copy(); }
        return *this;
    }


    object(object_root* rp) : data_(rp) {}


    object_root* data() { return data_; }
    const object_root* data() const { return data_; }


    // Example function:
    void write(std::ostream& os, write_style ws) const {
        if (data_ == 0) { os << "none"; }
        else data_->write(os, ws);
    }
};


If you add more classes to the object_root hierarchy, they should have the form:


class X : public virtual object_root {
public:
    ...
    virtual object_root* clone() const { return new X(*this); }
    ...
};


The virtual copy constructor clone() is rarely used, though, because in a
parser the typical use is to instead create a new object rather than make
an exact copy of an old one.


One can also make derivations Y from the class object as well, typically
as a pair to a class X derived from object_root with some additional
virtual functions. Then the form I currently use is:


class Y : public virtual object {
public:
    Y() {}


    explicit Y(object_root* op) : object(op) {}
    explicit Y(const object& x) : object(x.copy()) {}


    Y(X* rp) : object(rp) {} // X derived from object_root


    X* data() { return dynamic_cast<X*>(data_); }
    const X* data() const { return dynamic_cast<const X*>(data_); }


    X* copy() const { return dynamic_cast<X*>(object::copy()); }
    ...
};


The point is that with this setup, one does not need to change the
semantic_type above, because will never ever maintain anything else but a
single pointer which can always be converted using suitable
dynamic_cast's.


An example of parser actions may then look like:


list:
        "[" list_body "]" { $$.object_ = $2.object_; }
;
list_body:
        list_object { $$.object_ = $1.object_; }
    | list_body "," list_object {
            $$.object_ = my::list($1.object_) + my::list($3.object_);
        }
;


assuming that one has a written a class my::list playing the role of class
Y above, which has an operator+() to concatenate lists. I would not write
a class Y, though, except when it simplifies the structure of the code.
For simple objects like a list, it is easier to derive a class list from
object_root only, with a constructor
        list::list(const object& x, const object& y)
that extracts the lists of objects x, y and concatenates them. Then the
parser action above becomes:


list_body:
        list_object { $$.object_ = $1.object_; }
    | list_body "," list_object {
            $$.object_ = new my::list($1.object_, $3.object_);
        }
;


Anyway, one can that way fairly easily build complicated objects, getting
the cleanup problem solved only at the cost of typical C++ overhead. If
you really want to cut down the overhead, then you should avoid dynamic
allocations altogether (malloc(), operator new()) altogether. This is what
the Hugs http://haskell.org/hugs interpreter does, implementing its own
GC. But then there is a considerable cost in implementation time in order
to achieve that. Then also C++ will not help you much; it is hard to
combine C++ with any GC other than a reference count.


    Hans Aberg


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.