Re: How to parse and call c++ constructors?

A Pietu Pohjalainen <>
22 Sep 2005 23:41:09 -0400

          From comp.compilers

Related articles
How to parse and call c++ constructors? (Groleo) (2005-09-17)
Re: How to parse and call c++ constructors? (Detlef Meyer-Eltz) (2005-09-18)
Re: How to parse and call c++ constructors? (=?ISO-8859-1?Q?J=FCrgen_Kahrs?=) (2005-09-22)
Re: How to parse and call c++ constructors? (A Pietu Pohjalainen) (2005-09-22)
Re: How to parse and call c++ constructors? (Hans-Peter Diettrich) (2005-09-22)
Re: How to parse and call c++ constructors? (Groleo) (2005-09-22)
Re: How to parse and call c++ constructors? (Chris F Clark) (2005-09-23)
| List of all articles for this month |

From: A Pietu Pohjalainen <>
Newsgroups: comp.compilers
Date: 22 Sep 2005 23:41:09 -0400
Organization: University of Helsinki
References: 05-09-072
Keywords: parse
Posted-Date: 22 Sep 2005 23:41:09 EDT

Groleo <> wrote:
> Given a file,:
> A { B {
> text="cucu"
> } }

> it's parsed like:
> text=cucu
> B
> A.

I guess that you're using a bottom-up parser to do this, which causes
the problem.

> So, to show the actions too:
> A { <-- a =new A();
> B { <-- a->_b =new B();
> text="cucu" <-- a->_b->text = $2;
> }
> }
> So again this is translated into:
> a->_b->text = $2; //error: a was not allocated, b was not allocated
> a->_b =new B(); //error: a was not allocated.
> a =new A();// too late :)
> How can this be done without errors?

If you change to top-down parsing, the problem should go away. Lately
I've been thinking about OO-CFG's [1], which are suitable for recursive
descent parsing, and have the parser class structure following the
grammar of the parsed language.

In an OO-CFG, each production is either an 'AND'-rule (production
consists only of catenation of terminals and non-terminals) or an
'OR'-rule (only choosing between terminals and non-terminals). So,
if we write your problem as a OO-CFG, we get something like:

Identifier -> BlockID '{' BlockList '}'
BlockList -> BlockItem | empty
BlockItem -> IdentifierBlockList | AssignmentBlockList
IdentifierBlockList -> Identifier BlockList
AssignmentBlockList -> Assignment BlockList
Assignment -> VariableID '=' '"' STRING '"'
BlockID -> '[A-Z]*'
VariableID -> '[a-z]*'

It is easy to see that this grammar is LL(1), as it is right-recursive,
and each starter token (BlockID and VariableID) allows choosing of the
correct derivation.

Now, translation from this grammar to corresponding C++ class is
straightforward. For each production, take LHS of the production as the
class name, and make its constructor to parse its production.

So, the class Identifier constructor goes something like (pardon my
Java-style mixing):
    _blockID = accept("BlockID");
    _blockList = new BlockList();

BlockList constructor goes like:
    if(current_token == "}") {
        _blockList = null;
        return; // ending token of Identifier catenation
    } else if(current_token.match("[A-Z]*")) {
        _blockList = new IdentifierBlockList();
    } else if(current_token.match("[a-z]*")) {
        _blockList = new AssignmentBlockList();
    } else {

The rest of the classes go similarily. An interesting point is that the
'OR'-production is modelled via inheritance: a BlockItem IS-A
IdentifierBlockList _or_ it IS-A AssignmentBlockList.

Because the class Identifier shouldn't yet know the structure of the
chosen BlockList's subclass, it can only allocate space for the
BlockList, concrete class. For this reason, BlockList needs to hold a
reference to this same class, which is then referencing to instance of
its subclass. For more discussion about this problem, see [2].

Using the bridge pattern [3] you can then give access to the referenced
instance. If the right-recursiveness is too clumsy for your needs, then
you can patch the recursive descent parser to do arbitrary lookahead
and backtracking (with the expense of time and/or space).

With this solution, your parsing proceeds as follows (uh oh, this seems
a little bit confusing, but it isn't that important either):

Identifier.scan() -> "A"
Identifier._blockList = new BlockList();
BlockList._blockList = new IdentifierBlockList();
IdentifierBlockList._identifier = new Identifier();
Identifier.scan() -> "B"
Identifier._block_list = new BlockList();
BlockList._blockList = new AssignmentBlockList();
AssignmentBlockList._assignment = new Assignment();
Assignment._identifierID = scan(); -> "text"
Assignment._string = scan(); -> "cucu"
AssignmentBlockList._blockList = new BlockList();
BlockList.current_token == "}";
IdentifierBlockList._blockList = new BlockList()
BlockList.current_token == "}";
-- done;

On the other hand, when you're using bottom-up parsing, you could have
the parser generator to pass relevant data to your struct constructors.
See [4] for details.


[1]: Koskimies, K.: Object-orientation in attribute grammars. In:
Proceedings on Attribute Grammars, Applications and Systems, London,
UK, Springer-Verlag (1991), pp. 297-329
[2]: Grune, D., Bal, H.E., Jacobs, C.J.H., Langendoen, K.: Modern Compiler
Design. John Wiley (2002), Appendix A
[3]: Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns:
Elements of Reusable Object-Oriented Software.
Addison-Wesley, New York, (1995)
[4]: Holmes, J.: Object-oriented compiler construction. Prentice-Hall,
Inc., Upper Saddle
River, NJ, USA (1995)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.