Thu, 27 Apr 2023 00:28:55 -0000 (UTC)

Related articles |
---|

Deep expression chain performance borucki.andrzej@gmail.com (Andy) (2023-04-25) |

Re: Deep expression chain performance anton@mips.complang.tuwien.ac.at (2023-04-26) |

Re: Deep expression chain performance gah4@u.washington.edu (gah4) (2023-04-26) |

Re: Deep expression chain performance 864-117-4973@kylheku.com (Kaz Kylheku) (2023-04-27) |

Re: Deep expression chain performance anton@mips.complang.tuwien.ac.at (2023-04-27) |

From: | Kaz Kylheku <864-117-4973@kylheku.com> |

Newsgroups: | comp.compilers |

Date: | Thu, 27 Apr 2023 00:28:55 -0000 (UTC) |

Organization: | A noiseless patient Spider |

References: | 23-04-012 |

Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="23892"; mail-complaints-to="abuse@iecc.com" |

Keywords: | parse, performance |

Posted-Date: | 26 Apr 2023 20:42:41 EDT |

On 2023-04-25, Andy <borucki.andrzej@gmail.com> wrote:

*> I am writing C grammar.*

*> Grammar speed may be down caused by deep expression chain.*

*> For example, simple "n=0" has 20 levels:*

*> assignmentExpression*

*> conditionalExpression*

*> logicalOrExpression*

*> logicalAndExpression*

*> inclusiveOrExpression*

*> exclusiveOrExpression*

*> andExpression*

*> eqaulityExpression*

*> relationalExpression*

*> shiftExpression*

*> additiveExpression*

*> multiplicativeExpression*

*> castExpression*

*> unaryExpression*

*> postfixExpression*

*> postfixExpressionLeft*

*> atom*

*> literal*

*> IntegerConstant*

*> DeciamlConstant*

ISO C specifies its grammar in an obtusely verbose way, similar to the

above. The reason is that it makes the grammar unambiguous, meaning

that it does not require any annotation or external remarks about

ambiguities having to be resolved by a certain associativity or

precedence.

If you have some grammar processor which literally follows all those

productions, doing all of their reductions, then it will likely

not perform well.

I suspect even a LALR(1) compiled grammar will have an issue

there; I'm not aware that Yacc-like parser generators do any

collapsing of cascaded reductions which do nothing but { $$ = $1; }.

Here is a concrete example. Attached below is over 200 lines of

y.output from a Yacc implementation (Bison). In that y.output

we can trace the actions of the state machine and see what happens

when a the simple token LIT is reduced to an expression.

In State 0, when the parser sees a LIT, it will shift the token

and go to State 1. There, a reduction takes place via Rule 9,

and so we have a parser_expression on the top of the stack.

Then, back to State 0, where now have a primary_expression,

and so we go to state 8.

State 8, similarly to State 1, performs a reduction:

primary expression to mult_expression.

Back to State 0, where we have to dispatch a similar

thing again for multi_expression, then additive_expression,

then expression and finally top.

Thanks for raising this issue; I suspect it's a real performance

problem. Not only is a factored-out grammar hard to read but the

performance is bad due to useless back and forth transitions between

superfluous states to propagate reductions.

Above-referenced material follows:

Grammar

0 $accept: top $end

1 top: expression ';'

2 | expression ';' expression

3 expression: additive_expression

4 additive_expression: mult_expression

5 | mult_expression '+' additive_expression

6 mult_expression: primary_expression

7 | mult_expression '*' primary_expression

8 primary_expression: '(' expression ')'

9 | LIT

10 | IDENT

Terminals, with rules where they appear

$end (0) 0

'(' (40) 8

')' (41) 8

'*' (42) 7

'+' (43) 5

';' (59) 1 2

error (256)

LIT (258) 9

IDENT (259) 10

Nonterminals, with rules where they appear

$accept (10)

on left: 0

top (11)

on left: 1 2, on right: 0

expression (12)

on left: 3, on right: 1 2 8

additive_expression (13)

on left: 4 5, on right: 3 5

mult_expression (14)

on left: 6 7, on right: 4 5 7

primary_expression (15)

on left: 8 9 10, on right: 6 7

state 0

0 $accept: . top $end

LIT shift, and go to state 1

IDENT shift, and go to state 2

'(' shift, and go to state 3

top go to state 4

expression go to state 5

additive_expression go to state 6

mult_expression go to state 7

primary_expression go to state 8

state 1

9 primary_expression: LIT .

$default reduce using rule 9 (primary_expression)

state 2

10 primary_expression: IDENT .

$default reduce using rule 10 (primary_expression)

state 3

8 primary_expression: '(' . expression ')'

LIT shift, and go to state 1

IDENT shift, and go to state 2

'(' shift, and go to state 3

expression go to state 9

additive_expression go to state 6

mult_expression go to state 7

primary_expression go to state 8

state 4

0 $accept: top . $end

$end shift, and go to state 10

state 5

1 top: expression . ';'

2 | expression . ';' expression

';' shift, and go to state 11

state 6

3 expression: additive_expression .

$default reduce using rule 3 (expression)

state 7

4 additive_expression: mult_expression .

5 | mult_expression . '+' additive_expression

7 mult_expression: mult_expression . '*' primary_expression

'+' shift, and go to state 12

'*' shift, and go to state 13

$default reduce using rule 4 (additive_expression)

state 8

6 mult_expression: primary_expression .

$default reduce using rule 6 (mult_expression)

state 9

8 primary_expression: '(' expression . ')'

')' shift, and go to state 14

state 10

0 $accept: top $end .

$default accept

state 11

1 top: expression ';' .

2 | expression ';' . expression

LIT shift, and go to state 1

IDENT shift, and go to state 2

'(' shift, and go to state 3

$default reduce using rule 1 (top)

expression go to state 15

additive_expression go to state 6

mult_expression go to state 7

primary_expression go to state 8

state 12

5 additive_expression: mult_expression '+' . additive_expression

LIT shift, and go to state 1

IDENT shift, and go to state 2

'(' shift, and go to state 3

additive_expression go to state 16

mult_expression go to state 7

primary_expression go to state 8

state 13

7 mult_expression: mult_expression '*' . primary_expression

LIT shift, and go to state 1

IDENT shift, and go to state 2

'(' shift, and go to state 3

primary_expression go to state 17

state 14

8 primary_expression: '(' expression ')' .

$default reduce using rule 8 (primary_expression)

state 15

2 top: expression ';' expression .

$default reduce using rule 2 (top)

state 16

5 additive_expression: mult_expression '+' additive_expression .

$default reduce using rule 5 (additive_expression)

state 17

7 mult_expression: mult_expression '*' primary_expression .

$default reduce using rule 7 (mult_expression)

--

TXR Programming Language: http://nongnu.org/txr

Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Mastodon: @Kazinator@mstdn.ca

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.