Context sensitive tokens

Christopher F Clark <>
Sun, 1 Mar 2020 20:14:08 +0200

          From comp.compilers

From: Christopher F Clark <>
Newsgroups: comp.compilers
Date: Sun, 1 Mar 2020 20:14:08 +0200
Organization: Compilers Central
Keywords: lex
Posted-Date: 01 Mar 2020 13:16:52 EST

The discussion on tokens that are substrings of other tokens got me
thinking about a feature that might help make such tokens easier to
specify. I am now looking for a name (keyword) to use to describe
these tokens.

In particular, consider the case of ">>" vs. ">" in C++ templates. In
expression contexts, you want ">>" to return the "right shift operator"
token, but in template contexts you want it to return each ">" as an
"end of template angle bracket" token. You can do this with lexer
states. But the more such tokens you have, the more lexer states you
need, and combinatorial explosion sets in. That is not desirable,
especially if you are creating the lexer states by hand.

An alternate solution (that seems nice and simple to me) is to have
flags associated with the problematic tokens that you want returned
only in some states and not others. The lexer queries the parser to
determine which of the flagged tokens are allowed and returns only a
token from the allowable set.
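
As a rough illustration of that query, here is a minimal Python sketch
(not Yacc++ itself). It assumes the lexer tries matches longest first and
skips any flagged token the parser rejects. The names TOKENS, FLAGGED,
next_token, and the `allowed` callback are made up for the example.

```python
# Hypothetical token table: token name -> literal text.
TOKENS = {"greater_than": ">", "right_shift": ">>"}
# Tokens flagged as context sensitive; all others are always returned.
FLAGGED = {"right_shift"}

def next_token(text, pos, allowed):
    """Return (name, new_pos) for the longest permitted match at pos.

    `allowed(name)` stands in for the lexer's query to the parser.
    """
    # Every token whose literal matches here, longest literal first.
    candidates = sorted(
        (name for name, lit in TOKENS.items() if text.startswith(lit, pos)),
        key=lambda name: len(TOKENS[name]),
        reverse=True,
    )
    for name in candidates:
        if name in FLAGGED and not allowed(name):
            continue  # parser does not expect it: fall back to a shorter match
        return name, pos + len(TOKENS[name])
    raise ValueError(f"no token matches at position {pos}")
```

With a callback that accepts everything, next_token(">>", 0, ...) yields
right_shift. With one that rejects right_shift, the same call yields
greater_than and leaves the second ">" to be scanned on the next call.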

So normally, in Yacc++, one would write:

token greater_than : ">";
token right_shift : ">>";

But since we want the right shift token to be context sensitive, we
would instead write:

token greater_than : ">";
context sensitive token right_shift : ">>";

Now, before returning a right_shift token, the lexer queries the parser
as to whether that token is legal in the current parser state. The
query would consult an array of bits, one per such token, which the
parser toggles to indicate whether each token is currently legal. (The
parser knows, for each state, what tokens are expected, so the bit mask
is not hard to generate. And the only reason to do this for just some
tokens is to make syntax error discovery easier, by not turning all
unexpected tokens into lexical syntax errors.) If the token is not
legal, the lexer would return a different sequence of tokens (e.g. just
a greater_than token, since that was the longest match prior to this
disallowed match). The actual implementation is a little more subtle
than that, but that captures the idea.
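
One way the bit array might be derived, sketched in Python under the
assumption that the parser can list, per state, the tokens it expects.
The EXPECTED table, the state names, and the helper names are
hypothetical and are not Yacc++ internals. Only flagged tokens get a
bit; everything else passes through unchecked.

```python
# Context sensitive tokens, each assigned one bit position.
FLAGGED = ["right_shift"]
BIT = {name: 1 << i for i, name in enumerate(FLAGGED)}

# Hypothetical per-state expected-token sets, of the kind a parser
# generator could read off its action table.
EXPECTED = {
    "in_expression": {"greater_than", "right_shift", "identifier"},
    "in_template_args": {"greater_than", "comma", "identifier"},
}

def mask_for(state):
    """Bit mask of the flagged tokens that are legal in `state`."""
    mask = 0
    for name in FLAGGED:
        if name in EXPECTED[state]:
            mask |= BIT[name]
    return mask

# Precompute one mask per parser state.
MASKS = {state: mask_for(state) for state in EXPECTED}

def is_allowed(name, state):
    """Answer the lexer's query for `name` in parser state `state`."""
    if name not in BIT:
        # Unflagged tokens are never filtered, so an unexpected one is
        # reported by the parser rather than as a lexical error.
        return True
    return bool(MASKS[state] & BIT[name])
```

Here is_allowed("right_shift", "in_template_args") is false, so the
lexer would fall back to greater_than, while an unflagged token such as
comma is always passed to the parser.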

The main question I have is what keyword(s) I should use to indicate
the tokens in question.

context sensitive
expectation sensitive

something else?

Chris Clark email:
Compiler Resources, Inc. Web Site:
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
