Re: syntax complexity

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Tue, 21 Feb 2023 08:14:10 GMT

          From comp.compilers

Related articles
[2 earlier articles]
Re: syntax complexity DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2023-02-16)
Re: syntax complexity gah4@u.washington.edu (gah4) (2023-02-16)
Re: syntax complexity gah4@u.washington.edu (gah4) (2023-02-16)
Re: syntax complexity costello@mitre.org (Roger L Costello) (2023-02-20)
Re: syntax complexity gah4@u.washington.edu (gah4) (2023-02-20)
Re: syntax complexity gneuner2@comcast.net (George Neuner) (2023-02-20)
Re: syntax complexity anton@mips.complang.tuwien.ac.at (2023-02-21)
syntax complexity christopher.f.clark@compiler-resources.com (Christopher F Clark) (2023-02-21)
Re: syntax complexity nmh@t3x.org (Nils M Holm) (2023-02-21)
Re: syntax complexity anton@mips.complang.tuwien.ac.at (2023-02-21)
Re: irregular expressions, syntax complexity arnold@freefriends.org (2023-02-22)
Re: ireegular expressions, syntax complexity anton@mips.complang.tuwien.ac.at (2023-02-22)
Re: irregular expressions, syntax complexity 864-117-4973@kylheku.com (Kaz Kylheku) (2023-02-23)
| List of all articles for this month |

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Tue, 21 Feb 2023 08:14:10 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: <AQHZRT1Nf7Ln0mpyG0extuZP9rnVPQ==> 23-02-045 23-02-047 23-02-050 <29156_1676600565_63EEE4F4_29156_1009_1_23-02-051@comp.compilers> 23-02-052 23-02-053
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="316"; mail-complaints-to="abuse@iecc.com"
Keywords: syntax, lex, comment
Posted-Date: 21 Feb 2023 12:38:32 EST

Our moderator writes:


>[There are definitely things that are hard to say in BNF, even though
>they're intuitively simple.


An example is the solution to the dangling-else problem that we
discussed some time ago. At least one of the language standards I
looked at during this discussion specified an ambiguous grammar for
the IF-statement, with the disambiguation coming from prose, rather
than specifying an unambiguous grammar.


Of course, better than either solution is to design the language to
require that an IF-statement is terminated with, e.g., fi (Algol 68) or END
(Modula-2).


>Another is floating point numbers with
>optional "." and "E" but you need at least one of them. -John]


Regular expression syntax is missing an operator that signifies the
intersection of the sets recognized by the operand regexps. Let's
call this operator "&". Then this requirement for an FP number can be
expressed as:


([0-9.]*&.*[0-9].*)(E[0-9]+)?&.*[.E].*


"[0-9.]*" specifies a lenient form of the mantissa part; ".*[0-9].*"
specifies a string that has at least one digit; so
"([0-9.]*&.*[0-9].*)" says that the mantissa part can contain digits
and "." and must contain one digit. "(E[0-9]+)?" specifies the
optional exponent part. So "([0-9.]*&.*[0-9].*)(E[0-9]+)" is a
lenient form of an FP number that may miss both "." and "E".
".*[.E].*" specifies the requirement stated above: The number must
contain a "." or an "E". Again the "&" combines these requirements.


You can translate regexps with & to a DFA, and don't lose any
performance there. For an NFA-based implementation, the only way I
see is that you process both sub-NFAs and check if they both accept
the string, so it's somewhat slower; I guess that this is a major
reason why such an operator has not been added to widely-used regexp
syntaxes.


Has anybody implemented such an operator at all?


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Wouldn't that pattern allow 1.2.3 ? -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.