Related articles |
---|
Preview chapter about the structured syntax definition of Seed7 thomas.mertes@gmx.at (2008-12-02) |
From: | thomas.mertes@gmx.at |
Newsgroups: | comp.compilers |
Date: | Tue, 2 Dec 2008 22:59:23 -0800 (PST) |
Organization: | Compilers Central |
Keywords: | design |
Posted-Date: | 04 Dec 2008 14:58:55 EST |
Hello,
I am writing a chapter about the structured syntax definition
of Seed7. It would be nice to get some feedback about it before
the final release. Here is it (read it with a monospaced font):
9. STRUCTURED SYNTAX DEFINITION
===============================
Many languages have predefined constructs like statements and
operators. This constructs have fixed syntax and semantics. A
natural language or some meta language is used to describe the
syntax and semantics of the constructs. Mostly the programmers
are not allowed to define new syntactic constructs. But sometimes
a full description of the syntax and semantics of a construct
written in the programming language itself is helpful. This
avoids the need to use ambiguous natural language descriptions
and it is also unnecessary to learn an additional meta language.
A formal description of new constructs in the language itself
gives also the opportunity to define new constructs. Note that a
compiler-compiler does not offer this opportunity and has also a
meta language.
There are different notations to specify the syntax of
programming languages. The syntax statements of Seed7 use a
notation called Seed7 Structured Syntax Description (abbreviated
with S7SSD). The Backus-Naur Form (BNF) and its variants like the
Extended Backus-Naur Form (EBNF) are another example of such a
syntax specification. Since it is easier to understand new
concepts when they are compared to well known concepts, the EBNF
will be used as base to explain the S7SSD.
9.1 The Extended Backus-Naur Form
As the name says the Extended Backus-Naur Form is an extension
of the BNF. The extensions allow the definition of repetitions and
optional parts without the use of recursive definitons. The EBNF
has the following elements:
- Nonterminal symbols consist of lower case letters and
underline characters.
- Terminal symbols are quoted strings or names in upper case
characters which describe unprintable characters (control
characters).
- The concatenation of nonterminal and/or terminal symbols is
described by writing them in sequence.
- With | two alternatives can be separated.
- Expressions of the extended Backus-Naur form can be put under
parenthesis.
- When a subexpression is optional it is enclosed in squared
brackets [ ... ] .
- When a subexpression may be omitted or repeated it is
enclosed in curly braces { ... } .
The syntax of the extended Backus-Naur form can be described with
extended Backus-Naur form.
syntax_description ::=
{ statement } .
statement ::=
identifier '::=' expression '.' .
expression ::=
term { '|' term } .
term ::=
factor factor .
factor ::=
identifier | string | '(' expression ')' |
'[' expression ']' | '{' expression '}' .
9.2 The syntax of a statement
To explain the Seed7 Structured Syntax Description we propose
that a new statement, the 'loop' statement, should be defined. A
'loop' statement is similar to 'while' and 'repeat' loops but
instead of having the conditional exit at the beginning or at the
end it's conditional exit is in the middle of the loop. This
middle conditional exit is part of the 'loop' statement. Therefore
it should not be confused with the 'exit' statement that exists in
some programming languages. An example of the 'loop' statement is:
loop
ch := getc(inFile);
until ch = '\n' do
stri &:= str(ch);
end loop;
An equivalent solution without the usage of the 'loop' statement
would be:
repeat
ch := getc(inFile);
if ch <> '\n' then
stri &:= str(ch);
end if;
until ch = '\n';
The S7SSD of the 'loop' statement is:
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;
For now we concentrate at the heart of the S7SSD, the expression:
.loop.().until.().do.().end.loop
For the purpose of the syntax description we can just remove the
dots, which gives:
loop () until () do () end loop
With EBNF the 'loop' statement can be described as:
loop_statement ::=
'loop'
statement
'until' expression 'do'
statement
'end' 'loop' .
An EBNF description may use many nonterminal symbols such as
'statement' or 'expression'. The S7SSD does not distinguish
between different nonterminal symbols. Instead the S7SSD only
knows one nonterminal symbol: ()
Therefore the S7SSD cannot distinguish between 'statement',
'expression' or something else. At the syntax level any kind of
expression can by substituted for a S7SSD nonterminal symbol ().
With EBNF it is possible to describe constraints such as the type
of an expression. The S7SSD relies on semantic checks to verify
such constraints. An expression like
loop
35
until 1+2 do
integer
end loop
would be legal given the S7SSD of the 'loop' statement:
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;
The definition of the semantic of the 'loop' statement is:
const proc: loop
(in proc: statements1)
until (ref func boolean: condition) do
(in proc: statements2)
end loop is func
local
var boolean: exitLoop is FALSE;
begin
repeat
statements1;
if not condition then
statements2;
else
exitLoop := TRUE;
end if;
until exitLoop;
end func;
This semantic definition determines which types are accepted
between the keywords. For the 'loop' example with the semantic
errors (see above) you would get an error message like:
*** chkloop.sd7(35):51: Match for {loop 35 until {1 + 2 } do
integer end loop } failed
9.3 Priority and assoziativity
When a syntax construct has parameters before the first symbol
or after the last symbol the priority and the associativity of the
construct are significant. Constructs with stronger priority bind
their parameters earlier than constructs with weaker priority. The
priority is described by a natural number (inclusive 0). The
strongest priority is 0. Weaker priorities are described by larger
numbers. What bind means is can be declared with an example:
=
A + B = C * D / \
/ \
* priority 6 + *
+ priority 7 / \ / \
= priority 12 A B C D
First the * takes its parameters, then the + and at last the =
follows.
The associativity describes, in which order constructs with equal
priority bind their parameters. For example
A - B - C
can be interpreted in two ways:
(A - B) - C or A - (B - C)
There are four associativities possible:
Symbol
Binding from left to right ->
Binding from right to left <-
Neither the left nor the right parameter
are allowed to have the same priority <->
At the left side there is a binding from
left to right and at the right side there
is a binding from right to left -><-
The last two possibilities give no legal interpretation in the
subtraction example. The third kind of assiciativity ( <-> ) is
used by the equal operator ( = ) of Pascal because there a
expression like
A = B = C
is not legal.
There is a second way to describe the associativity. The
associativity describes if an operand must have a stronger
priority than the priority of the operator. For example:
- 7
A - B - C / \ / \
/ \ <=7 / \ <7
- priority 7 -> / \ / \
- C 7 0
/ \ / \
/ \ <=7 / \ <7
/ \ / \
A B 0 0
The numbers in the nodes of the right tree show the priority of
each sub expression (sub tree). With < and <= the required
condition for the priority of an operand is described. An
interpretation is legal if all this conditions are met. If there
are more than one legal interpretations or no legal
interpretation the expression is illegal.
Table for the possibilities of associativity:
+---------------+------------------------------+
| associativity | The priority of the |
+---------------+--------------+---------------+
| | left operand | right operand |
| | must be | must be |
+---------------+--------------+---------------+
| -> | <= | < |
| <- | < | <= |
| <-> | < | < |
| -><- | <= | <= |
+---------------+--------------+---------------+
| | than that of the operator |
+---------------+------------------------------+
The parameter before the operator symbol is called left operand.
The parameter after the last symbol of a construct is called
right operand. In case of normal operators the last symbol of a
construct and the operator symbol are identical. If this is not
the case there is a third kind of operand. Between the operator
symbol and the last symbol of a construct are the middle operands.
Middle operands can have any priority.
9.4 The syntax of operators
The S7SSD treats everything as operator description. Operators
have priority and assoziativity. The priority and associativity
determine in which succession the S7SSD syntax rules get applied.
To explain priority and assoziativity we use the basic arithmetic
operations (+,-,*,/). To describe them with EBNF you can write:
factor :=
number | name .
expression_5 ::=
factor |
( '+' expression_5 ) |
( '-' expression_5 ) .
expression_6 ::=
expression_5 |
( expression_6 '*' expression_7 ) |
( expression_6 '/' expression_7 ) .
expression_7 ::=
expression_6 |
( expression_7 '+' expression_6 ) |
( expression_7 '-' expression_6 ) .
This describes the following things:
- The operators have different priorities:
- Plus and minus signs are executed first
- Multiplication and division are executed second.
- Addition and subtraction are executed last.
- This priorities are exactly what you expect from an
arithmetic expression.
- Additionally you see that ++2 is allowed and interpreted as
+(+(2)) which means that the plus sign is a right-associative
operator.
- You can also see that a*b*c is allowed and interpreted as
(a*b)*c which means that the multiplication is a
left-associative operator.
All this things can also be described with S7SSD:
$ syntax expr: . + .() is <- 5;
$ syntax expr: . - .() is <- 5;
$ syntax expr: .(). * .() is -> 6;
$ syntax expr: .(). / .() is -> 6;
$ syntax expr: .(). + .() is -> 7;
$ syntax expr: .(). - .() is -> 7;
As you can see the S7SSD is shorter as the description with EBNF.
A syntax statement is expained as follows:
- The $ is used to introduce all hardcoded statements.
- The keyword 'syntax' introduces a structured syntax
description.
- The result of the recognized expression will have
the type 'expr'. The type 'expr' is used between the syntax
and the semantic analysis. The type 'expr' describes
expressions which are syntactically analyzed but not
semantically analyzed. After the semantic analysis (and
during the runtime) the type 'expr' is not used.
- The colon ':' is used as separator between type and syntax
description.
- A dot expression like '.(). * .()' is introduced (as can
probably be guessed by the name) with a dot. For the purpose
of the syntax description you can just remove the dots in
your mind: '() * ()'
- The symbol 'is' is used in all Seed7 declarations as
separator between the name and the value.
- The associativity is described with one of the symbols
-> (left-associative), <- (right-associative),
<-> (not associative) and -><- (both associativitys).
When there are no left or right operands, as it is the case
for the 'loop' statement, the associativity is irrelevant.
- Finally the priority of the syntax construct is defined with
a integer literal like '6'. The priority '6' is used for the
operators '*', '/', 'div', 'rem', 'mdiv' and 'mod'.
The S7SSD can also be easily used to do automatic syntax
recognition.
There are also other things which are out of the scope of the
S7SSD. The syntax of tokens (whitespace, comments, identifiers and
literals) and expressions (parentheses, function calls and dot
expressions) is hardcoded. The hardcoded constructs are described
in chapter 10 (Tokens) and chapter 11 (Expressions).
For the reasons mentioned above it is not possible to transform
every EBNF syntax description into S7SSD. Transforming S7SSD
descriptions to EBNF is always possible.
The advantage of the S7SSD lies in its simplicity and that a fast
automated syntax recognition algorithm can be easily implemented.
It is exactly the combination of hardcoded syntax recognition and
flexible syntax rules that make it successful.
9.5 Comparison of EBNF and S7SSD
In the S7SSD of the 'loop' statement
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;
are no nonterminal expressions '()' before the first keyword or
after the last keyword. Therefore the assoziativity does not play
any role. The nonterminal expressions '()' of the 'loop' statement
are all surrounded by keywords and therefore they can have any
priority. As priority of the 'loop' 25 is choosen just because
most other statements have also priority 25. The assignments
(:= +:= *:= ...) have priority 20 and all operators used in
arithmetic, boolean and string expressions have priorities less
than 20. BTW: The semicolon operator (;) is defined with the
priority 50. Operators with a priority of 0 get their parameters
before operators with priority 1 and so on.
The corresponding EBNF description of the 'loop' statement would
be:
expression_25 ::=
'loop'
expression_127
'until' expression_127 'do'
expression_127
'end' 'loop' .
You must keep in mind that alternative rules for expression_25 are
also possible and that for every priority level a rule like
expression_127 ::= expression_126 .
is defined. Additionally the following rules are defined:
expression_0 ::= token | parentheses_expression |
call_expression | dot_expression .
token ::=
identifier | literal .
parentheses_expression ::=
'(' expression_127 ')' .
call_expression ::=
expression_127 [ '('
[ expression_127 { ',' expression_127 } ]
')' ] .
dot_expression ::=
[ '.' ] call_expression { '.' call_expression } .
When you want to use some special syntax which should be only
allowed at some place you do the following:
- Define the special syntax with S7SSD in a way that does not
contradict with the rest of the syntax definitions.
- Use semantic definitions to make sure that this syntax
construct can only be used at the place desired.
The 'elsif' and 'else' parts of the 'if' statement use this
technic. The syntax of the 'elsif' and 'else' statement parts is:
$ syntax expr: .elsif.().then.() is <- 60;
$ syntax expr: .elsif.().then.().() is <- 60;
$ syntax expr: .else.() is <- 60;
The types 'ELSIF_RESULT' and 'ELSIF_PROC' are just defined to be
usable for the 'elsif' and 'else' parts. A special variant of the
'if' statement is defined with a parameter of type 'ELSIF_PROC':
$ syntax expr: .if.().then.().().end.if is -> 25;
const proc: if (in boolean param) then
(in proc param)
(in ELSIF_PROC param)
end if is action "PRC_IF_ELSIF";
With 'action' the primitive actions are used (instead of a high
level definition written in Seed7). Primitive actions are
explained here: http://seed7.sourceforge.net/manual/actions.htm
The semantic part of the 'else' and 'elsif' part are defined with:
const ELSIF_PROC: elsif (in boolean param) then
(in proc param) is action "PRC_IF";
const ELSIF_PROC: elsif (in boolean param) then
(in proc param)
(in ELSIF_PROC param) is action "PRC_IF_ELSIF";
const ELSIF_PROC: else
(in void param) is ELSIF_EMPTY;
Since no other functions of type 'ELSIF_PROC' are defined only
legal 'if' statements can be written.
An explanation of the 'if' statement is here
http://seed7.sourceforge.net/manual/stats.htm#if-statement
================================
Thanks in advance for your effort.
Greetings Thomas Mertes
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.