Related articles |
---|
Independent Study - Assistance? Alex@alexandermorou.com (Alexander Morou) (2009-02-15) |
Re: Independent Study - Assistance? cfc@shell01.TheWorld.com (Chris F Clark) (2009-02-16) |
Re: Independent Study - Assistance? Alex@alexandermorou.com (Alexander Morou) (2009-02-21) |
Re: Independent Study - Assistance? cfc@shell01.TheWorld.com (Chris F Clark) (2009-02-27) |
Re: Independent Study - Assistance? alexander.morou@gmail.com (Alexander Morou) (2009-02-28) |
From: | Alexander Morou <Alex@alexandermorou.com> |
Newsgroups: | comp.compilers |
Date: | Sat, 21 Feb 2009 08:41:32 -0600 |
Organization: | Compilers Central |
References: | <e45f4adf0902152017v2f7ce1c7g38ae34f7d42bc1b0@mail.gmail.com> |
Keywords: | lex, DFA |
Posted-Date: | 21 Feb 2009 18:55:36 EST |
Well,
> . . . truncated . . .
> 2) How to convert an NFA into a DFA. Subset contruction.
I think I might have possibly been able to get a good result out of
the lexer generator; however, I'm by no means an expert at this, so I
figured I'd ask a few of you guys to assist, if that's OK.
I decided that I'd use a very simple alphabet to verify the generator
handles merging cases where overlap occurs, regardless of the '+'s
that might be in play.
(('A'? 'B' 'C')+ | ('A' 'B'? 'D')+ | ('A' 'B' 'C'? 'D'?)+)+ 'A' 'B'?;
Since I prefer code that is readable, as opposed to sets of numbers
that leave me scratching my head, so it emits a simple state machine
for the token. I hope the list maintainer doesn't mind, but here's an
example of output:
/* -------------------------------------------------------------------\
| This code was generated by AllenCopeland.Abstraction.OIL. |
| Version: 1.0.0.0 |
|---------------------------------------------------------------------|
| To ensure the code works properly, |
| please do not make any changes to the file. |
|---------------------------------------------------------------------|
| The specific language is C# (Runtime version: v2.0.50727) |
| Sub-tool Name: AllenCopeland.Abstraction.OIL.CSharpCodeTranslator |
| Sub-tool Version: 1.0.0.0 |
\------------------------------------------------------------------- */
using System;
namespace AllenCopeland.Abstraction.Slf.Parsers.Test
{
#region AllenCopeland.Abstraction.Slf.Parsers.Test nested types
// Module: RootModule
public class TestLexStateMachine
{
#region TestLexStateMachine data members.
private int state;
private int length;
private int exitState;
private int exitlength;
#endregion // TestLexStateMachine data members.
#region TestLexStateMachine properties
public bool IsValidEndState
{
get
{
return (this.exitState != 0);
}
}
public int BytesConsumed
{
get
{
return this.exitlength;
}
}
#endregion // TestLexStateMachine properties
#region TestLexStateMachine methods
public bool Next(char @char)
{
switch (this.state)
{
case 0:
switch (@char)
{
case 'A':
goto MOVETOSTATE1;
case 'B':
goto MOVETOSTATE5;
}
break;
case 1:
switch (@char)
{
case 'B':
goto MOVETOSTATE2;
case 'D':
goto MOVETOSTATE7;
}
break;
case 2:
switch (@char)
{
case 'C':
goto MOVETOSTATE3;
case 'D':
goto MOVETOSTATE4;
}
break;
case 3:
switch (@char)
{
case 'A':
goto MOVETOSTATE9;
case 'B':
goto MOVETOSTATE5;
case 'D':
goto MOVETOSTATE8;
}
break;
case 4:
switch (@char)
{
case 'A':
goto MOVETOSTATE9;
case 'B':
goto MOVETOSTATE5;
}
break;
case 5:
switch (@char)
{
case 'C':
goto MOVETOSTATE6;
}
break;
case 6:
switch (@char)
{
case 'A':
goto MOVETOSTATE9;
case 'B':
goto MOVETOSTATE5;
}
break;
case 7:
switch (@char)
{
case 'A':
goto MOVETOSTATE9;
case 'B':
goto MOVETOSTATE5;
}
break;
case 8:
switch (@char)
{
case 'A':
goto MOVETOSTATE9;
case 'B':
goto MOVETOSTATE5;
}
break;
case 9:
switch (@char)
{
case 'B':
goto MOVETOSTATE10;
case 'D':
goto MOVETOSTATE7;
}
break;
case 10:
switch (@char)
{
case 'C':
goto MOVETOSTATE3;
case 'D':
goto MOVETOSTATE4;
}
break;
}
return false;
MOVETOSTATE1:
this.state = 1;
goto COMMON_STATEMOVE;
MOVETOSTATE2:
this.state = 2;
goto COMMON_STATEMOVE;
MOVETOSTATE3:
this.state = 3;
goto COMMON_STATEMOVE;
MOVETOSTATE4:
this.state = 4;
goto COMMON_STATEMOVE;
MOVETOSTATE5:
this.state = 5;
goto COMMON_STATEMOVE;
MOVETOSTATE6:
this.state = 6;
goto COMMON_STATEMOVE;
MOVETOSTATE7:
this.state = 7;
goto COMMON_STATEMOVE;
MOVETOSTATE8:
this.state = 8;
goto COMMON_STATEMOVE;
MOVETOSTATE9:
this.state = 9;
this.exitState = 9;
goto NOMINAL_EXIT;
MOVETOSTATE10:
this.state = 10;
this.exitState = 10;
goto NOMINAL_EXIT;
COMMON_STATEMOVE:
++this.length;
return true;
NOMINAL_EXIT:
this.exitlength = ++this.length;
return true;
}
public void Reset()
{
this.exitlength = 0;
this.length = 0;
this.exitState = 0;
this.state = 0;
}
#endregion // TestLexStateMachine methods
}
#endregion // AllenCopeland.Abstraction.Slf.Parsers.Test nested types
}
/* ----------------------------------------------\
| This file took 00:00:00.1921559 to generate. |
| Date generated: 2/21/2009 7:59:48 AM |
| There were 4 types used by this file |
| System.Int32, System.Boolean, System.Char, |
| System.Void |
|------------------------------------------------|
| There were 1 assemblies referenced: |
| mscorlib |
\---------------------------------------------- */
The only change I made was removing un-necessary breaks in the
switches; I didn't add code-flow analysis to that old code generator
so, it doesn't know when breaks are un-necessary.
I don't have the experience necessary to create a regular expression
that can show the system fault; however, I can say what it is not: It
isn't a regex parser generator, because it doesn't have look-behinds
or look-aheads, and notably by the source above: it doesn't do any
capturing (yet).
I wanted to get the logic right before I worried about 'special
features'. If you notice anything wrong with the above, please let me
know, again I've failed enough times at this to be cautiously
optimistic.
Amusingly the thing I kept screwing up on the previous attempts was
thinking 'one pass'. It wasn't until I realized that I was trying to
get deterministic code out of a non-deterministic setup. So I wrote a
converter which condensed the state sets into a DFA, single-state,
result. Once I realized that, it was much easier. I won't go into
what I did since, if you're reading this: you probably already know.
I think now that I've got it to this point, I can start some
simplistic analysis on the data-sets, based upon the code above, I can
search for cases where the transition (char->state) set are equal to
one another, I can condense the states together into one.
If testing the system that generated the code above is preferable,
please reply, or e-mail me.
Thanks,
-Alexander [Allen C. Copeland Jr.] Morou
Return to the
comp.compilers page.
Search the
comp.compilers archives again.