Re: Independent Study - Assistance?

Alexander Morou <Alex@alexandermorou.com>
Sat, 21 Feb 2009 08:41:32 -0600

          From comp.compilers

Related articles
Independent Study - Assistance? Alex@alexandermorou.com (Alexander Morou) (2009-02-15)
Re: Independent Study - Assistance? cfc@shell01.TheWorld.com (Chris F Clark) (2009-02-16)
Re: Independent Study - Assistance? Alex@alexandermorou.com (Alexander Morou) (2009-02-21)
Re: Independent Study - Assistance? cfc@shell01.TheWorld.com (Chris F Clark) (2009-02-27)
Re: Independent Study - Assistance? alexander.morou@gmail.com (Alexander Morou) (2009-02-28)
| List of all articles for this month |

From: Alexander Morou <Alex@alexandermorou.com>
Newsgroups: comp.compilers
Date: Sat, 21 Feb 2009 08:41:32 -0600
Organization: Compilers Central
References: <e45f4adf0902152017v2f7ce1c7g38ae34f7d42bc1b0@mail.gmail.com>
Keywords: lex, DFA
Posted-Date: 21 Feb 2009 18:55:36 EST

Well,


> . . . truncated . . .
> 2) How to convert an NFA into a DFA. Subset contruction.


I think I might have possibly been able to get a good result out of
the lexer generator; however, I'm by no means an expert at this, so I
figured I'd ask a few of you guys to assist, if that's OK.


I decided that I'd use a very simple alphabet to verify the generator
handles merging cases where overlap occurs, regardless of the '+'s
that might be in play.


(('A'? 'B' 'C')+ | ('A' 'B'? 'D')+ | ('A' 'B' 'C'? 'D'?)+)+ 'A' 'B'?;


Since I prefer code that is readable, as opposed to sets of numbers
that leave me scratching my head, so it emits a simple state machine
for the token. I hope the list maintainer doesn't mind, but here's an
example of output:


  /* -------------------------------------------------------------------\
  | This code was generated by AllenCopeland.Abstraction.OIL. |
  | Version: 1.0.0.0 |
  |---------------------------------------------------------------------|
  | To ensure the code works properly, |
  | please do not make any changes to the file. |
  |---------------------------------------------------------------------|
  | The specific language is C# (Runtime version: v2.0.50727) |
  | Sub-tool Name: AllenCopeland.Abstraction.OIL.CSharpCodeTranslator |
  | Sub-tool Version: 1.0.0.0 |
  \------------------------------------------------------------------- */
using System;


namespace AllenCopeland.Abstraction.Slf.Parsers.Test
{
        #region AllenCopeland.Abstraction.Slf.Parsers.Test nested types
        // Module: RootModule
        public class TestLexStateMachine
        {
                #region TestLexStateMachine data members.
                private int state;


                private int length;


                private int exitState;


                private int exitlength;
                #endregion // TestLexStateMachine data members.
                #region TestLexStateMachine properties
                public bool IsValidEndState
                {
                        get
                        {
                                return (this.exitState != 0);
                        }
                }


                public int BytesConsumed
                {
                        get
                        {
                                return this.exitlength;
                        }
                }
                #endregion // TestLexStateMachine properties
                #region TestLexStateMachine methods
                public bool Next(char @char)
                {
                        switch (this.state)
                        {
                                case 0:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE1;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                        }
                                        break;
                                case 1:
                                        switch (@char)
                                        {
                                                case 'B':
                                                        goto MOVETOSTATE2;
                                                case 'D':
                                                        goto MOVETOSTATE7;
                                        }
                                        break;
                                case 2:
                                        switch (@char)
                                        {
                                                case 'C':
                                                        goto MOVETOSTATE3;
                                                case 'D':
                                                        goto MOVETOSTATE4;
                                        }
                                        break;
                                case 3:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE9;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                                case 'D':
                                                        goto MOVETOSTATE8;
                                        }
                                        break;
                                case 4:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE9;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                        }
                                        break;
                                case 5:
                                        switch (@char)
                                        {
                                                case 'C':
                                                        goto MOVETOSTATE6;
                                        }
                                        break;
                                case 6:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE9;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                        }
                                        break;
                                case 7:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE9;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                        }
                                        break;
                                case 8:
                                        switch (@char)
                                        {
                                                case 'A':
                                                        goto MOVETOSTATE9;
                                                case 'B':
                                                        goto MOVETOSTATE5;
                                        }
                                        break;
                                case 9:
                                        switch (@char)
                                        {
                                                case 'B':
                                                        goto MOVETOSTATE10;
                                                case 'D':
                                                        goto MOVETOSTATE7;
                                        }
                                        break;
                                case 10:
                                        switch (@char)
                                        {
                                                case 'C':
                                                        goto MOVETOSTATE3;
                                                case 'D':
                                                        goto MOVETOSTATE4;
                                        }
                                        break;
                        }
                        return false;
                MOVETOSTATE1:
                        this.state = 1;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE2:
                        this.state = 2;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE3:
                        this.state = 3;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE4:
                        this.state = 4;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE5:
                        this.state = 5;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE6:
                        this.state = 6;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE7:
                        this.state = 7;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE8:
                        this.state = 8;
                        goto COMMON_STATEMOVE;
                MOVETOSTATE9:
                        this.state = 9;
                        this.exitState = 9;
                        goto NOMINAL_EXIT;
                MOVETOSTATE10:
                        this.state = 10;
                        this.exitState = 10;
                        goto NOMINAL_EXIT;
                COMMON_STATEMOVE:
                        ++this.length;
                        return true;
                NOMINAL_EXIT:
                        this.exitlength = ++this.length;
                        return true;
                }


                public void Reset()
                {
                        this.exitlength = 0;
                        this.length = 0;
                        this.exitState = 0;
                        this.state = 0;
                }
                #endregion // TestLexStateMachine methods
        }
        #endregion // AllenCopeland.Abstraction.Slf.Parsers.Test nested types
}
  /* ----------------------------------------------\
  | This file took 00:00:00.1921559 to generate. |
  | Date generated: 2/21/2009 7:59:48 AM |
  | There were 4 types used by this file |
  | System.Int32, System.Boolean, System.Char, |
  | System.Void |
  |------------------------------------------------|
  | There were 1 assemblies referenced: |
  | mscorlib |
  \---------------------------------------------- */




The only change I made was removing un-necessary breaks in the
switches; I didn't add code-flow analysis to that old code generator
so, it doesn't know when breaks are un-necessary.


I don't have the experience necessary to create a regular expression
that can show the system fault; however, I can say what it is not: It
isn't a regex parser generator, because it doesn't have look-behinds
or look-aheads, and notably by the source above: it doesn't do any
capturing (yet).


I wanted to get the logic right before I worried about 'special
features'. If you notice anything wrong with the above, please let me
know, again I've failed enough times at this to be cautiously
optimistic.


Amusingly the thing I kept screwing up on the previous attempts was
thinking 'one pass'. It wasn't until I realized that I was trying to
get deterministic code out of a non-deterministic setup. So I wrote a
converter which condensed the state sets into a DFA, single-state,
result. Once I realized that, it was much easier. I won't go into
what I did since, if you're reading this: you probably already know.


I think now that I've got it to this point, I can start some
simplistic analysis on the data-sets, based upon the code above, I can
search for cases where the transition (char->state) set are equal to
one another, I can condense the states together into one.


If testing the system that generated the code above is preferable,
please reply, or e-mail me.


Thanks,


-Alexander [Allen C. Copeland Jr.] Morou



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.