Re: Independent Study - Assistance?

Alexander Morou <alexander.morou@gmail.com>
Sat, 28 Feb 2009 11:02:22 -0600

          From comp.compilers

From: Alexander Morou <alexander.morou@gmail.com>
Newsgroups: comp.compilers
Date: Sat, 28 Feb 2009 11:02:22 -0600
Organization: Compilers Central
References: 09-02-106 09-02-137
Keywords: parse
Posted-Date: 02 Mar 2009 08:28:17 EST

> The term "regex parser generator" is a mixture of several concepts.


I apologize; perhaps I should have been clearer. I am familiar with
Extended Backus-Naur Form, its format, and its usage. What I failed
to mention is that the reason I used the terms 'regex' and 'parser
generator' together is directly tied to the structure of the project
I'm working on.


It's mostly a research project, meant to teach me what I don't know
about languages and generative programming (that is, programs that
build programs), based on what I can make of the descriptions of
these concepts that I find online. I'll admit that a large portion
of what I'm doing is flawed by design, due to my lack of a formal
education.


I use 'regular languages', 'regular expressions', and 'regex' for
the project I'm working on because, for the most part, their syntax
and meaning carry over to it. It doesn't use EBNF to describe lexical
or structural patterns. The patterns for rules and tokens are
described together; the difference between them lies in their initial
operator: ':=' (token) vs. '::=' (rule). The project uses a syntax
that more closely resembles regular expressions. Additionally, I
wasn't aware that there was a difference between 'Regex' and 'Regular
Expressions'. Most of the descriptions I've been able to read seem to
imply they're one and the same, or perhaps I just assumed so.
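
To make the ':=' vs. '::=' distinction concrete, here is a minimal
Python sketch of how a definition line might be classified. It is only
an illustration, not the project's actual code: the function name
classify_definition, the sample definitions, and the assumption that a
definition is a single line of the form "Name operator pattern" are
all hypothetical. Only the two operators come from the description
above.

```python
import re

# Hypothetical line shape: Name, then '::=' or ':=', then the pattern.
# '::=' is listed first so the longer operator is tried before ':='.
_DEFINITION = re.compile(r"\s*(\w+)\s*(::=|:=)\s*(.*)")


def classify_definition(line):
    """Return (kind, name, pattern) for one definition line.

    Assumes ':=' introduces a token and '::=' introduces a rule,
    as described in the post. Everything else here is illustrative.
    """
    match = _DEFINITION.match(line)
    if match is None:
        raise ValueError("not a definition: %r" % line)
    name, operator, pattern = match.groups()
    kind = "rule" if operator == "::=" else "token"
    return kind, name, pattern.strip()


if __name__ == "__main__":
    print(classify_definition("Number := [0-9]+"))
    print(classify_definition("Expr ::= Term ('+' Term)*"))
```

Anchoring the match at the name means a '::=' that happens to appear
later inside a token's pattern does not change how the line is
classified.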


> I know you are anxious to plow ahead, but your doing this without the
> underlying theory is causing you to make mistakes that could easily be
> avoided. Now, it may be possible that those mistakes may result in
> significant innovations, but the more likely case is that you will
> produce something that looks nice, but is inherently flawed. It would
> only take a couple of courses to rectify that problem, as I noted
> earlier. Until you do, the best I can do is point out your most
> egregious mistakes and leave the subtle flaws un-discussed as you
> don't have the background to understand why they are errors.


I'm expecting to make mistakes, and I'm not pretentious enough to
think that doing it all on my own will yield something no one else can
do, or something not yet done. But again, if it were as simple as
taking the classes I need, I would be doing so. I'm not in a position
where I have such a luxury, so I must make do with what I have. With
that in mind, while there are subtle flaws in the design, the further
I go the more I recognize the aspects that aren't what they should be.
I now have a far greater understanding of just how things
interconnect, and of how complex the problem is, than when I started.


Since this project is, as I put it, a research project, it will also
bootstrap the next version, which will have the same goals but with
the intent to take them a little further. I aim to structure this
version to build a basic parser, lexer, and CST, which I can extend
from the base it gives me. The concepts I want to try are far too
complex to write entirely by hand. Namely, they involve building a
general-purpose language inside the DSL to aid in match recognition,
so that things I can't do now become easier. For example, when
parsing a number, you could build the numeric value of the number as
it parses, something impossible with the recognizer method I'm using
now.
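
As a rough illustration of that difference, the Python sketch below
contrasts a pure recognizer with a scanner that accumulates the value
while it consumes digits. The function names recognize_number and
parse_number are hypothetical, and the sketch handles only unsigned
decimal integers; it is not the project's code, just one way the idea
could look.

```python
def _is_ascii_digit(ch):
    """True only for the characters '0' through '9'."""
    return "0" <= ch <= "9"


def recognize_number(text, pos=0):
    """Pure recognizer: report where the digit run ends, nothing more.

    Returns the index just past the last digit, or -1 if there is no
    digit at pos. The numeric value must be computed in a second pass.
    """
    end = pos
    while end < len(text) and _is_ascii_digit(text[end]):
        end += 1
    return end if end > pos else -1


def parse_number(text, pos=0):
    """Same scan, but build the value while the digits are consumed.

    Returns (end_index, value), or (-1, None) if there is no digit
    at pos.
    """
    end = pos
    value = 0
    while end < len(text) and _is_ascii_digit(text[end]):
        # Shift the running value one decimal place, add the new digit.
        value = value * 10 + (ord(text[end]) - ord("0"))
        end += 1
    return (end, value) if end > pos else (-1, None)


if __name__ == "__main__":
    print(recognize_number("123abc"))
    print(parse_number("123abc"))
```

Both functions walk the input identically; the only difference is the
extra accumulation step, which is the kind of action a recognizer-only
design has no place to attach.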


Even in the time since the project started, I have learned enough
that I would write the parser, lexer, and similar parts differently
were I starting now. Far too many things are left to chance in the
hand-written parser I have now: certain parse errors don't report
properly, cause unexpected hangs or stops, and lead to bugs that
aren't easy to find, due to the hackish way the entire parser is
written. But that's neither here nor there; the parser does what it
needs to do for now. Given a properly formatted file, it will
describe ambiguities or the errors it is capable of finding. Most of
the work I'm doing now is with the data it gives me. Once I have the
lexer generator working the way I want, flawed though it might be,
I'll begin writing the next phase (the syntactic counterpart of what
was done for the lexer). I'm not expecting perfection from this
project, but I do expect to get a better understanding of the problem
as a whole. So when I write it the next time, it'll be a step closer
to the proper way to do it, and I can perhaps start going into other
areas of research beyond the basics of lexical and syntactic parsing
theory.


> I'm sorry,


Please, don't be.

