Re: Simple Language design

farmersckn@hotmail.com (farmersckn)
17 Apr 2002 23:19:22 -0400

          From comp.compilers

Related articles
Simple Language design chubbles@blueyonder.co.uk (2002-04-06)
Re: Simple Language design k.prasad@attbi.com (Kamal R. Prasad) (2002-04-07)
Re: Simple Language design farmersckn@hotmail.com (2002-04-17)

From: farmersckn@hotmail.com (farmersckn)
Newsgroups: comp.compilers
Date: 17 Apr 2002 23:19:22 -0400
Organization: http://groups.google.com/
References: 02-04-023
Keywords: lex
Posted-Date: 17 Apr 2002 23:19:22 EDT

Lexical analysis is simply taking the raw character input and
separating that input into tokens (i.e. keywords, numbers,
identifiers, and symbols such as ==).


Create a few functions that say whether a character is a letter, a
digit, or a symbol. You might create a 256-entry array that says
whether the ASCII code at that index is a digit, letter, or symbol.
Then create a loop that examines a character, identifies it as a
letter, digit, or symbol, and then "builds" a token based on what kind
of character it is:


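char c
string s
while not eof (inputstream)
  // Peek rather than consume, so that getname, getnumber and getsymbol
  // still see the first character of the token they are about to read.
  c = peeknextchar (inputstream)
  if iswhitespace(c) then
    skipchar (inputstream)      // whitespace only separates tokens
  else if isletter(c) then
    s = getname (inputstream)   // letters and digits up to the first other char
    if iskeyword(s) then
      addtokentolist(s, KEYWORD)
    else
      addtokentolist(s, IDENTIFIER)
    end if
  else if isdigit(c) then
    s = getnumber (inputstream)
    addtokentolist(s, NUMBER)
  else if issymbol (c) then
    s = getsymbol (inputstream) // may be more than one character, e.g. ==
    addtokentolist(s, SYMBOL)
  else
    error ("unexpected character", c)
  end if
end while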
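

Since the original question mentions Java, here is roughly how the
table and the loop above might look there. This is only a sketch under
some assumptions of mine: the class name SimpleLexer, the Token and
Kind types, the keyword list, and the choice of symbol characters are
all made up for illustration, and symbols are limited to one character
optionally followed by '=' (so ==, <=, >= and != come out as single
tokens).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class SimpleLexer {
    enum CharClass { OTHER, SPACE, LETTER, DIGIT, SYMBOL }
    enum Kind { KEYWORD, IDENTIFIER, NUMBER, SYMBOL }

    static final class Token {
        final String text;
        final Kind kind;
        Token(String text, Kind kind) { this.text = text; this.kind = kind; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // 256-entry table: the character code is the index, the class is the value.
    static final CharClass[] TABLE = new CharClass[256];
    static {
        Arrays.fill(TABLE, CharClass.OTHER);
        for (char c = 'a'; c <= 'z'; c++) TABLE[c] = CharClass.LETTER;
        for (char c = 'A'; c <= 'Z'; c++) TABLE[c] = CharClass.LETTER;
        for (char c = '0'; c <= '9'; c++) TABLE[c] = CharClass.DIGIT;
        // Assumed symbol set; a real language would define its own.
        for (char c : "=<>!+-*/(){};,".toCharArray()) TABLE[c] = CharClass.SYMBOL;
        for (char c : " \t\r\n".toCharArray()) TABLE[c] = CharClass.SPACE;
    }

    // Hypothetical keywords, chosen only for this illustration.
    static final Set<String> KEYWORDS = Set.of("if", "then", "else", "end");

    static CharClass classify(char c) {
        // Anything outside the table is treated as an unknown character.
        return c < TABLE.length ? TABLE[c] : CharClass.OTHER;
    }

    static boolean isNameChar(char c) {
        CharClass cls = classify(c);
        return cls == CharClass.LETTER || cls == CharClass.DIGIT;
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);   // peek; nothing is consumed yet
            CharClass cls = classify(c);
            int start = pos;
            if (cls == CharClass.SPACE) {
                pos++;                    // whitespace only separates tokens
            } else if (cls == CharClass.LETTER) {
                // A name is a letter followed by any run of letters or digits.
                while (pos < input.length() && isNameChar(input.charAt(pos))) pos++;
                String s = input.substring(start, pos);
                tokens.add(new Token(s, KEYWORDS.contains(s) ? Kind.KEYWORD : Kind.IDENTIFIER));
            } else if (cls == CharClass.DIGIT) {
                while (pos < input.length() && classify(input.charAt(pos)) == CharClass.DIGIT) pos++;
                tokens.add(new Token(input.substring(start, pos), Kind.NUMBER));
            } else if (cls == CharClass.SYMBOL) {
                pos++;
                // Fold a trailing '=' into ==, <=, >= and !=.
                if ("=<>!".indexOf(c) >= 0 && pos < input.length() && input.charAt(pos) == '=') pos++;
                tokens.add(new Token(input.substring(start, pos), Kind.SYMBOL));
            } else {
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at offset " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if count == 42 then x1 = 7 end"));
    }
}
```

Run as written, main should print a list starting with KEYWORD(if),
IDENTIFIER(count), SYMBOL(==), NUMBER(42). The sketch takes the whole
input as a String to keep it short; reading one character at a time
from a stream works the same way as long as you can peek at the next
character before consuming it.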


I would HIGHLY recommend that you read "Let's build a compiler!" by
Jack Crenshaw. It's online and free, and packed with useful
information.




Hopefully that gives you some direction.




chubbles@blueyonder.co.uk (Chubby) wrote
> I'm currently developing a very simple language which describes HTML
> forms in simple text. I'm using Java to implement the
> compiler/translator and just need to know the general STEPS needed for
> lexical analysis. I've read tons of books which describe how simple
> mathematical expressions can be tokenized but what about the more
> complicated strings, keywords etc?
>
> Also what would be the best way to read the source file into the
> program. ...

