Re: Good practical language and OS agnostic text?

BGB <>
Thu, 19 Apr 2012 00:05:34 -0700

          From comp.compilers

From: BGB <>
Newsgroups: comp.compilers
Date: Thu, 19 Apr 2012 00:05:34 -0700
References: 12-04-019 12-04-024 12-04-028
Keywords: books
Posted-Date: 19 Apr 2012 23:12:29 EDT

On 4/18/2012 3:43 PM, glen herrmannsfeldt wrote:
> Derek M. Jones<> wrote:
>> On 17/04/2012 22:28, wrote:
>>> Guys, I'm having a bear of a time finding a good practical language
>>> and OS agnostic text on writing a compiler. I'm weak in math and not
>>> interested in the theoretical details. I want to understand the hows
>>> and whys of compiler writing.
>> I always recommend:
>> A Retargetable C Compiler: Design and Implementation
>> by David R. Hanson and Christopher W. Fraser
> So, that makes two (out of about five) of us. (My post comes later.)

sadly, in my case, I can't really currently get (or afford) books which
aren't freely available online.

did find a site though (which I guess is a personal site for the author?):

which has maybe a few interesting looking papers.

saw one about employing LZ77 in an ISA, which is a fairly nifty seeming
feature, but is sadly not really directly applicable in my case.

> Much of the book is about code generation, which, it seems to me,
> is not described in as much detail in many other compiler books.
> Parsing theory is where much of the theory, and hard to understand
> mathematical descriptions, appear, but in the end (back end, in the
> case of compilers) it is about code generations.

the irony is that, IME, it is considerably more effort to write the
back-end than it is to write a parser (logic for things like register
allocation, allocating space in the stack-frame, dealing with types, and
spitting out various ASM fragments, ... is all a bit painful).
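
as a rough illustration of just one of those back-end chores, here is a
minimal sketch of handing out stack-frame slots for locals. all of the
names here ("struct frame", "frame_alloc", "frame_total") are made up for
the example, and it assumes a downward-growing frame addressed as
[fp - offset], with the frame pointer itself aligned at least as strictly
as any slot. it is not taken from any particular compiler.

```c
#include <stddef.h>

/* Hypothetical frame descriptor: tracks how many bytes below the
   frame pointer have been handed out so far. */
struct frame {
    size_t size;
};

/* Reserve a slot of `size` bytes with the given alignment (which must
   be a power of two). Returns the slot's offset below the frame
   pointer, so the slot occupies [fp - offset, fp - offset + size). */
size_t frame_alloc(struct frame *f, size_t size, size_t align)
{
    size_t off = f->size + size;            /* make room for the slot   */
    off = (off + align - 1) & ~(align - 1); /* round up to the alignment */
    f->size = off;
    return off;
}

/* Total frame size, rounded up so the stack pointer stays aligned
   (for example, to 16 bytes). */
size_t frame_total(const struct frame *f, size_t align)
{
    return (f->size + align - 1) & ~(align - 1);
}
```

even this small piece interacts with the type system (sizes and
alignments) and with whatever the calling convention demands, which is
part of why the back-end adds up quickly.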

granted, I tend not to use any fancy parsing algorithms, as pretty much
all of my parsers have been hand-written recursive descent parsers.

although, I guess it is always possible that maybe I have just been
approaching the back-end in ways more complex or painful than necessary,
but it is not clear how it could all be considerably easier.

granted, maybe it is also that modern CPU architectures (such as x86 and
x86-64) and type-towers (integer types, FPU, and SIMD) are more complex
than many older architectures.

if so, maybe the parser preoccupation is at least partly a hold-over
from the times when the parser was more complex relative to the
back-end.
I guess the other major option is that it is because the parser is often
the part of the problem that most people run into first (and thus where
more people are more likely to give up and walk away?).

> As far as languages to write compilers in, it is now usual (though
> maybe not 50 years ago) to describe parts of the compiler in a
> special purpose language. As previously noted, there are flex and
> bison to write the front end, though you usually need to know some
> C to use them.

I have not personally used them.

to me, depending on an external tool to spit out code for something that
is maybe a few thousand lines seems a bit overkill IMHO.

granted, maybe it is that I haven't been writing "maximally efficient"
parsers, but more just sort of "straightforward but naive" parsers:
read in a few tokens;
match against possible constructions;

I have not usually introduced a separate lexer and parser stage
either, but instead typically just read tokens directly from a string
buffer during parsing (or a "reader stream object").
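
a minimal sketch of that general style, assuming a tiny arithmetic
grammar and made-up function names ("parse_expr", "eval_string", ...).
it reads tokens straight out of the string buffer with no separate lexer
pass, and it evaluates as it goes rather than building an AST, purely to
keep the example short.

```c
#include <ctype.h>
#include <stdlib.h>

/* Grammar handled by this sketch:
     expr   := term   (('+' | '-') term)*
     term   := factor (('*' | '/') factor)*
     factor := NUMBER | '(' expr ')'                */

static void skip_ws(const char **s)
{
    while (isspace((unsigned char)**s))
        (*s)++;
}

static long parse_expr(const char **s);

static long parse_factor(const char **s)
{
    skip_ws(s);
    if (**s == '(') {
        (*s)++;                        /* consume '(' */
        long v = parse_expr(s);
        skip_ws(s);
        if (**s == ')')
            (*s)++;                    /* consume ')' if present */
        return v;
    }
    char *end;
    long v = strtol(*s, &end, 10);     /* read the number in place */
    *s = end;
    return v;
}

static long parse_term(const char **s)
{
    long v = parse_factor(s);
    for (;;) {
        skip_ws(s);
        if (**s == '*') {
            (*s)++;
            v *= parse_factor(s);
        } else if (**s == '/') {
            (*s)++;
            long d = parse_factor(s);
            v = d ? v / d : 0;         /* sketch: x/0 yields 0 */
        } else {
            return v;
        }
    }
}

static long parse_expr(const char **s)
{
    long v = parse_term(s);
    for (;;) {
        skip_ws(s);
        if (**s == '+') { (*s)++; v += parse_term(s); }
        else if (**s == '-') { (*s)++; v -= parse_term(s); }
        else return v;
    }
}

/* Entry point: evaluate an expression held in a string buffer. */
long eval_string(const char *text)
{
    const char *s = text;
    return parse_expr(&s);
}
```

each grammar rule becomes one function, and "read in a few tokens; match
against possible constructions" is just the if/else chain at the top of
each loop. error reporting is left out of the sketch entirely.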

I have sometimes used a small hash-table to partly optimize re-reading
the same tokens (as my parsers tend to often end up re-reading the same
tokens multiple times, and hashing the string-pointer allows skipping
much of the logic). this was done in my C parser as it ends up dealing
with considerably more code during parsing (and thus tends to bog down
more readily with reading tokens).
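
one way such a cache could look, as a sketch only: the table size, the
hash, the token rules, and all of the names ("read_token", "tok_cache",
"tok_cache_hits") are assumptions made for the example. the key is the
position pointer itself, so a repeat read at the same position skips the
scanning logic.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define TOKCACHE_SIZE 256            /* must be a power of two */

struct tok_entry {
    const char *pos;    /* position the scan started from (cache key) */
    const char *start;  /* first character of the token               */
    size_t len;         /* token length in bytes                      */
};

static struct tok_entry tok_cache[TOKCACHE_SIZE];
unsigned long tok_cache_hits;        /* kept only for illustration */

static size_t hash_ptr(const char *p)
{
    uintptr_t x = (uintptr_t)p;
    return (size_t)((x * 2654435761u) >> 8) & (TOKCACHE_SIZE - 1);
}

/* Slow path: skip whitespace, then scan an identifier/number run or a
   single punctuation character. */
static void scan_token(const char *p, const char **start, size_t *len)
{
    while (isspace((unsigned char)*p))
        p++;
    const char *q = p;
    if (isalnum((unsigned char)*q) || *q == '_') {
        while (isalnum((unsigned char)*q) || *q == '_')
            q++;
    } else if (*q != '\0') {
        q++;
    }
    *start = p;
    *len = (size_t)(q - p);
}

/* Read the token at `pos`; returns the position just past it. */
const char *read_token(const char *pos, const char **start, size_t *len)
{
    struct tok_entry *e = &tok_cache[hash_ptr(pos)];
    if (e->pos == pos) {
        tok_cache_hits++;            /* same position seen before */
    } else {
        e->pos = pos;                /* miss: scan and overwrite slot */
        scan_token(pos, &e->start, &e->len);
    }
    *start = e->start;
    *len = e->len;
    return e->start + e->len;
}
```

the cache is only valid while the underlying buffer is unchanged, so it
would need clearing whenever the buffer is modified, freed, or reused.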

a lot of this then ends up with a lot of if/else logic and "strcmp()"
and similar (although in a few places I have used "indexed tokens",
where an index number is used for keywords in place of string comparisons).
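
a small sketch of the "indexed tokens" idea, with a made-up keyword set
and made-up names ("enum kw", "keyword_index", "statement_kind"): the
string comparisons happen once, when the token is classified, and the
rest of the parser works with the index.

```c
#include <string.h>

enum kw {
    KW_NONE = 0,    /* not a keyword */
    KW_IF,
    KW_ELSE,
    KW_WHILE,
    KW_RETURN
};

static const struct {
    const char *name;
    enum kw id;
} kw_table[] = {
    { "if",     KW_IF     },
    { "else",   KW_ELSE   },
    { "while",  KW_WHILE  },
    { "return", KW_RETURN },
};

/* Map a token string to its keyword index (KW_NONE if not a keyword). */
enum kw keyword_index(const char *tok)
{
    for (size_t i = 0; i < sizeof kw_table / sizeof kw_table[0]; i++)
        if (strcmp(tok, kw_table[i].name) == 0)
            return kw_table[i].id;
    return KW_NONE;
}

/* Later stages can dispatch on the index instead of chaining strcmp(). */
const char *statement_kind(enum kw k)
{
    switch (k) {
    case KW_IF:     return "if-statement";
    case KW_WHILE:  return "while-loop";
    case KW_RETURN: return "return-statement";
    default:        return "expression-statement";
    }
}
```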

maybe it is all a bit lazy, but it works.

> For compilers that generate code for more than one target,
> (at least gcc and lcc), the back end is usually described through
> a language easier for humans to understand. To me, the lcc code
> generator is much easier to understand than that of gcc.
> You should be able to write a description for a new target
> without knowing C, or much of parsing theory. You do need a
> good understanding of the instruction set for the target, though.

in my case, most of my stuff tends to be plain C.

some of the code in my prior code-generator did use a special
preprocessor which allowed for more powerful macros.

my assembler uses a special notation for the ASM listings, where
basically a notation similar to that used in the Intel docs is used to
describe most of the opcodes.

later changes to support AVX and XOP, and ARM and Thumb, were a little
less clean (nasty notational cruft).

more notation had to be devised, as these docs (for some reason)
mostly use either bit-based notations or "bit-box" diagrams.

actually, when I started out trying to write the assembler, I started
out writing explicit logic for the various instructions, but quickly
became frustrated, and then wrote a tool to convert the Intel-docs
notation into C (changing all this mostly into a task of transcribing
the docs into a text file).
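
to give a rough idea of the table-driven approach, here is a sketch that
interprets Intel-manual-style encoding strings such as "01 /r" at run
time. note that this is a simplification and not the tool described
above (which generated C from the listing). the table layout and the
names ("op_table", "encode_rr") are invented for the example, and it
only handles 32-bit register-to-register forms.

```c
#include <stdlib.h>
#include <string.h>

struct op_desc {
    const char *mnemonic;
    const char *encoding;   /* Intel-manual style, e.g. "01 /r" */
};

/* Each row is transcribed from the "op r/m32, r32" form in the docs. */
static const struct op_desc op_table[] = {
    { "add", "01 /r" },
    { "sub", "29 /r" },
    { "xor", "31 /r" },
    { "mov", "89 /r" },
};

/* Encode "mnemonic dst, src" for two 32-bit registers, numbered as in
   the ISA (0 = eax, 1 = ecx, 2 = edx, ...). Writes to `out`, which the
   caller must make large enough, and returns the number of bytes
   written, or 0 for an unknown mnemonic or a malformed table entry. */
size_t encode_rr(const char *mnemonic, unsigned dst, unsigned src,
                 unsigned char *out)
{
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (strcmp(op_table[i].mnemonic, mnemonic) != 0)
            continue;
        const char *p = op_table[i].encoding;
        size_t n = 0;
        while (*p) {
            if (*p == ' ') {
                p++;
            } else if (p[0] == '/' && p[1] == 'r') {
                /* ModRM byte: mod=11 (register direct), reg=src, rm=dst */
                out[n++] = (unsigned char)(0xC0 | ((src & 7) << 3) | (dst & 7));
                p += 2;
            } else {
                char *end;
                long byte = strtol(p, &end, 16);   /* literal opcode byte */
                if (end == p)
                    return 0;                      /* malformed entry */
                out[n++] = (unsigned char)byte;
                p = end;
            }
        }
        return n;
    }
    return 0;
}
```

with this table, "add eax, ecx" comes out as the bytes 01 C8. supporting
a new instruction of the same form is then one more table row rather
than more hand-written logic.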

but, pretty much everything else is plain C.

I am not terribly sure why "needing to know C" would be something to
be avoided for a compiler backend, when someone who "doesn't know C"
(and can't be asked to learn it) should presumably be somewhere far
away from a compiler's back-end?

a better goal I think is using a specialized format more as an aid to
reduce the amount of code which needs to be written to support a new target.
