Empirical data: assembly source vs. HLL source.

Mark William Hopkins <hunk@csd.uwm.edu>
6 Jul 1999 12:00:12 -0400

          From comp.compilers


From: Mark William Hopkins <hunk@csd.uwm.edu>
Newsgroups: comp.compilers
Date: 6 Jul 1999 12:00:12 -0400
Organization: University of Wisconsin - Milwaukee, Computing Services Division
Keywords: summary, assembler


I've provided some empirical data below based on some recent projects
which I hope will decisively settle the issue of Assembly vs. C,
putting it in proper perspective.

An important lesson to be learned from experiences codified in the
data below is that the relative (in)efficiencies of coding in either
environment have little or nothing to do with the language used! It's
all about style -- and to a large degree the quality of tools
(particularly, assemblers) used with the project.

If nothing else comes of this contribution, I hope it underscores that
the presumed disparity often cited between assembly and C has very
little to do with language and a great deal to do with programming
style.

In other words: it's the programmer's fault (for biasing the data).

In detail, in the data provided below, the complexity of source code
written in assembly vs. C is *nearly equal* when both are written in
the correct style (i.e., the style taught in programming classes:
indentation, efficient use of variables, modularity, avoiding flags,
etc.)

That complexity is with respect to 3 measures: lines of source, words
of source and bytes of source. Both sets are nearly equal with
respect to all 3 measures.

In the first comparison (A), a fair accounting shows that the assembly
source would actually be smaller!

Thus, the empirical data provided supports the contention I've long held:
                                                    Language Is Irrelevant.

These days, I don't even think of assembly and HLLs as separate
languages anymore, but merely as dialects of the same language.
Converting back and forth between the two sets of sources cited below
is so quick and efficient (a few hours each way), and the assembly
source looks so much like HLL source on casual inspection, that it's
difficult to distinguish between the two.

In that context, the other lesson established here becomes paramount:

Style Transcends Language.



For this comparison, the assembly source was written in 8051 assembly
language using my CAS 8051 assembler (which is listed in the
Free Compilers section of the comp.compilers FAQ).

Counts are reported as "lines", "words" and "bytes" using a PC port of
the UNIX wc command.
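For reference, wc reports the number of newline characters, the number of whitespace-delimited words, and the raw byte count. A minimal Python sketch of those three measures follows, assuming plain ASCII source files; the name `wc_counts` is hypothetical and not part of any tool mentioned in the post. Because bytes are counted raw, a PC file with CR-LF line endings counts two bytes per line ending, so byte totals depend on the line-ending convention of the files being measured.

```python
import sys


def wc_counts(data):
    """Return (lines, words, bytes) for a bytes object, in the manner of UNIX wc.

    lines: number of newline characters (a final line without a trailing
           newline is not counted, as with wc -l)
    words: maximal runs of non-whitespace bytes
    bytes: raw length, so CR-LF line endings count as two bytes
    """
    lines = data.count(b"\n")
    words = len(data.split())  # bytes.split() with no argument splits on ASCII whitespace
    return lines, words, len(data)


if __name__ == "__main__":
    # Usage: python wc_counts.py file1.asm file2.c ...
    total = [0, 0, 0]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            counts = wc_counts(f.read())
        total = [t + c for t, c in zip(total, counts)]
        print(*counts, path)
    if len(sys.argv) > 2:
        print(*total, "total")
```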

This assembler contains several amenities friendly to high-level-language
programming:
    -- Multiple statements per line
    -- C-like statement syntax for directives
    -- C-like numeric syntax (coexisting with the Intel syntax)
    -- C-like comment syntax (partly coexisting with the Intel syntax)
    -- C-like expression syntax

The source for the C version is written mostly in ANSI C (with a small
core in assembly for handling task-switching), using the Borland C
compiler and the TASM assembler.

The comparison is of source code, since this is what is most pertinent
to programmers and to development time.

An important point to note: the programming style used for all the
source is consistently the same, and this is possible, in part, only
because of the amenities provided by my assembler.

The lesson to be learned from this is that what is usually construed
as "assembly language" has nothing to do with assembly language at
all! It has to do with programming STYLE -- and, to a lesser degree,
with the quality of tools used when programming in assembler.

It's assembly language STYLE that's the usual culprit that people
mistakenly point to when purporting that assembly language is either
"difficult" or "inefficient" (at the source level).

The reason for this is that, somehow, programmers seem to feel both
the necessity of and the licence for reverting to late-1950s or
early-1960s style programming (and thinking) the very instant they get
their hands dirty with the mnemonics.

For some reason, it never seems to occur to them to write their code
using the same stylistic conventions (and, more importantly, the same
THOUGHT PATTERNS) that they use in high-level languages.

In fact, apart from assembly source which I've written, I've never
actually seen anyone render their assembly source in a high-level
style (with underlying high-level thinking)!

That's how serious the confusion is between language, style and thought.

As this data establishes, the contentions usually offered in the
typical Assembly vs. C debate are completely groundless (with regard
to language) but extremely pertinent (with regard to style).

Hence the comment:
                                                    Language is Irrelevant.



(A) An LED display for showing static messages. The term "static" is
somewhat misleading, since "messages" can also include animations,
flashing, etc. Static means the pattern stays unchanged (even across
power cycles) until the host issues a new directive.

The unit has some of the amenities of a graphics accelerator and video
adapter: a set of programmable character tables; per-pixel
programmability through an orthogonal set of graphics commands
(actually the same command does pixels, lines and rectangles,
depending on the parameters); a 256-entry color palette (but only 4
colors to choose from); and a 256-entry "color table" (corresponding
to the 256 combinations of 4-cycle periodic color sequences
constructible from 4 colors).
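The 256 figure follows from 4 colors over a 4-step cycle: 4^4 = 256 possible sequences, so each sequence fits exactly in one byte if every step uses a 2-bit color code. The post doesn't say how the unit actually encodes its color table, so the packing below (step 0 in the low bits) and the names `pack_sequence` and `color_at` are hypothetical, shown only to make the arithmetic concrete.

```python
from itertools import product

NUM_COLORS = 4     # colors available per step (from the post)
CYCLE_LEN = 4      # steps in each periodic color sequence (from the post)
BITS_PER_STEP = 2  # hypothetical: 2 bits are enough to name one of 4 colors


def pack_sequence(seq):
    """Pack a 4-step sequence of color codes (0..3) into one byte.

    Hypothetical layout: step 0 occupies bits 0-1, step 1 bits 2-3, and so on.
    """
    if len(seq) != CYCLE_LEN or not all(0 <= c < NUM_COLORS for c in seq):
        raise ValueError("expected 4 color codes in the range 0..3")
    value = 0
    for step, color in enumerate(seq):
        value |= color << (BITS_PER_STEP * step)
    return value


def color_at(entry, frame):
    """Return the color code that a packed table entry shows on a given frame."""
    step = frame % CYCLE_LEN
    return (entry >> (BITS_PER_STEP * step)) & (NUM_COLORS - 1)


# Every possible sequence packs to a distinct byte, and together they fill 0..255.
all_entries = {pack_sequence(s) for s in product(range(NUM_COLORS), repeat=CYCLE_LEN)}
```

For example, under this layout a two-color flash alternating colors 1 and 2 is `pack_sequence((1, 2, 1, 2))`, which is entry 153, and `color_at(153, 5)` gives color 2.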

Source Language   Port                    Use
8051 assembly     96 x 32 display unit    In evaluation at the Mayo Clinic.
ANSI-C            672 x 160 unit          Experimental in-house.

The 672x160 display runs off a custom-built PC-compatible ISA card,
controlled from a PC. Only the upper-left 480x56 area is used, with
each pixel mapped to an 8x8 grid of physical pixels.
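With each logical pixel occupying an 8x8 block of the physical panel, the 480x56 region works out to a 60x7 logical grid, the same resolution as the 60x7 units in the second project. A sketch of that mapping follows; the function name `physical_block` and the assumption that logical (0, 0) sits at the panel's upper-left corner are illustrative assumptions, not details given in the post.

```python
PANEL_W, PANEL_H = 672, 160  # physical display size (from the post)
USED_W, USED_H = 480, 56     # upper-left area actually driven (from the post)
SCALE = 8                    # each logical pixel covers an 8x8 block of LEDs

assert USED_W <= PANEL_W and USED_H <= PANEL_H  # used area fits on the panel

LOGICAL_W, LOGICAL_H = USED_W // SCALE, USED_H // SCALE  # 60 x 7


def physical_block(x, y):
    """Return the physical (px, py) coordinates covered by logical pixel (x, y).

    Assumes (hypothetically) that logical (0, 0) maps to the panel's upper-left corner.
    """
    if not (0 <= x < LOGICAL_W and 0 <= y < LOGICAL_H):
        raise ValueError("logical pixel out of range")
    x0, y0 = x * SCALE, y * SCALE
    return [(px, py) for py in range(y0, y0 + SCALE) for px in range(x0, x0 + SCALE)]
```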

              Lines  Words  Bytes
C source:      2006  11605  98490
8051 source:   2024  12916  99700

Note: 77 lines, 243 words, 1718 bytes of the "C" source is in x86 assembly.
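To put "nearly equal" in numbers, the short calculation below divides the 8051 counts for (A) by the C counts from the table above. It also shows the ratios with the x86 assembly portion subtracted from the C side; that adjustment is offered only as one way to slice the data, not one the post itself performs. The dictionary and function names are illustrative.

```python
# Source-size counts for project (A), taken from the table above.
c_src = {"lines": 2006, "words": 11605, "bytes": 98490}
asm8051_src = {"lines": 2024, "words": 12916, "bytes": 99700}
x86_part = {"lines": 77, "words": 243, "bytes": 1718}  # part of the C source written in x86 assembly


def ratios(a, b):
    """Return a/b for each measure, rounded to 3 decimal places."""
    return {k: round(a[k] / b[k], 3) for k in a}


c_only = {k: c_src[k] - x86_part[k] for k in c_src}

print("8051 / C (as counted):   ", ratios(asm8051_src, c_src))
print("8051 / C (C portion only):", ratios(asm8051_src, c_only))
```

As counted, the 8051 source is within about 1% of the C source in lines and bytes, and about 11% larger in words.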

2nd Note: the C source does not include library source (though not
much library code is used). The 8051 source, on the other hand, reuses
a fair amount of code from prior projects, which essentially qualifies
as the equivalent of C-library source for purposes of comparison.

Counting C library source (or excluding the reused 8051 source) would
actually make the C source larger!

Also: a few features present in the 8051 version were left out of the
C source because of hardware considerations, further tilting the
comparison in favor of the assembly source!



(B) An LED display primarily intended for the gaming industry, whose
main function is to show a multicolor odometer-style jackpot. It can
also display scrolling messages and bit-mapped data, with multiple
source pages.

It shares major design features with (A): a programmable character
table and a 256-entry color palette, but with a 16-entry "DAC" color
table built on top of 16 basic colors.

Source Language      Port                          Use
8051 assembly        Large 60x7 unit               Several gaming companies.
                     Small 60x7 unit               Ditto.
                     Small 60x7 unit, fast 8051    Experimental in-house.
ANSI-C               672 x 160 unit                Experimental in-house.
(FORTH?, Small-C?)   Large 60x7 unit               (Competitor unit)

               Lines  Words  Bytes  Port
C source:       1474   6900  45136  672x160 unit
8051 sources:   1781   9780  55152  Large 60x7
                1759   9689  54626  Small 60x7
                1763   9703  54740  Small 60x7, experimental

Note: 77 lines, 243 words, 1718 bytes of the "C" source is in x86 assembly.
            1675 lines, 9277 words, 52041 bytes is shared by all three 8051 ports.

2nd Note: once again, this does not include any source code for the C library.
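Because 1675 lines are shared by all three 8051 ports, each port adds only about a hundred lines of its own. The sketch below does that subtraction using the table and the note above; the names `SHARED`, `PORTS` and `port_specific` are illustrative.

```python
SHARED = (1675, 9277, 52041)  # lines, words, bytes shared by all three 8051 ports

PORTS = {
    "Large 60x7": (1781, 9780, 55152),
    "Small 60x7": (1759, 9689, 54626),
    "Small 60x7, experimental": (1763, 9703, 54740),
}


def port_specific(counts, shared=SHARED):
    """Return the (lines, words, bytes) that belong to one port alone."""
    return tuple(total - common for total, common in zip(counts, shared))


for name, counts in PORTS.items():
    own = port_specific(counts)
    shared_frac = SHARED[0] / counts[0]
    print(f"{name}: port-specific {own}, shared lines {shared_frac:.0%}")
```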

                  Bytes  Unit
FORTH/C? binary:  11077  (Competitor unit)
8051 binary:       5517  Small 60x7, experimental

Note: the size of the 1st binary is largely a measure of the
inefficiency of the compiler used, which was either Small-C (which is
entirely stack-based) or something related to FORTH. A better
compiler would probably have gotten it down to around 7000-8000 bytes.

2nd Note: the hardware on the competitor unit is actually more
efficient (and more expensive), so it should require less code. This
pushes the effective ratio even further above 2:1.
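The "over 2:1" figure is simply the ratio of the two binary sizes in the table above. The sketch below computes it, along with the ratio at either end of the post's 7000-8000 byte estimate for a better compiler; the constant and function names are illustrative.

```python
COMPETITOR_BYTES = 11077  # FORTH/Small-C binary in the competitor unit (from the table)
ASM_BYTES = 5517          # 8051 binary, small 60x7 experimental unit (from the table)


def ratio(size):
    """Size of a competitor binary relative to the 8051 binary."""
    return size / ASM_BYTES


print(f"actual:                   {ratio(COMPETITOR_BYTES):.2f}:1")
for estimate in (7000, 8000):  # the post's guess for a better compiler
    print(f"better-compiler estimate: {ratio(estimate):.2f}:1 at {estimate} bytes")
```

Even at the post's optimistic estimate, the competitor binary would remain roughly 1.3 to 1.5 times the size of the 8051 binary.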

3rd Note: the binary in the competitor unit is held partly in a
locked-out code segment (the first 8096 bytes), with the excess going
into an external memory chip.

However, simple logic indicates that all of the locked-out area is
being used, as there would be no other reason to risk compromising the
security of the locked-out code by putting the excess in an accessible
area (a risk that cannot be overemphasized!) unless it were ABSOLUTELY
necessary to do so.
