Re: Genetic programming and code evolution

"Dmitry A. Kazakov" <>
24 Dec 2005 15:35:53 -0500

          From comp.compilers

Related articles
Genetic programming and code evolution (=?ISO-8859-1?Q?Ga=EBl_Rosset?=) (2005-12-23)
Re: Genetic programming and code evolution (Dmitry A. Kazakov) (2005-12-24)
Re: Genetic programming and code evolution (2005-12-24)
| List of all articles for this month |

From: "Dmitry A. Kazakov" <>
Newsgroups: comp.compilers
Date: 24 Dec 2005 15:35:53 -0500
Organization: cbb software GmbH
References: 05-12-065
Keywords: optimize
Posted-Date: 24 Dec 2005 15:35:53 EST

On 23 Dec 2005 20:54:18 -0500, GaŽl Rosset wrote:

> I would like to create a genetic algorithm to modify C/C++ and improve
> it for either speed or space or a combination of both.
> This would be quite similar to the tool called Critticall, see

I am skeptical about the page, if not to say more.

From the machine learning perspective the way a genetic algorithm
works is selection. If you want to optimize for some criterion (like
speed) you need to measure it to be able to select the most promising
mutations. Measuring program speed is a way too expensive for the
incredible number of mutations you would need.

The second problem is the genome language. You don't want to randomly
modify hexadecimal code, because the number of outcomes is too
big. You also don't want to randomly change the program source text,
because most of mutations will not compile. So you believe that
modification of an intermediate code would do better? It won't. Random
mutations will not preserve program correctness and checking for
correctness is absolutely unrealistic. So you need to limit all
mutations to ones that guaranteed produce a semantically *equivalent*

Now you have another problem, by applying such mutations one by one
you will never be able to leave a local optimum. The tree distance
between the current program and a somewhat better one is just too
big. You need a large number of simultaneous mutations to reach it. It
is statistically impossible. Furthermore, considering a program as a
whole is hopeless. Programs are decomposed into smaller blocks, that
actually lets their optimization while preserving their local
semantics. But the decomposition itself is also a sufficient subject
of optimization, and you have no chance to attack it using trees.

This is not the way humans are optimizing programs. They make changes,
which being translated into trees, are scattered all across the tree.
People operate program semantics, which is not well captured by the

The bottom line is: you need a better genome language than syntactic
trees, you need better selection criteria. Probably software patterns
is the most close thing which comes in mind.

> q1) Do you know of any educational/free/open compiler which can compile
> to a intermediate code which can easily be represented in a tree
> structure and be modified by a genetic algorithm ?

I have a parser which does semantic call backs (they can produce a
tree or do anything else):

I think you should definitely consider pre- and postconditions rather
than raw code. You might also be interested in fuzzy methods, because
it is close to the way humans take their decisions. As I said, direct
measures are unrealistic, so you need some soft heuristic criteria
instead. Fuzzy might be also a way to postpone a decision (which
actually means that you keep both (or more) alternatives alive.)

Good luck!

Dmitry A. Kazakov

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.