Semantic difference of source files

Michiel <m.helvensteijn@gmail.com>
Sun, 17 Aug 2008 20:56:25 +0200

          From comp.compilers

Related articles
Semantic difference of source files m.helvensteijn@gmail.com (Michiel) (2008-08-17)
Re: Semantic difference of source files m.helvensteijn@gmail.com (Michiel) (2008-08-17)
Re: Semantic difference of source files barry.j.kelly@gmail.com (Barry Kelly) (2008-08-18)
Re: Semantic difference of source files DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-08-18)
Re: Semantic difference of source files gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-08-18)
Re: Semantic difference of source files gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-08-19)
Re: Semantic difference of source files marcov@stack.nl (Marco van de Voort) (2008-08-20)
[5 later articles]
| List of all articles for this month |

From: Michiel <m.helvensteijn@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 17 Aug 2008 20:56:25 +0200
Organization: Wanadoo
Keywords: parse, practice, comment
Posted-Date: 17 Aug 2008 16:32:58 EDT

Hi!


I have this idea, but I'm not sure if it's already being used anywhere. And
if not, I would like to know why.


My idea is to leave behind a record of the Abstract Syntax Tree after
compilation. This may simply be a copy of the original source code, an XML
file or even a binary representation (for better performance, see below).


You see, all automatic building utilities I know (make, ant, jam) use only
the modification date of a source file to decide whether or not to
recompile it and its reverse dependencies. For this reason, many
programming languages encourage the programmer to separate their interfaces
and implementations either textually or by using different files.


Programmers will even use the PIMPL idiom, just to get those private
data-members out of that header file. The advantage is that you may change
the internal representation of a class without having to recompile the
modules that use the header file. But to accomplish this you sacrifice
runtime-speed (the extra redirections) and code clarity.


I propose that in addition to comparing modification dates, we selectively
compare AST's (if the source file is newer). At the very least this allows
the programmer to modify style, layout and comments without requiring any
recompilation at all. If the comparison utility is smart enough, you may
even perform trivial refactoring without losing semantic equivalence.


The big bonus is that you may also, quite easily, separately compare the
interface and the implementation. If only the implementation has changed,
there should be no reason to recompile those modules that only use the
interface. Also, multiple classes could be put into the same source file if
it seems conceptually right, and you'd get the same advantage of separate
comparison.


You can take this even further and let the computer automatically find the
optimal separation of object files, which need not have any relation to the
separation of source files.


Now, this means:


* You need to save a representation of the AST when compiling. This is, I
believe, a low price to pay. Perhaps it could even be attached to the
result file as a sort of meta-data.


* You need to process a changed source-file up to the AST level before you
know the semantic difference. But this is a relatively cheap operation
compared to full compilation and optimization. And it could be avoided for
the old file if you save the AST in binary form.


This seems, to me, such a simple trick that I am surprised I have not
encountered it before. But that may simply be my inexperience.


What do you think?


--
Michiel Helvensteijn
[If you're only going to compare the ASTs to see if they're the same,
all you need to save is a hash. The question of why nobody seems to
care about compilation speed any more. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.