From: "Pascal J. Bourguignon" <pjb@informatimago.com>
Newsgroups: comp.programming,comp.compilers,comp.editors
Date: Tue, 12 Apr 2011 06:00:09 +0200
Organization: Informatimago
References: 11-04-009 11-04-011 11-04-024
Keywords: editor, code
Posted-Date: 15 Apr 2011 15:50:52 EDT

HiramEgl <hiramegl@hotmail.com> writes:
> I'm especially concerned about how the source-structure is stored. I
> don't want it to be stored in a file because that is not flexible. I
> would like to have the source-structure stored in some binary format
> independent of user-languages or programming-languages, ideally.
It might be liberating to think only in terms of objects (as in OO). We
may still have to use files to implement the persistence of these objects
(as mentioned in another answer, file systems are a kind of database
with advantages of their own), but you can ignore that for now, since it
should be considered an implementation detail. (Objects could just as
well be stored in an OO database, or mapped to a relational database.
Another quite interesting option, unfortunately purely theoretical for
now, would be a persistent system such as EROS http://www.eros-os.org/ ;
but to a first approximation this is what you already have in an
image-based environment such as Lisp or Smalltalk. The only difference
from EROS is that in Lisp or Smalltalk you have to remember to save the
image yourself, while in EROS it is done automatically.)
> With the help of some translation tables it would be possible to
> regenerate source code in a specific user-language or programming-language.
>
> For example, if the user types:
> int Result;
>
> then the editor would have the capability of understanding that the
> first word is a variable type and the second word is a variable
> identifier. It would create a binary representation:
>
> 50 9899
>
> and update a translation table:
>
> Id     English
> ---------------
> 50     int
> 9899   Result
>
> Afterwards, another user might update the translation table for other
> language:
>
> Id     English   Español
> ----------------------------
> 50     int       entero
> 9899   Result    Resultado
>
> And regenerate the source code in another language:
>
> entero Resultado;
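
The quoted scheme can be sketched in a few lines of Python. This is only
an illustration of the idea, not anything the original poster wrote: the
ids 50 and 9899 and the words come from the quoted tables, while the id 1
for ";", the table layout and the function name render are assumptions.

```python
# Hypothetical sketch of id-based storage with per-language tables.
# Ids 50 and 9899 come from the quoted example; id 1 for ";" is assumed.
TABLES = {
    "English": {50: "int", 9899: "Result", 1: ";"},
    "Spanish": {50: "entero", 9899: "Resultado", 1: ";"},
}

def render(ids, language):
    """Regenerate source text from a stored sequence of ids."""
    table = TABLES[language]
    text = ""
    for word in (table[i] for i in ids):
        # Attach ";" to the previous word; separate other words by a space.
        if word == ";" or not text:
            text += word
        else:
            text += " " + word
    return text

stored = [50, 9899, 1]             # the "binary" form of: int Result;
print(render(stored, "English"))   # int Result;
print(render(stored, "Spanish"))   # entero Resultado;
```

Note that the table only renames tokens; it knows nothing about what
the declaration means.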
There already exists a binary format to serialize and deserialize
syntactic trees.

Each node of the syntactic tree is serialized as follows (all bytes given
in decimal):

    node ::= 40 <node-label> { 32 <child> } 41 .
    node-label ::= <identifier>

An identifier is a sequence of non-space characters encoded in ASCII
(most special characters are also allowed, apart from a few exceptions).

    child ::= <integer> | <floating-point> | <string> | <identifier> | <node> .

An integer is a sequence of digits encoded in ASCII, possibly preceded by
a sign. A floating-point number is encoded likewise.

    string ::= 34 <characters-but-double-quote> 34

So for example:
40 73 70 32 40 61 32 65 32 50 41 32 40 80 82 73 78 84 32 34 101 113
117 97 108 34 41 32 40 68 69 67 70 32 65 41 41
represents the syntactic tree:
              if
            / |  \
           /  |   \
          =  print  decf
         / \   |     |
        a   2 "equal" a
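
To check the example, here is a minimal Python sketch that decodes the
bytes listed above as ASCII and parses the result into a nested list. The
function names tokenize and parse are illustrative, and the reader is
deliberately simplistic (no escapes inside strings, no error handling).
Note that the bytes spell the labels in upper case.

```python
# Sketch: decode the serialized tree and parse it into nested lists.
DATA = bytes([40, 73, 70, 32, 40, 61, 32, 65, 32, 50, 41, 32, 40, 80, 82,
              73, 78, 84, 32, 34, 101, 113, 117, 97, 108, 34, 41, 32, 40,
              68, 69, 67, 70, 32, 65, 41, 41])

def tokenize(text):
    """Split text into parentheses, quoted strings and bare atoms."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "()":
            tokens.append(c)
            i += 1
        elif c == '"':
            j = text.index('"', i + 1)      # closing double quote
            tokens.append(text[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '()"':
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens

def parse(tokens):
    """Consume tokens from the front and return one node or atom."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)                       # drop the closing ")"
        return node
    if tok.startswith('"'):
        return tok                          # keep the quotes to mark a string
    try:
        return int(tok)
    except ValueError:
        return tok                          # an identifier

text = DATA.decode("ascii")
tree = parse(tokenize(text))
print(text)   # (IF (= A 2) (PRINT "equal") (DECF A))
print(tree)   # ['IF', ['=', 'A', 2], ['PRINT', '"equal"'], ['DECF', 'A']]
```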
Of course, when you translate this tree to text, you can generate:
SI a=2 ALORS AFFICHE "equal" SINON DECREMENTE a
or:
if a=2 then begin write("equal"); end else dec(a);
or:
if(a==2){printf("equal")}else{a--}
depending on the preferences of the programmer.
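
As a rough sketch of how such renderings could be produced, the Python
below walks the tree with one template table per surface syntax. The
table keys "c" and "fr", the template strings and the function name
render are assumptions made for illustration; only the two output lines
are taken from the text above.

```python
# Sketch: one tree, several surface syntaxes chosen by a template table.
TREE = ["if", ["=", "a", 2], ["print", '"equal"'], ["decf", "a"]]

# In str.format templates, "{{" and "}}" produce literal braces.
TEMPLATES = {
    "c": {
        "if": "if({0}){{{1}}}else{{{2}}}",
        "=": "{0}=={1}",
        "print": "printf({0})",
        "decf": "{0}--",
    },
    "fr": {
        "if": "SI {0} ALORS {1} SINON {2}",
        "=": "{0}={1}",
        "print": "AFFICHE {0}",
        "decf": "DECREMENTE {0}",
    },
}

def render(node, style):
    """Render a node (nested list) or an atom in the given style."""
    if not isinstance(node, list):
        return str(node)
    op, *args = node
    rendered = [render(a, style) for a in args]
    return TEMPLATES[style][op].format(*rendered)

print(render(TREE, "fr"))  # SI a=2 ALORS AFFICHE "equal" SINON DECREMENTE a
print(render(TREE, "c"))   # if(a==2){printf("equal")}else{a--}
```

This only changes the spelling of the tree; it says nothing yet about
whether the two renderings mean the same thing.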
(An unsophisticated programmer (what is usually, and disparagingly,
called a "real programmer") would edit the binary directly, which would
give, with an ASCII editor:
    (if (= a 2) (print "equal") (decf a))
[perhaps with the help of paredit, though ;-)])
> All this comes from the frustration of having the structure of
> algorithms or designs trapped in a specific user-language or
> programming-language. Because, I think that a lot of knowledge is
> trapped in source code written in english. I would like to have a tool
> that would help me to reuse very easily the structure of algorithms
> and designs written in other applications.
But apart from some popular languages that aren't really different at
the semantic level (e.g. Pascal and C mostly share the same semantics),
in most cases the semantic differences are the real problem.
Let's take a simple example:
    (defun fact (x)
      (if (= 1 x)
          1
          (* x (fact (- x 1)))))

    int fact(int x){
        return (1==x)
            ? 1
            : x*fact(x-1);}

fact(17) in C (with a 32-bit int) will typically return -288522240, while
(fact 17) in Lisp will return 355687428096000.
And we're not even speaking of object systems, type systems, exception
handling, etc.
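
The two results can be reproduced with a small Python sketch. Python
integers are arbitrary-precision like Lisp's, so the C behaviour is
simulated here by wrapping every product to a signed 32-bit value. That
wraparound is an assumption about the C implementation (signed overflow
is formally undefined in C), and wrap_int32 and fact_int32 are names
made up for this illustration.

```python
# Sketch: the same algorithm under two different integer semantics.
def fact(x):
    """Factorial with arbitrary-precision integers (Lisp-like)."""
    return 1 if x == 1 else x * fact(x - 1)

def wrap_int32(n):
    """Reduce n modulo 2**32 and reinterpret it as a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def fact_int32(x):
    """Factorial where every product wraps like a 32-bit C int (assumed)."""
    return 1 if x == 1 else wrap_int32(x * fact_int32(x - 1))

print(fact(17))        # 355687428096000
print(fact_int32(17))  # -288522240
```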
If the purpose is to reuse algorithms written in specific languages,
that means, 90% of the time, C or C++. Unfortunately, since those are
very low-level programming languages, the implementation of the
algorithms in these languages is marred by implementation details
entirely irrelevant to the algorithms. Filtering out those details is
not easy.
In practice, you can reuse an algorithm written in language X from
language Y by using what is called an FFI, a Foreign Function Interface.
When the languages X and Y share some "common language", such as the way
they manage memory, the representation of values of various types, etc.,
it's easy enough. We can call C functions from a Pascal program, and
vice versa. We may also call Fortran functions from C programs, but
even this is already more delicate. We can also do something even more
complex, crossing the borders between quite different environments, such
as controlled environments (Smalltalk, Python, Lisp, etc.) and
uncontrolled environments such as C, but the cost becomes higher, since
each FFI function call involves some value conversion.
And to return to the example above, using a library such as libecl
(http://ecls.sourceforge.net), you could call from C the function fact
written and compiled in Lisp, and get the correct result 355687428096000
in your C program. But the result is a Lisp integer, which in general
is an arbitrary-precision value that no native C type can hold (this one
would still fit in 64 bits, but (fact 25) would not). You cannot use it
easily: you cannot pass it to printf, and basically you cannot do
anything with it in C; you have to call library functions in libecl to
deal with it. So even for a simple arithmetic algorithm, the languages
are so different that you cannot really cross the border; you can only
play puppets across the FFI.
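
The value-conversion problem at such a border can be observed from the
other side with Python's standard ctypes module, used here as a stand-in
for an FFI in general. It is not libecl and does not call any Lisp code.
The ctypes integer types do no overflow checking, so a big integer
handed to a C-sized slot is silently truncated. The variable names are
illustrative only.

```python
# Sketch: what happens to a big integer when it must fit a native C type.
import ctypes

def fact(x):
    """Factorial with arbitrary-precision integers."""
    return 1 if x == 1 else x * fact(x - 1)

big = fact(17)                              # 355687428096000
as_c_int32 = ctypes.c_int32(big).value      # truncated to 32 bits
as_c_int64 = ctypes.c_int64(big).value      # still fits in 64 bits

huge = fact(25)                             # needs more than 64 bits
truncated = ctypes.c_int64(huge).value      # silently loses the high bits

print(as_c_int32)         # -288522240
print(as_c_int64 == big)  # True
print(truncated == huge)  # False
```

Anything that does not fit a native type has to stay on its own side of
the border and be handled through the foreign library's own functions.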
And if you want more fun, try to write a subclass of a C++ class in
Smalltalk or vice-versa.
> For example, I would like to drag-and-drop a quicksort algorithm into
> an application that later I could regenerate into "c" source-code in
> Spanish or ruby source-code in Swedish.
>
> I'm interested in transporting algorithms, designs, architectures across
> user-languages, programming-languages, platforms, etc.
IMO, the only way to do it is to hire programmers, or to implement
Strong AI, and once you have Strong AI, you don't need to translate
programs anymore (the AI does it itself).
--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.