New Language HybridJava and a Compiler from it.

hj <>
Wed, 3 Jun 2009 19:15:32 -0700

          From comp.compilers


From: hj <>
Newsgroups: comp.compilers
Date: Wed, 3 Jun 2009 19:15:32 -0700
Organization: Compilers Central
Keywords: design, Java
Posted-Date: 04 Jun 2009 15:46:02 EDT

    HybridJava Compiler

Java Server Pages (JSP) made it possible to combine the power of a
general-purpose programming language and a content mark-up language
(Java and HTML in this case) in one source file. However, JSP as a
concept did not go beyond being a "template engine" that mechanically
mixes pieces of Java and HTML code, with plain switches from Java to
HTML and back (`%>' and `<%') and no deeper syntax control. Attempts
to add code reuse to JSP have so far not produced a sufficiently
flexible solution.


HybridJava (HJ) is a mix (in one grammar) of HTML and Java subsets and
in this sense is a successor to the original JSP. The language consists
of three parts: a subset of Java, a subset of HTML, and all the rest.
The latter serves for code factorization and reuse as well as for
`gluing' Java and HTML.

The HJ code of an application is provided as a set of .page files (one
per Web page) and a set of .widget files. Widgets are the units of
code reuse. Widget definitions and usage follow an XML-like syntax. A
widget may define named attributes and/or named slots (or one
anonymous slot). The rest of the .widget file is just ARBITRARY HJ
code. A slot is somewhat similar to the position between the opening
and closing tags of a `library tag'. The code in an HJ slot gets the
Java context of the point where the widget is called; thus widgets are
transparent to the Java context (as HTML elements are, by the way).

Note the difference between the role (semantics) of HTML tags and that
of widget-related tags in HJ source. The latter mark up the HJ source
itself, delimiting widget definitions and widget usage. The former
mean self-output: the tag is written to the generated page as is
(unlike what it means to a browser).

To facilitate analysis, both the HTML and Java subsets are made
stricter than canonical HTML and Java. In particular, the `<'
character, already overloaded in Java, gets an additional overload
from HTML. So in HJ a blank after `<' in an opening tag is not
permitted (`< html>' is rejected), and on the other side `5 < a' in a
Java expression MUST have at least one blank after `<'. However,
`5<x' may be OK as long as HTML does not define a tag `x' AND you have
not defined a widget (and/or, in some contexts, a slot) named `x'.
Luckily, this is the only significant difficulty of its kind
encountered. Another restriction is that elements whose closing tag
may be omitted in canonical HTML MUST have it in HJ.
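
The `<' rule above can be illustrated with a small Java sketch. It is
not the HJ compiler's code: the class name, the `KNOWN_NAMES' table
and the exact-match lookup are assumptions made for this example, and
the real compiler also takes the current context into account.

```java
import java.util.Set;

final class AngleBracketRule {
    enum Kind { TAG, LESS_THAN }

    // Hypothetical name table: a few HTML tags plus any user-defined
    // widget or slot names that are visible at this point.
    static final Set<String> KNOWN_NAMES = Set.of("html", "body", "div", "a", "p");

    /** Classifies the '<' found at index pos of src. */
    static Kind classify(String src, int pos) {
        int i = pos + 1;
        // A blank (or end of input) right after '<' can never start a tag.
        if (i >= src.length() || Character.isWhitespace(src.charAt(i))) {
            return Kind.LESS_THAN;
        }
        if (src.charAt(i) == '/') {
            i++; // closing tag such as </div>
        }
        int start = i;
        while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
            i++;
        }
        String name = src.substring(start, i);
        // Only a known tag, widget, or slot name turns '<' into mark-up.
        return KNOWN_NAMES.contains(name) ? Kind.TAG : Kind.LESS_THAN;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5 < a", "5<a", "5<x", "<html>", "< html>"}) {
            System.out.println(s + " -> " + classify(s, s.indexOf('<')));
        }
    }
}
```

Under these assumptions `5<a' is read as the start of an `a' element,
which is why the blank after `<' is mandatory in that expression.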

The major `trick' that makes it possible to minimize the `glue'
between Java and HTML lies in giving the Java code within HJ code a
property known from HTML as PCDATA. Specifically, it means that HTML
elements, widget calls, and slot calls float freely between Java
statements. However, that is not sufficient by itself. Java code may
be executed on the server side or be shown in the browser (as part of
an HTML document), so a hint may be provided in the HTML or
widget-related tags about the nature of the tagged code (Java or
HTML). HTML elements that permit no content between their tags other
than further mark-up do not need such a hint (15 tags including


Both pages and widgets MAY be associated (by a simple naming
convention) with a `state' class containing data members and/or
handler methods (@Override). This plain Java class is exactly what is
called `code behind', but with less headache.

While processing an identifier in a .widget file, the HJ compiler
first tries to find its definition in the widget's Hybrid (Java)
code, then among the widget attributes. After that it looks into the
exports of the slot involved. Next, it examines the members of the
widget state instance. If none of these defines the identifier, it is
left to be handled by javac. Resolving an identifier from a .page file
is similar, except that it has fewer steps.
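
The lookup order for .widget files can be sketched in Java as a walk
over an ordered list of scopes. This is only an illustration: the
class, the scope labels and the sample identifiers are invented here,
and the real compiler works on its own symbol tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class IdentifierLookup {
    /** Returns the label of the first scope that defines id, or "javac" if none does. */
    static String resolve(String id, Map<String, Set<String>> scopesInOrder) {
        for (Map.Entry<String, Set<String>> scope : scopesInOrder.entrySet()) {
            if (scope.getValue().contains(id)) {
                return scope.getKey();
            }
        }
        return "javac"; // not known to HJ: left for the Java compiler
    }

    /** Hypothetical scopes of one widget, in the order described in the text. */
    static Map<String, Set<String>> widgetScopes() {
        Map<String, Set<String>> scopes = new LinkedHashMap<>(); // keeps insertion order
        scopes.put("widget code", Set.of("i", "title"));
        scopes.put("widget attributes", Set.of("title", "width"));
        scopes.put("slot exports", Set.of("row"));
        scopes.put("widget state", Set.of("count", "row"));
        return scopes;
    }

    public static void main(String[] args) {
        for (String id : new String[] {"title", "width", "row", "count", "Math"}) {
            System.out.println(id + " -> " + resolve(id, widgetScopes()));
        }
    }
}
```

An earlier scope shadows a later one: here `title' is found in the
widget code before the attribute of the same name is considered.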


The HybridJava Compiler generates Java code (as a JSP engine often
does), one class per page. The code of the widgets is included in that
class too. The compiler resolves widget cross-references (including
recursive ones) by open substitution. The generated code is placed in
a context that implements a simple framework API with page and widget
handlers (wrapped around the Servlet API). The API takes care of
dispatching every HTTP request `event' to the proper handler.
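
As a rough picture of such dispatching, here is a minimal Java sketch
that does not use the Servlet API at all. Everything in it is
hypothetical (the class, the "component.event" key, the parameter
map); the post does not describe the actual framework API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class EventDispatcher {
    // Handlers keyed by "<component>.<event>"; this key format is an assumption.
    private final Map<String, Consumer<Map<String, String>>> handlers = new HashMap<>();

    /** Registers a handler for one event of one page or widget. */
    void register(String component, String event, Consumer<Map<String, String>> handler) {
        handlers.put(component + "." + event, handler);
    }

    /** Calls the matching handler; returns false if no handler is registered. */
    boolean dispatch(String component, String event, Map<String, String> params) {
        Consumer<Map<String, String>> handler = handlers.get(component + "." + event);
        if (handler == null) {
            return false;
        }
        handler.accept(params);
        return true;
    }

    public static void main(String[] args) {
        EventDispatcher d = new EventDispatcher();
        d.register("loginPage", "submit", p -> System.out.println("submit by " + p.get("user")));
        d.dispatch("loginPage", "submit", Map.of("user", "ann"));
    }
}
```

In a real deployment the component and event names would have to be
derived from the HTTP request; how HJ does that is not covered here.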


The implementation uses JavaCC augmented with handwritten analysis
code. Token lists are built only once but processed more than once by
different parts of the generated parser or by handwritten parsers. For
better error diagnostics an additional bracket checker runs as a
co-program with the generated parsers. The Java grammar for JavaCC
became the property of Sun once the JavaCC authors joined that
company. The grammar used here derives from a version that predates
Java 5.0. Also, re-engineering the type-dependent Java analysis proved
to be a bit beyond the resources available. Altogether this made the
current implementation of the HJ Compiler a bit loose regarding the
Java part of the code, but javac runs at the end of the process
anyway.
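
The idea of a separate bracket checker can be shown with a classic
stack-based sketch in Java. It is an assumption-laden simplification:
it scans characters rather than the compiler's token list, ignores
string literals and comments, and its names are invented here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BracketChecker {
    /** Returns the index of the first offending bracket, or -1 if all brackets match. */
    static int firstError(String src) {
        Deque<Integer> open = new ArrayDeque<>(); // positions of unmatched openers
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '(' || c == '[' || c == '{') {
                open.push(i);
            } else if (c == ')' || c == ']' || c == '}') {
                // A closer with no opener, or with the wrong opener, is an error.
                if (open.isEmpty() || !matches(src.charAt(open.pop()), c)) {
                    return i;
                }
            }
        }
        // Leftover openers: report the most recently opened one.
        return open.isEmpty() ? -1 : open.peek();
    }

    static boolean matches(char opener, char closer) {
        return (opener == '(' && closer == ')')
            || (opener == '[' && closer == ']')
            || (opener == '{' && closer == '}');
    }

    public static void main(String[] args) {
        System.out.println(firstError("f(a[1]) { }"));
        System.out.println(firstError("f(a[1)"));
    }
}
```

Reporting the position of the mismatch directly tends to give a more
useful message than the parse error a generated parser raises later.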

In general, the compiler implements a concept of `human-oriented
compilation': it tries to follow a model of how a human reader
understands the code. To a reasonable extent this means the more
passes the better.

HybridJava Compiler is free for non-commercial use.
For more details see:
