Re: The remarkable similarities between Flex/Lex and XSLT

"matt.ti...@gmail.com" <matt.timmermans@gmail.com>
Sat, 25 Jun 2022 09:20:54 -0700 (PDT)

          From comp.compilers

Related articles
The remarkable similarities between Flex/Lex and XSLT costello@mitre.org (Roger L Costello) (2022-06-24)
Re: The remarkable similarities between Flex/Lex and XSLT gah4@u.washington.edu (gah4) (2022-06-24)
Re: The remarkable similarities between Flex/Lex and XSLT matt.timmermans@gmail.com (matt.ti...@gmail.com) (2022-06-25)

From: "matt.ti...@gmail.com" <matt.timmermans@gmail.com>
Newsgroups: comp.compilers
Date: Sat, 25 Jun 2022 09:20:54 -0700 (PDT)
Organization: Compilers Central
References: 22-06-073
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="79332"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, history
Posted-Date: 25 Jun 2022 12:44:44 EDT
In-Reply-To: 22-06-073

On Friday, 24 June 2022 at 09:00:44 UTC-4, Roger L Costello wrote:
> Hi Folks,
>
> XSLT is a language for processing XML documents.
>
> There are remarkable similarities between Flex/Lex and XSLT. Lex was created
> 47 years ago, long before XSLT. One wonders if some members of the XSLT 1.0
> Working Group were Lex users and were influenced by its concepts?


It's not really about a single tool like Lex.


Before XML there was SGML, which XML was supposed to "simplify". SGML
included a schema language (DTD), which defines the hierarchical structure of a
document using regular expressions over elements. There was also a strange,
unnecessary constraint on these expressions, a ban on "ambiguity", which
*everybody* who wrote SGML software needed to understand, and so the idea of
applying formal language techniques to SGML was inevitable.
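
To make "regular expressions over elements" concrete, here is a minimal
sketch in Python. The content model, the element names, and the helper
children_match are all made up for illustration; this is not how any real
SGML parser is implemented, only the idea that a content model is a regular
expression whose alphabet is element names instead of characters.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, in DTD terms:
#   <!ELEMENT section (title, para+, note?)>
# Encoded here as a regex over a sequence of space-terminated child names.
SECTION_MODEL = re.compile(r"(title )(para )+(note )?")


def children_match(element, model):
    """Return True if the element's child names, in order, match the model."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


good = ET.fromstring("<section><title/><para/><para/><note/></section>")
bad = ET.fromstring("<section><para/><title/></section>")

print(children_match(good, SECTION_MODEL))  # children fit the model
print(children_match(bad, SECTION_MODEL))   # title is out of order
```

The "ambiguity" rule is about models such as ((b, c) | (b, d)): having seen
a b, a parser cannot tell which b of the model it matched without looking
ahead, so the model has to be rewritten as (b, (c | d)). A general regex
engine like the one above does not care, which is part of why the
constraint looks unnecessary.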


Long before XSLT, there were a variety of attempts to define languages that
would allow users to specify an automatic translation from SGML into printed
form. Many of these languages were context-free grammars at their core, with
translation rules as actions. This is called "syntax-directed translation"
and was a well-known concept long before that.


With SGML, though, the problem of syntax-directed translation is different
than it is in other contexts, and more difficult in many ways, because the
basic structures in the input are very easy to parse -- elements are delimited
after all -- but the input was semantically marked-up text and the output
was a published document that had to follow all the ambiguously-defined
stylistic rules that people use when they actually do typography. This meant
that complicated grammars, over *element trees* instead of linear text, and
lots of other ideas, needed to be applied. Lots of companies put a lot of
work into it.


So by the time XSLT came around, everyone on the committee was already familiar
with a lot of this history from SGML processing, which was based on a lot of
work rooted in the same formal language theory that goes into lexers and
parsers, and that is why some of XSLT looks a lot like Lex.
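
The resemblance is easiest to see side by side. The following Python sketch
is only an illustration under loud assumptions: the rule tables, element
names, and functions (tokenize, apply_templates, content) are invented here,
and the first-matching-rule dispatch on the tree side is a simplification,
since real XSLT resolves conflicts by import precedence and priority. Both
halves have the same shape: a list of pattern/action rules and a driver that
picks the rule for the current piece of input.

```python
import re
import xml.etree.ElementTree as ET

# Lex-style: each rule pairs a regular expression with an action on the text.
LEX_RULES = [
    (re.compile(r"[0-9]+"), lambda text: ("NUMBER", text)),
    (re.compile(r"[A-Za-z_]+"), lambda text: ("NAME", text)),
    (re.compile(r"\s+"), lambda text: None),  # whitespace produces no token
]


def tokenize(source):
    pos, tokens = 0, []
    while pos < len(source):
        best = None
        for pattern, action in LEX_RULES:
            m = pattern.match(source, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        m, action = best
        token = action(m.group())
        if token is not None:
            tokens.append(token)
        pos = m.end()
    return tokens


# XSLT-style: each rule pairs a match condition on a node with an action that
# builds output and recurses into the children (compare xsl:apply-templates).
def content(node):
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child))
        parts.append(child.tail or "")
    return "".join(parts)


TEMPLATE_RULES = [
    ("title", lambda n: content(n).upper() + "\n"),
    ("para", lambda n: content(n) + "\n"),
    ("em", lambda n: "*" + content(n) + "*"),
]


def apply_templates(node):
    for tag, template in TEMPLATE_RULES:
        if node.tag == tag:
            return template(node)
    return content(node)  # rough analogue of XSLT's built-in template rule


doc = ET.fromstring(
    "<doc><title>Intro</title><para>Some <em>marked</em> text</para></doc>"
)
print(tokenize("count 42"))
print(apply_templates(doc))
```

The difference is in what the driver walks: tokenize advances through linear
text, while apply_templates descends an element tree, which is the part that
came out of the SGML work rather than out of Lex.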


Unfortunately, XSLT kind of sucks. When the standard was written, the problem
itself had not been solved by industry in a really acceptable way (and
it still hasn't been!), and the W3C committee fell into the trap of trying to
innovate instead of codifying best practice.


