[ts-gen] attribute grammar vs. flex/bison [Was: FX bid/ask quotes truncated]

Bill Pippin pippin at owlriver.net
Mon Aug 3 13:01:14 EDT 2009


You ask:

> ...  I was also a little curious why you didn't use flex/bison ...

Granted, a grammar specification for the shim's command language
would be useful to users.  This is a good example of missing
documentation that ideally should be provided with the shim.

Note that the IB tws api is very much outside our control, and that
their language 'specification' is in the form of java code, via the
files EClientSocket.java and EReader.java, so that any grammar we
wrote here would be neither primary nor authoritative.

Note also that the api sentence level grammar would be voluminous and
have many versions; and finally, that it would be trivially ll(0)
for simple messages, since the choice of sentential form is determined
by the message index and version, yet context sensitive for compound
messages --- history, scanner, and combo (bag) contracts --- where
the repeating group count determines the message length.
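
To make the last point concrete, here is a rough sketch, not the shim's
actual code.  It assumes the input has already been split into string
tokens, and the message ids, versions, and field counts in the table are
invented for illustration; they are not taken from the IB tws api.  For a
simple message the (index, version) pair alone fixes the length, while
for a compound message the length depends on the value of a count token
inside the message itself.

```cpp
#include <cstddef>
#include <exception>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message shape: all numbers below are made up.
struct Shape {
    std::size_t fixed;      // fields that follow the index and version
    std::size_t per_group;  // fields per repeating group; 0 if simple
};

using Key = std::pair<int, int>;  // (message index, version)

const std::map<Key, Shape>& shapes() {
    static const std::map<Key, Shape> table = {
        {{1, 1}, {3, 0}},  // simple: always three fields
        {{2, 1}, {1, 2}},  // compound: one field, a count, count * 2 fields
    };
    return table;
}

// Convert a token to int, or return nothing if std::stoi rejects it.
std::optional<int> to_int(const std::string& s) {
    try {
        return std::stoi(s);
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// Number of tokens in the message starting at tokens[pos], or nothing if
// the message is unknown, malformed, or not completely present.
std::optional<std::size_t> message_length(
        const std::vector<std::string>& tokens, std::size_t pos) {
    if (pos + 2 > tokens.size()) return std::nullopt;
    const auto id = to_int(tokens[pos]);
    const auto version = to_int(tokens[pos + 1]);
    if (!id || !version) return std::nullopt;

    // The sentential form is selected by index and version alone.
    const auto it = shapes().find(Key{*id, *version});
    if (it == shapes().end()) return std::nullopt;
    const Shape& shape = it->second;

    std::size_t len = 2 + shape.fixed;
    if (shape.per_group != 0) {
        // Compound message: the count token decides how much follows.
        if (pos + len >= tokens.size()) return std::nullopt;
        const auto count = to_int(tokens[pos + len]);
        if (!count || *count < 0) return std::nullopt;
        len += 1 + static_cast<std::size_t>(*count) * shape.per_group;
    }
    if (pos + len > tokens.size()) return std::nullopt;
    return len;
}
```

Note that when the count token is wrong or missing, nothing in the
stream itself says where the next message begins, which is one way to
see why recovery after an error is so awkward.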

You've actually raised at least three questions here, corresponding
to lexical, grammatical, and semantic analysis:

    1.  Why not use flex for the scanner?
    2.  Why not use a bottom-up parser such as bison for the parser?
    3.  Why not accept the code-fragments-for-semantic-actions
        approach pioneered by yacc and copied by bison in place of
        some attribute grammar formalism?

In brief, we have easy token level and sentence level languages, with
comparatively harder semantic action and object construction tasks,
so that traditional tools like flex and bison provide little advantage.

At the same time, there are multiple input streams, languages, and
start symbols per language, and a very hard error handling problem
embedded in the IB tws api, since there is no truly safe way to
synchronize the input stream once an error has occurred.  For each
of these issues, traditional tools make the implementation more
difficult to write and less robust to maintain.

There are other questions as well, and once recognized as such, the
disadvantages of traditional parser generator tools become obvious:

    4.  Otherwise unnecessary static singletons must be defined to
        give action code fragments stateful access to accumulated
        symbol table information; and either
    5.  the class representation of language symbols allowed by C++
        must be sacrificed for the macro defines or enums that are
        used to code the language symbols for most parser generator
        tools; or
    6.  the instances of the flyweight pattern that define the
        language symbols in the C++ code would duplicate the grammar
        specification, so the grammar spec must be synchronized in
        two places and two languages.
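
As an illustration of points 5 and 6, a language symbol can be a shared
class instance rather than an enum value or macro define.  The sketch
below is a generic flyweight, assuming nothing about the shim's real
classes; the names Symbol, Kind, and intern() are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical symbol class, for illustration only.
class Symbol {
public:
    enum class Kind { Terminal, Nonterminal, Action };

    // Intern a symbol: the same name always yields the same shared
    // instance.  If the name already exists, the stored symbol is
    // returned and the kind argument is ignored.
    static const Symbol& intern(const std::string& name, Kind kind) {
        static std::map<std::string, Symbol> pool;
        auto it = pool.find(name);
        if (it == pool.end())
            it = pool.emplace(name, Symbol(name, kind)).first;
        return it->second;
    }

    const std::string& name() const { return name_; }
    Kind kind() const { return kind_; }
    bool is_terminal() const { return kind_ == Kind::Terminal; }

    // Flyweights are compared by identity, not by value.
    bool operator==(const Symbol& other) const { return this == &other; }

private:
    Symbol(std::string name, Kind kind)
        : name_(std::move(name)), kind_(kind) {}

    std::string name_;
    Kind kind_;
};
```

Unlike an enum value, such a symbol can carry its own name, kind, and
methods.  The cost is the one described in point 6: if a generator's
grammar file also had to list these symbols, the two lists would need to
be kept in step by hand.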

The shim's language symbols include a first-order model that defines
constants for token terminals, and function or predicate symbol rules
for compound objects.  The language also includes meta symbols, used
as start symbols and other nonterminals, and semantic action symbols
to control type checking and object construction.

These symbols provide not only the attribute grammar language, but
also process control, via the command->request map2() methods, and
output formatting, as you've already seen.  Traditional scanner and
parser generator tools don't provide these features.

There are *many* problems with traditional parser generators, and a
complete list is way outside the scope of this reply.  For more, feel
free to google for papers by Terence Parr on ll(k) parsing. 

In general, parser generators solve the interesting, easy problem of
defining parse tables from user-provided rules by providing an ugly
black-box solution, at the cost of making the hard parts of input
analysis, that is semantic analysis, actions, object construction,
and error handling, much more awkward.

> ... or some of the even more modern variants, for your parser.

If a language tool were well suited to work with multiple simple
languages at the same time, played well with C++ and its type system,
gave me complete control and flexibility in error handling, and
provided a useful attribute grammar formalism, I'd consider it.

It follows, then, that I have to say that Antlr and other work by Parr
are very interesting.  On balance, however, the practical cost of adding
dependencies for such elaborate tools to the shim is prohibitive.
It's hard enough to convince users to set up the mysql database.


