
LEESA: A new way of typed XML programming in C++

Some of my recent research has focused on developing a highly generic and reusable library for traversing complex object structures, a problem best exemplified by schema-driven XML programming. I'm glad to present a research paper called LEESA: Embedding Strategic and XPath-like Object Structure Traversals in C++, which will be published in the proceedings of the IFIP Working Conference on Domain Specific Languages (DSL WC), 2009 at Oxford, UK. LEESA stands for Language for Embedded quEry and traverSAl. LEESA advances the state of the art in typed XML programming in standard C++ to a level where many benefits of static type analysis can be maintained while enjoying a succinct syntax similar to that of XPath. Below is a quick motivating example of LEESA that sorts and prints the names of the authors in an XML book catalog.

Catalog() >> Book() >> Author() >> Sort(Author(), LastNameComparator) >> ForEach(Author(), print);

The key thing to note here is that this is not a string-encoded query. In fact, the C++ compiler checks the compatibility of this expression against the book catalog XML schema at compile time! LEESA uses the expression templates idiom to achieve this highly intuitive, XPath-like syntax. Overall, LEESA's implementation is an exciting combination of generic programming, operator overloading, expression templates, C++ metaprogramming, Boost MPL, C++0x Concepts, and a heck of a lot of template hackery to make all the things work together! Interesting details are presented in the paper mentioned above.
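To give a flavor of how a compiler can check a path against a schema, here is a minimal sketch of the expression templates idea. It is not LEESA's actual implementation. Everything in it is an assumption made up for illustration: the hand-written Catalog, Book, and Author classes (a schema compiler would normally generate them), the is_node and is_child traits, and the Path, collect, evaluate, and sorted_author_names helpers.

```cpp
#include <algorithm>
#include <string>
#include <type_traits>
#include <vector>

namespace path_sketch {

// Hypothetical schema classes; a schema compiler would normally generate these.
struct Author  { std::string last_name; };
struct Book    { std::vector<Author> authors; };
struct Catalog { std::vector<Book> books; };

// Schema metadata as type traits (assumed, written by hand for this sketch).
template <class T> struct is_node : std::false_type {};
template <> struct is_node<Catalog> : std::true_type {};
template <> struct is_node<Book>    : std::true_type {};
template <> struct is_node<Author>  : std::true_type {};

template <class P, class C> struct is_child : std::false_type {};
template <> struct is_child<Catalog, Book> : std::true_type {};
template <> struct is_child<Book, Author>  : std::true_type {};

// Child accessors; the unused pointer argument only selects the child type.
inline const std::vector<Book>& children(const Catalog& c, const Book*) { return c.books; }
inline const std::vector<Author>& children(const Book& b, const Author*) { return b.authors; }

// Expression-template node: the steps of the path are recorded in the type.
template <class... Steps> struct Path {};

// Metafunction yielding the last type of a non-empty type list.
template <class... Ts> struct last_of;
template <class T> struct last_of<T> { using type = T; };
template <class T, class... Ts> struct last_of<T, Ts...> : last_of<Ts...> {};

// Node >> Node starts a path. The schema check happens at compile time.
template <class P, class C,
          class = typename std::enable_if<is_node<P>::value && is_node<C>::value>::type>
Path<P, C> operator>>(const P&, const C&) {
  static_assert(is_child<P, C>::value, "schema violation: not a child element");
  return {};
}

// Path >> Node appends one more child step, again checked against the schema.
template <class... Steps, class C>
Path<Steps..., C> operator>>(Path<Steps...>, const C&) {
  static_assert(is_child<typename last_of<Steps...>::type, C>::value,
                "schema violation: not a child element");
  return {};
}

// End of the path: the current node is part of the result.
template <class T>
void collect(Path<T>, const T& node, std::vector<T>& out) { out.push_back(node); }

// One step down: visit every child of type C and continue with the rest.
template <class P, class C, class... Rest>
void collect(Path<P, C, Rest...>, const P& node,
             std::vector<typename last_of<C, Rest...>::type>& out) {
  for (const C& kid : children(node, static_cast<const C*>(nullptr)))
    collect(Path<C, Rest...>{}, kid, out);
}

// Runs a path against a concrete root object and returns the selected nodes.
template <class Root, class... Rest>
std::vector<typename last_of<Root, Rest...>::type>
evaluate(Path<Root, Rest...> path, const Root& root) {
  std::vector<typename last_of<Root, Rest...>::type> out;
  collect(path, root, out);
  return out;
}

// Usage: collect all author last names in a catalog, sorted alphabetically.
inline std::vector<std::string> sorted_author_names(const Catalog& catalog) {
  auto path = Catalog() >> Book() >> Author();
  static_assert(std::is_same<decltype(path), Path<Catalog, Book, Author>>::value,
                "the path type records every step");
  std::vector<std::string> names;
  for (const Author& a : evaluate(path, catalog)) names.push_back(a.last_name);
  std::sort(names.begin(), names.end());
  return names;
}

}  // namespace path_sketch
```

In this sketch, writing Catalog() >> Author() is rejected by the compiler through the static_assert, because no is_child<Catalog, Author> specialization exists. A real library would need far more than this (other axes, sorting and visiting actions, generated metadata), so treat it only as an illustration of the principle.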

The source code of LEESA is available. However, LEESA's current implementation is based on Universal Data Model (UDM 3.2.1) -- a full-fledged code generator for model-driven development that can be used as an XML schema compiler. Other code generators could be used provided they are extended to produce the necessary layers of abstraction described in the paper.

In upcoming posts, I plan to document some of my experiences developing LEESA.

Comments

Sudarshan said…
This is amazing. Is the source code available? It will be great if this becomes part of Boost.
Sumant said…
I would love to see that happen! However, LEESA's dependence on an external code generator might pose a limitation in its adoption as a "standard" boost library.
John Torjo said…
This looks all cool and all, but a bit of compilation times would help ;)
Usually these (compilation times) are extremely slow... And that's what bugs me...
Sudarshan said…
Could you please distribute this separately from COSMIC?
Sumant said…
The source code of LEESA is available separately. However, LEESA's current implementation is based on Universal Data Model (UDM 3.1.3) -- a full-fledged code generator for model-driven development that can be used as an XML schema compiler/code generator. Other code generators could be used provided they are extended to produce the necessary layers of abstraction described in the paper.
Sumant said…
LEESA was presented this week at the Working Conference on Domain Specific Languages 2009 in Oxford, UK. The slides are available in PPT format. The talk was followed by a great discussion and a set of very insightful questions.
Sumant said…
A new follow-up technical report on LEESA is now available: Toward Native XML Processing Using Multi-paradigm Design in C++. It reports on the compilation times and the run-time performance of LEESA.
Sumant said…
LEESA homepage contains more resources.
piperpan said…
Looks interesting. I wonder what the rationale is for using operator>> rather than operator/. This is XPath, after all.
Sumant said…
In XPath, / is just a step separator; the axis is named separately with specifiers such as "child::", "parent::", and "following-sibling::". If no axis is specified, "child::" is the default in XPath. So you could say "child::book/author/parent::book/child::title" in XPath.

LEESA takes a different approach based on operator overloading because that's what C++ does well. In LEESA, >> means XPath's "child::" and << means XPath's "parent::". Moreover, LEESA supports >>= and <<= for depth-first traversal. For more information, see slide #24 in http://www.dre.vanderbilt.edu/~sutambe/documents/pubs/ppt/LEESA-BoostCon.pdf

However, it is not hard to imagine a C++ library which tries to mimic XPath more closely by using syntax like "child<book>()/author()/parent<book>()/child<title>()".
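As a rough illustration of what such an XPath-flavored library might look like, here is a compile-time-only sketch. None of this comes from LEESA or any existing library. The element tags (catalog, book, author, title), the is_child trait, the child<T>() and parent<T>() step factories, the path template, and the root<T>() starting point are all hypothetical names invented for this example. A bare element such as author() is treated as a child step, mirroring XPath's default axis.

```cpp
#include <type_traits>

namespace xpath_sketch {

// Hypothetical element tags standing in for schema-generated classes.
struct catalog {};
struct book {};
struct author {};
struct title {};

// Hand-written schema metadata: which element may contain which.
template <class P, class C> struct is_child : std::false_type {};
template <> struct is_child<catalog, book> : std::true_type {};
template <> struct is_child<book, author>  : std::true_type {};
template <> struct is_child<book, title>   : std::true_type {};

// Explicit axis steps, playing the role of XPath's child:: and parent::.
template <class T> struct child_step {};
template <class T> struct parent_step {};
template <class T> child_step<T> child() { return {}; }
template <class T> parent_step<T> parent() { return {}; }

// A path whose static type remembers the element type currently selected.
template <class Context> struct path {};

// Assumed entry point: start a path at a given root element type.
template <class T> path<T> root() { return {}; }

// path / child<T>() moves down one level, checked against the schema.
template <class Ctx, class T>
path<T> operator/(path<Ctx>, child_step<T>) {
  static_assert(is_child<Ctx, T>::value, "child step not allowed by the schema");
  return {};
}

// path / parent<T>() moves up one level, checked against the schema.
template <class Ctx, class T>
path<T> operator/(path<Ctx>, parent_step<T>) {
  static_assert(is_child<T, Ctx>::value, "parent step not allowed by the schema");
  return {};
}

// path / T() with a bare element defaults to the child axis, as in XPath.
// The two overloads above are more specialized, so they win for axis steps.
template <class Ctx, class T>
path<T> operator/(path<Ctx>, T) {
  static_assert(is_child<Ctx, T>::value, "child step not allowed by the schema");
  return {};
}

// Usage: the expression from the comment above, starting at a catalog root.
inline bool demo_selects_title() {
  auto p = root<catalog>() / child<book>() / author() / parent<book>() / child<title>();
  return std::is_same<decltype(p), path<title>>::value;
}

}  // namespace xpath_sketch
```

The sketch only tracks types, so it shows the schema checking but performs no traversal. A step such as root<catalog>() / child<title>() would fail to compile because title is not declared as a child of catalog.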
