eXist-db 2.1 Released

We're proud to announce release 2.1 of eXist-db. Most notably, this version contains some important enhancements and bug fixes in critical areas like indexing, storage backend and query processing.

There are numerous improvements to the XQuery engine and query optimizer. XQuery error detection and reporting was refined. In combination with eXide, finding and fixing errors in XQuery code has become easier than ever. Note that 2.1 may find errors which went unnoticed in 2.0.

On the feature side, changes are mostly in bundled apps. eXide, the XQuery IDE, has been updated to 2.0. Next to a visual redesign it provides fullscreen mode, XQuery refactorings, quick fixes, code snippets, hints and much more.  See the webcast for more information.

A detailed change log can be found on the download page. eXist-db 2.1 is binary compatible with 2.0. Data files created by 2.0 can be read into 2.1.

Adding a Map Datatype to XQuery

Introduction

The "standard" way of passing a complex data structure in XQuery is to create an XML fragment and later query it using XPath. This approach works well most of the time, but sometimes you just can't use it:

  • wrapping stored data into an XML fragment will create a new copy of the data in memory (even though eXist-db will defer this until the data is actually used). Wrapping a large query result into an XML structure may thus use a considerable amount of memory.
  • the reference to the original document gets lost.
  • the power of higher-order functions in XQuery 3.0 makes me wish I could create data structures containing function items as values.

Maps provide a solution to the problems above. Michael Kay has posted a well thought out proposal for maps, which I decided to implement a few weeks ago.

Let's have a quick look at the map datatype as proposed by Michael and implemented in the current trunk of eXist-db. Note that this is not part of the XQuery 3.0 specification - though it is considered for later inclusion - and may be subject to change.

Creating a Map

You create a new map through either the literal syntax or the functions map:new and map:entry. Here's the literal syntax:

let $daysOfWeek := map { "Sunday" := 1, "Monday" := 2, "Tuesday" := 3, "Wednesday" := 4, "Thursday" := 5, "Friday" := 6, "Saturday" := 7 }

The keys are arbitrary atomic values while any sequence can be used as value. You are thus not limited to string keys: dates, numbers or QNames will work as well. Keys are compared for equality using the eq operator under the map's collation.

map:entry creates a map with a single key/value pair. Use this to create map items programmatically in combination with map:new (see map:new below):

map:entry("Sunday", 1)

map:new creates either an empty map or a new map from a sequence of maps. It accepts an optional collation string as second parameter:

let $daysOfWeek := ( "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" ) let $map := map:new( for $day at $pos in days return map:entry($day, $pos), "?strength=primary" )

As you can see, the only way to create a map from a sequence programmatically is to merge single-item maps into the new map. The map implementation in eXist-db makes sure this is not too expensive (by using a lightweight wrapper for single key/value pairs).

In this example, the collation string "?strength=primary" causes keys to be compared in a case-insensitive way.

Look Up

To look up a key, use map:get:

map:get($map, "Tuesday")

But wait, there's a real cool shortcut to do a look up: a map is also a function item, which means you can directly call it as a function, passing the key to retrieve as single parameter:

$map("Tuesday")

Calling the map as a function item otherwise just behaves like map:get.

Because the empty sequence is allowed as a value, map:get does not tell you for sure if a key exists in a map or not. You can use map:contains to see if a key is present in the map:

map:contains($map, "Tuesday")

map:keys retrieves all keys in the map as a sequence:

map:keys($map)

Please note that the order in which keys are returned is implementation-defined, so don't rely on it. In fact, eXist-db uses two different map implementations for better performance, depending on collation settings and key types.

Here's a complete example which combines the functions to access a map:

xquery version "1.0"; let $workDays := map { "Monday" := 2, "Tuesday" := 3, "Wednesday" := 4, "Thursday" := 5, "Friday" := 6 } let $daysOfWeek := map:new(($workDays, map { "Sunday" := 1, "Saturday" := 7 })) for $day in map:keys($daysOfWeek) order by map:get($daysOfWeek, $day) return <day n="{$daysOfWeek($day)}" atWork="{map:contains($workDays, $day)}">{$day}</day>
Edit

Maps are Immutable

To remove a key/value pair, call

let $newMap := map:remove("Sunday")

At this point we definitely need to talk about an important feature: maps are immutable! Adding or removing a key/value pair will result in a new map. To illustrate this with an example:

let $daysOfWeek := map { "Sunday" := 1, "Monday" := 2, "Tuesday" := 3, "Wednesday" := 4, "Thursday" := 5, "Friday" := 6, "Saturday" := 7 } let $workDays := map:remove($daysOfWeek, "Sunday") return ( map:contains($daysOfWeek, "Sunday") (: Still there :), map:contains($workDays, "Sunday") (: Nope :) )
Edit

Internally, eXist-db uses an efficient implementation of persistent immutable maps and hash tables taken from clojure , another lisp-like, functional language for the Java VM.

Use Cases

So far I found maps to be useful in a number of scenarios:

  1. in my HTML templating framework for passing around application data between templates. In this case the sequences stored in the map can potentially be very large, e.g. if they include the result of queries into the database. Wrapping the data into an in-memory fragment would thus be a bad idea.
  2. to pass optional configuration parameters into a library module.
  3. to introduce additional levels of abstraction when working with heterogeneous data sets.

Function Items as Values

To understand the last scenario, we have to take a closer look at an important feature of maps: one can use function items as map values! For example, a library module may allow the calling module to register an optional function for resolving a resource, which only the calling module can know how to find:

let $configuration := map { "resolve": function($relPath as xs:string) { (: resolve resource :) } }

You can even use maps and function items to simulate "objects". For example, one of my library modules has to display a short summary of documents using two different schemas: docbook and TEI. It thus needs to extract common metadata like title or author from the documents. Using maps, I could create a wrapper around the documents, which provides functions to access the data in object-oriented style:

xquery version "3.0"; declare namespace tei="http://www.tei-c.org/ns/1.0"; declare namespace db="http://docbook.org/ns/docbook"; declare function local:tei($root as element()) as map(xs:string, function(*)) { map { "title" := function() as xs:string { $root//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title/string() } } }; declare function local:docbook($root as element()) as map(xs:string, function(*)) { map { "title" := function() as xs:string { $root//db:info/db:title/string() } } }; declare function local:wrap($root as element()) as map(xs:string, function(*))? { typeswitch ($root) case element(tei:TEI) return local:tei($root) case element(db:article) return local:docbook($root) default return () }; <ul> { for $doc in (doc("/db/db-test.xml")/*, doc("/db/tei-test.xml")/*) let $wrapped := local:wrap($doc) return <li>{$wrapped("title")()}</li> } </ul>

This approach has its limitations. There's no guarantee that the maps returned by local:wrap do indeed have a "title" function. XQuery is not - and was not designed to be - an object-oriented language. However, I can see that the technique could improve reusability of code libraries.

Availability

Maps as a data type are currently available in eXist-db trunk and will likely go into the final 2.0 release (only minor additions to the query engine were required). If you would like to test them right now, feel free to check out trunk.

Higher-Order Functions in XQuery 3.0

Introduction

Higher-order functions are probably the most notable addition to the XQuery language in version 3.0 of the specification. While it may take some time to understand their full impact, higher-order functions certainly open a wide range of new possibilities, and are a key feature in all functional languages.

As of April 2012, eXist-db completely supports higher-order functions, including features like inline functions, closures and partial function application. This article will quickly walk through each feature before we put them all together in a practical example.

Function References

A higher-order function is a function which takes another function as parameter or returns a function. So the first thing you'll need in order to pass a function around is a way to obtain a reference to a function.

In older versions of eXist-db we had an extension function for this, called util:function, which expected a name as first argument, and the arity of the function as second. The arity corresponds to the number of parameters the target function takes. Name and arity are required to uniquely identify a function within a module.

XQuery 3.0 now provides a literal syntax for referencing a function statically. It also consists of the name and the arity of the function to look up, separated by a hash sign:

let $f := my:func#2

This assumes you already know the function when writing the query. However, there's also a dynamic variant, a function called "function-lookup", which is part of the standard library:

let $f := function-lookup("my:func", 2)

Dynamic Function Calls

Now we have a reference, we can actually call the function. In XQuery 3.0, a dynamic function call is a primary expression followed by an argument in parenthesis:

let $f := upper-case#1 return $f("Hello")

In this case the function reference is obtained from variable $f, but we could actually use any primary expression here, even another function call. Now we're able to write our first higher-order function, i.e. one which takes another function as parameter:

xquery version "3.0"; declare namespace ex="http://exist-db.org/xquery/ex"; declare function ex:map($func, $list) { for $item in $list return $func($item) }; (: Create an inline function and assign it to $f :) let $f := upper-case#1 return ex:map($f, ("Hello", "world!"))
Edit

Our function, ex:map, simply applies the passed function to every item in the list it received. This is actually a very common operation, so the XQuery 3.0 standard library does already provide a function, fn:map, and we don't need to write our own.

Inline Functions and Closures

Instead of explicitly declaring a function in the module, we may also use an inline function (or anonymous function), which we can directly pass around. To put it simple: an inline function is a function without a name:

map(function($x) { $x * $x }, 1 to 5)
Edit

Inline functions have an important feature: within the function body, you can not only access the function parameters, but also any variable which was in scope at the time the inline function was created. This is commonly called a closure in functional languages. The closure allows the function body to access variables outside its immediate lexical scope. We'll see concrete examples in the larger example following below.

Inline functions also inherit the static context from the module in which they were created.

Partial Function Applications

Partial function applications are a powerful feature in functional programming languages. They allow us to fill in parts of a function call, leaving some parameters open. This is useful if information has to be passed across a hierarchy of function calls. The calling function can fill in some parameters and leaves it to the called function to provide the rest.

A partial function application looks like a normal (dynamic or static) function call using a question mark (?) as placeholder for some parameters:

declare function local:add($lhs as xs:integer, $rhs as xs:integer) as xs:integer { $lhs + $rhs }; let $f := local:add(10, ?) return $f(32)
Edit

The partial function application actually creates a new function with the same name, but removing all fixed parameters from the parameter list.

Putting it all Together

For a practical exercise we choose a frequent task when working with eXist: scan a collection hierarchy recursively and apply some operation to every collection or resource encountered. This can be written as a one-time solution using a simple recursive function. However, we would rather like to design a set of functions which can be reused for arbitrary tasks on collections and resources.

To start with, we create a function to recursively scan the collection hierarchy, calling a provided function for each collection encountered:

xquery version "3.0"; module namespace dbutil="http://exist-db.org/xquery/dbutil"; (:~ : Scan a collection tree recursively starting at $root. Call : $func once for each collection found :) declare function dbutil:scan-collections($root as xs:anyURI, $func as function(xs:anyURI) as item()*) { $func($root), for $child in xmldb:get-child-collections($root) return dbutil:scan-collections(xs:anyURI($root || "/" || $child), $func) };

The function takes two parameters:

  • the path to the root collection
  • a function which accepts one parameter: the path to the current collection

The callback function is called once for every collection we encounter while scanning the collection tree. We then walk each child collection, calling dbutil:scan-collections recursively. The || operator is also new in XQuery 3.0 and concatenates strings. It is a shorthand for the fn:concat function.

Next, let's add another function for scanning all the resource paths stored in a given collection and pass them to another callback function. This is straightforward:

(:~ : List all resources contained in a collection and call the : supplied function once for each resource with the complete : path to the resource as parameter. :) declare function dbutil:scan-resources($collection as xs:anyURI, $func as function(xs:anyURI) as item()*) { for $child in xmldb:get-child-resources($collection) return $func(xs:anyURI($collection || "/" || $child)) };

We're now ready to combine the two operations: create a function which recursively walks the collection tree and calls a provided function for each collection and resource encountered.

(:~ : Scan a collection tree recursively starting at $root. Call : the supplied function once for each resource encountered. : The first parameter to $func is the collection URI, the : second the resource path (including the collection part). :) declare function dbutil:scan($root as xs:anyURI, $func as function(xs:anyURI, xs:anyURI?) as item()*) { dbutil:scan-collections($root, function($collection as xs:anyURI) { $func($collection, ()), (: scan-resources expects a function with one parameter, so we use a partial application to fill in the collection parameter :) dbutil:scan-resources($collection, $func($collection, ?)) }) };

The callback function for the collection/resource scan takes two parameters:

  • the path to the collection
  • the full resource path if the current item is a resource, or the empty sequence if it is a collection

We start by calling dbutil:scan-collections with an inline function. This gets the path to the current collection and first calls the user provided callback with the collection path and an empty sequence for the resource parameter. Note that the variable $func, which we received as parameter to the outer call, is visible within the inline function due to the closure!

Now to the interesting code: for each collection, we call dbutil:scan-resources. Remember that it expects a callback function with just one parameter (the resource path), but the user-supplied $func takes two parameters. We could now work around this by changing dbutil:scan-resources, but there's an easier way: we pass a partial function application, providing $collection as a fixed parameter while leaving the resource parameter open!

We now have all our library functions in place and are ready to use them. For a first test, here's a function which scans the collection hierarchy for resources having a given mime type:

declare function dbutil:find-by-mimetype($collection as xs:anyURI, $mimeType as xs:string) { dbutil:scan($collection, function($collection, $resource) { if (exists($resource) and xmldb:get-mime-type($resource) = $mimeType) then $resource else () }) };

If our inline function receives a $resource parameter, we check its mime type and return the resource path if it matches. In all other cases, the empty sequence is returned. We can now test if our library works as expected:

import module namespace dbutil="http://exist-db.org/xquery/dbutil" at "xmldb:exist:///db/codebin/dbutils.xql"; dbutil:find-by-mimetype(xs:anyURI("/db/demo"), "application/xquery")
Edit

You can view the code for the complete module.

Another operation you will probably need when migrating your database from eXist-db version 1.4.x to the forthcoming 2.0: traverse the collection tree and fix permissions on all collections and XQuery resources. Contrary to previous versions, eXist-db 2.0 closely follows the Unix security model. To access a collection, it is no longer sufficient to have read permissions on it: you also need the execute permission in order to see the collection contents. Likewise, stored XQuery modules need the execute flag to be set in order to be executed through eXist-db.

Here's a piece of code to globally reset permissions:

xquery version "3.0"; import module namespace dbutil="http://exist-db.org/xquery/dbutil" at "xmldb:exist:///db/codebin/dbutils.xql"; (: You need to run this as an admin user, so it won't work on the demo server :) let $perms := "g+x,u+x,o+x" return dbutil:scan(xs:anyURI("/db"), function($collection, $resource) { if ($resource and xmldb:get-mime-type($resource) = "application/xquery") then sm:chmod($resource, $perms) else sm:chmod($collection, $perms) })
Edit

Availability and Backwards Compatibility

Complete support for XQuery 3.0 higher-order functions is available in eXist-db trunk (2.1dev) starting with revision 16248 (April 14, 2012). It will also be part of the final 2.0 release. We're currently working hard to complete the last bits and pieces for the release.

eXist-db provided ways to dynamically call functions since several years, based on the two extension functions util:call and util:function. They are used in several modules (e.g. the KWIC module).

We made sure the new XQuery 3.0 higher-order function implementation is backwards compatible (though a lot more powerful), so old code will continue to work and migration is smooth.

1.4.2 Maintenance Release

While we're working hard on polishing version 2.0 for the final release, we should not forget that there have been quite a few important bug fixes in the 1.4.x branch as well. We are thus happy to announce another production quality release in the stable branch: 1.4.2. Version 1.4.2 is the last release in the 1.4.x branch before 2.0. After 2.0 has been released in May, it will become the new stable branch.

Stable releases only contain hand-selected bug fixes which were ported back from the development trunk. There are no new features. Notable changes include:

  • Deadlock fixes
  • Lots of updates to the rewritten WebDAV interface
  • Performance improvements when using the Lucene index
  • Backup to file system produced bad directory structure
  • Support circular module imports
  • XQuery processing fixes

eXist-db version 1.4.2 is now available on sourceforge.

One of the outstanding new features in the development version of eXist is the ability to index and query the content of binary resources, which are stored inside the database. This is made possible through a new XQuery module for content extraction and some basic, but powerful additions to the Lucene-based full text indexing. Our goal was to provide as much flexibility to the user as possible. The new functions are thus rather generic and can be used in a wide range of scenarios.

Read article ...