Introduction
Higher-order functions are probably the most notable addition to the XQuery language in version 3.0 of the specification. While it may take some time to understand their full impact, higher-order functions certainly open a wide range of new possibilities, and are a key feature in all functional languages.
As of April 2012, eXist-db completely supports higher-order functions, including features like inline functions, closures and partial function application. This article will quickly walk through each feature before we put them all together in a practical example.
Function References
A higher-order function is a function which takes another function as parameter or returns a function. So the first thing you'll need in order to pass a function around is a way to obtain a reference to a function.
In older versions of eXist-db we had an extension function for this, called util:function, which expected a name as first argument, and the arity of the function as second. The arity corresponds to the number of parameters the target function takes. Name and arity are required to uniquely identify a function within a module.
XQuery 3.0 now provides a literal syntax for referencing a function statically. It also consists of the name and the arity of the function to look up, separated by a hash sign:
This assumes you already know the function when writing the query. However, there's also a dynamic variant, a function called "function-lookup", which is part of the standard library:
Dynamic Function Calls
Now we have a reference, we can actually call the function. In XQuery 3.0, a dynamic function call is a primary expression followed by an argument in parenthesis:
let $f := upper-case#1 return $f("Hello")
In this case the function reference is obtained from variable $f, but we could actually use any primary expression here, even another function call. Now we're able to write our first higher-order function, i.e. one which takes another function as parameter:
Our function, ex:map, simply applies the passed function to every item in the list it received. This is actually a very common operation, so the XQuery 3.0 standard library does already provide a function, fn:map, and we don't need to write our own.
Inline Functions and Closures
Instead of explicitly declaring a function in the module, we may also use an inline function (or anonymous function), which we can directly pass around. To put it simple: an inline function is a function without a name:
Inline functions have an important feature: within the function body, you can not only access the function parameters, but also any variable which was in scope at the time the inline function was created. This is commonly called a closure in functional languages. The closure allows the function body to access variables outside its immediate lexical scope. We'll see concrete examples in the larger example following below.
Inline functions also inherit the static context from the module in which they were created.
Partial Function Applications
Partial function applications are a powerful feature in functional programming languages. They allow us to fill in parts of a function call, leaving some parameters open. This is useful if information has to be passed across a hierarchy of function calls. The calling function can fill in some parameters and leaves it to the called function to provide the rest.
A partial function application looks like a normal (dynamic or static) function call using a question mark (?) as placeholder for some parameters:
The partial function application actually creates a new function with the same name, but removing all fixed parameters from the parameter list.
Putting it all Together
For a practical exercise we choose a frequent task when working with eXist: scan a collection hierarchy recursively and apply some operation to every collection or resource encountered. This can be written as a one-time solution using a simple recursive function. However, we would rather like to design a set of functions which can be reused for arbitrary tasks on collections and resources.
To start with, we create a function to recursively scan the collection hierarchy, calling a provided function for each collection encountered:
The function takes two parameters:
- the path to the root collection
- a function which accepts one parameter: the path to the current collection
The callback function is called once for every collection we encounter while scanning the collection tree. We then walk each child collection, calling dbutil:scan-collections recursively. The || operator is also new in XQuery 3.0 and concatenates strings. It is a shorthand for the fn:concat function.
Next, let's add another function for scanning all the resource paths stored in a given collection and pass them to another callback function. This is straightforward:
We're now ready to combine the two operations: create a function which recursively walks the collection tree and calls a provided function for each collection and resource encountered.
The callback function for the collection/resource scan takes two parameters:
- the path to the collection
- the full resource path if the current item is a resource, or the empty sequence if it is a collection
We start by calling dbutil:scan-collections with an inline function. This gets the path to the current collection and first calls the user provided callback with the collection path and an empty sequence for the resource parameter. Note that the variable $func, which we received as parameter to the outer call, is visible within the inline function due to the closure!
Now to the interesting code: for each collection, we call dbutil:scan-resources. Remember that it expects a callback function with just one parameter (the resource path), but the user-supplied $func takes two parameters. We could now work around this by changing dbutil:scan-resources, but there's an easier way: we pass a partial function application, providing $collection as a fixed parameter while leaving the resource parameter open!
We now have all our library functions in place and are ready to use them. For a first test, here's a function which scans the collection hierarchy for resources having a given mime type:
If our inline function receives a $resource parameter, we check its mime type and return the resource path if it matches. In all other cases, the empty sequence is returned. We can now test if our library works as expected:
You can view the code for the complete module.
Another operation you will probably need when migrating your database from eXist-db version 1.4.x to the forthcoming 2.0: traverse the collection tree and fix permissions on all collections and XQuery resources. Contrary to previous versions, eXist-db 2.0 closely follows the Unix security model. To access a collection, it is no longer sufficient to have read permissions on it: you also need the execute permission in order to see the collection contents. Likewise, stored XQuery modules need the execute flag to be set in order to be executed through eXist-db.
Here's a piece of code to globally reset permissions:
Availability and Backwards Compatibility
Complete support for XQuery 3.0 higher-order functions is available in eXist-db trunk (2.1dev) starting with revision 16248 (April 14, 2012). It will also be part of the final 2.0 release. We're currently working hard to complete the last bits and pieces for the release.
eXist-db provided ways to dynamically call functions since several years, based on the two extension functions util:call and util:function. They are used in several modules (e.g. the KWIC module).
We made sure the new XQuery 3.0 higher-order function implementation is backwards compatible (though a lot more powerful), so old code will continue to work and migration is smooth.