XQuery in eXist-db

(1Q18)


eXist-db provides strong support for the W3C's family of recommendations for the XQuery language. eXist-db also supports a rich family of extension functions and capabilities, enabling XQuery developers to create powerful applications.

This document is intended for advanced developers and explains eXist-db's implementation of XQuery. If you are new to XQuery or to programming in general, we recommend starting with the resources listed in Learning XQuery with eXist-db or Getting Started with Web Application Development.

Current Status of XQuery Support

eXist-db implements "XQuery 3.1: An XML Query Language" and "XPath and XQuery Functions and Operators 3.1", with the exception of certain unsupported features and functions. For complete documentation on functions in eXist-db, see XQuery Function Documentation.

eXist-db's XQuery implementation has been tested against the official XML Query Test Suite (XQTS). An updated test suite runner to test conformance against the official XQuery/XPath/XSLT 3.* Test Suite (QT3) is being planned.

Supported Optional Features

In addition to the standard features, eXist-db provides extended support for modules.

It implements the full axis feature, providing support for the following optional axes: ancestor::, ancestor-or-self::, following::, following-sibling::, preceding:: and preceding-sibling::. The only optional axis not supported is the namespace axis.

Unsupported features

eXist-db supports strong typing whenever the expected type of an expression, function argument, or function return value is explicitly specified or can be known otherwise. However, the following schema- and data type-related features are not supported:

  • schema import and schema validation features (defined as optional in the XQuery specification). Instead, eXist-db provides a validation module containing extension functions to perform XML validation.

  • When storing XML documents, eXist-db does not store type information along with the nodes. It therefore doesn't know the typed value of a node and has to assume xs:untypedAtomic (as defined by the XQuery specification).

  • eXist-db does not support specifying a data type in an element or attribute test. The node test element(test-node) is supported, but the type portion of the test element(test-node, xs:integer) is ignored.

  • eXist-db supports all datatypes except xs:dateTimeStamp.
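The practical effect of untyped node values can be sketched as follows. This is an illustration only: the constructed <order> fragment is a hypothetical stand-in for a stored document, on the assumption that its element content is untyped in the same way.

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for a stored document. :)
let $order :=
    <order>
        <qty>10</qty>
        <qty>9</qty>
    </order>
return
    <result>
        <untyped>{
            (: untyped values are ordered as strings :)
            for $q in $order/qty
            order by $q
            return string($q)
        }</untyped>
        <cast>{
            (: an explicit cast restores numeric ordering :)
            for $q in $order/qty
            order by xs:integer($q)
            return string($q)
        }</cast>
    </result>
```

Under the XQuery ordering rules, untyped values are compared as strings, so the first ordering places "10" before "9"; the explicit cast in the second ordering places 9 before 10.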

eXist-db supports standard XQuery FLWOR clause constructs except for the following:

  • "tumbling window" and "sliding window" clauses

  • "count" clause

  • "allowing empty" clause
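Where a query only needs the position of each item, the standard positional variable (at) may serve as a substitute for the count clause. A minimal sketch, using made-up sample values:

```xquery
xquery version "3.1";

(: $pos is bound to the position of each item in the binding sequence. :)
for $title at $pos in ("XQuery", "XPath", "XSLT")
return
    <item n="{ $pos }">{ $title }</item>
```

Note that a positional variable numbers the items of the binding sequence before any where or order by clause is applied, so it is not a drop-in replacement for count in every query.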

eXist-db supports all standard XQuery functions except for the following:

  • array:put#3

  • array:sort#1, #2, #3

  • fn:collation-key#1, #2

  • fn:contains-token#2, #3

  • fn:default-language#0

  • fn:document-uri#0

  • fn:element-with-id#1, #2

  • fn:format-integer#2, #3

  • fn:json-to-xml#1, #2

  • fn:nilled#0

  • fn:parse-ietf-date#1

  • fn:path#0, #1

  • fn:random-number-generator#0, #1

  • fn:round#2

  • fn:trace#1

  • fn:transform#1

  • fn:uri-collection#0, #1

  • fn:xml-to-json#1, #2

  • map:find#2

  • map:merge#2
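Some of the missing functions can be approximated in user code. The following is a rough sketch of a stand-in for fn:contains-token#2; the function name local:contains-token is hypothetical, and the sketch uses the default collation rather than reproducing every detail of the specification.

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of eXist-db. :)
declare function local:contains-token(
    $input as xs:string*,
    $token as xs:string
) as xs:boolean {
    let $wanted := normalize-space($token)
    return
        (: an empty token never matches :)
        $wanted ne "" and
        (some $value in $input
         satisfies tokenize(normalize-space($value), " ") = $wanted)
};

local:contains-token("red green blue", "green")
```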

eXist-db supports all standard serialization parameters except the following:

  • item-separator

Full Text Search

eXist-db has an implementation-specific Full Text Search facility, built on the Apache Lucene library (see the full documentation on indexing in eXist-db). eXist-db does not currently support the official W3C "XQuery and XPath Full Text 3.0" specification.
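As a brief sketch of what a query against the Lucene-based facility can look like: the collection path and element name below are hypothetical, and the example assumes that a Lucene index has been configured for the queried element.

```xquery
xquery version "3.1";

import module namespace ft = "http://exist-db.org/xquery/lucene";

(: Hypothetical collection and element names; requires a matching Lucene index. :)
collection("/db/articles")//para[ft:query(., "database")]
```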

XQuery Update

eXist-db has an implementation-specific XQuery Update facility, which predates the development of the official W3C XQuery Update facility. eXist-db does not currently support the official "XQuery Update Facility 3.0" specification. The main difference between the eXist-db implementation and the official specification is that the eXist-db implementation supports in-place updates. Switching to the W3C recommendation would break backwards compatibility.
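A short sketch of the eXist-db update syntax, assuming a stored document at a hypothetical path with hypothetical element names:

```xquery
xquery version "3.1";

(: Hypothetical document path and structure. :)
let $book := doc("/db/library/books.xml")//book[@id = "b1"]
return (
    (: replace the text content of the title element :)
    update value $book/title with "A New Title",
    (: append a new child element to the book :)
    update insert <note>checked</note> into $book
)
```

Because updates are applied in place, the second statement already sees the document as modified by the first.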

Function Library

A complete list of XQuery functions supported by eXist-db can be found in the XQuery Function Documentation.

Each module's documentation is generated from different sources, depending on whether the module is implemented in Java or XQuery. For modules implemented in Java, the documentation is taken directly from the signature provided by the class implementing the Function interface. For modules implemented in XQuery, the function descriptions are taken from XQDoc-formatted comments and annotations.

The Module System

eXist-db supports the creation of web applications in XQuery. This can result in complex XQuery scripts, consisting of several thousand lines of code. Packaging related functions in modules is therefore an important feature. eXist-db allows modules to be imported from a variety of sources:

  • a URI

  • a collection in the database

  • a Java archive (.jar file)

  • a Java class, if the module is implemented in Java itself

For example, a typical import statement in an XQuery will look like this:

import module namespace status="http://exist-db.org/xquery/admin-interface/status" at "http://exist-db.org/modules/test.xqm";

Provided the module namespace does not point to one of the preloaded standard modules (see below), the query engine will try to locate the module source by looking at the URI given after the at keyword. In the example above, the module was specified using a full URI and the query engine will attempt to load the module source from there.

However, the module could also be stored in a database collection:

import module namespace status="http://exist-db.org/xquery/admin-interface/status" at "xmldb:exist:///db/modules/test.xqm";

The query engine recognizes the module is stored in the local database instance and tries to load it from there.

If the XQuery module is part of a Java application, it is also an option to pack the module into a Java archive (.jar file) along with the Java classes. Then use the following import to load it:

import module namespace status="http://exist-db.org/xquery/admin-interface/status" at "resource:org/exist/xquery/lib/test.xqm";

Finally, XQuery modules can also be implemented in Java (see below). You can import them by specifying the fully qualified class name of the Module class:

import module namespace xdiff="http://exist-db.org/xquery/xmldiff" at "java:org.exist.xquery.modules.xmldiff.XmlDiffModule";

The extensions/modules directory in the eXist-db distribution contains a number of useful modules. These can also serve as examples for implementing your own.

Using Relative URIs

If the location specified in an import statement is a relative URI, the query engine tries to load the module relative to the current module load path. The module load path is determined as follows:

  1. If the main XQuery was loaded from a database collection, the module load path is the URI of that collection. For example, if you access an XQuery via the REST server:

    http://localhost:8080/exist/servlet/db/modules/test.xq

    Now all relative module paths will be resolved relative to the /db/modules collection.

  2. If the main XQuery was retrieved from the file system, the module load path points to that directory. This applies to queries executed through the XQueryServlet, XQueryGenerator or the Java admin client.

Preloaded Modules

Preloaded modules do not need to be explicitly imported or declared in the prolog of queries. The <builtin-modules> element in conf.xml lists the namespace URI and the implementing Java class of each module to be preloaded:

<xquery enable-java-binding="no">
    <builtin-modules>
        <module uri="http://exist-db.org/xquery/util"
            class="org.exist.xquery.functions.util.UtilModule"/>
        <module uri="http://exist-db.org/xquery/transform"
            class="org.exist.xquery.functions.transform.TransformModule"/>
    </builtin-modules>
</xquery>
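As a small illustration, a query can call a function from a preloaded module without any import or namespace declaration. The sketch assumes that the util module is listed under <builtin-modules> as shown above and that its default prefix util is in effect.

```xquery
xquery version "3.1";

(: No import needed, assuming the util module is preloaded via conf.xml. :)
util:uuid()
```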

EXPath Packages

The eXist-db package repository makes it easy to manage and deploy external packages (.xar archives) containing Java or XQuery libraries and even full eXist-db applications. See the Package Repository documentation.

XQuery Caching

XQuery modules executed via the REST interface, the XQueryServlet or XQueryGenerator are automatically cached. The compiled expression will be added to an internal pool of prepared queries. The next time a query or module is loaded from the same location, the already compiled code is reused. The code will only be recompiled if eXist-db decides that the source was modified or if it has not been used for an extended period of time.

If a query is accessed by more than one thread concurrently, each new thread will create a new copy of the compiled query. The copies will be added to the query pool until it reaches a pre-defined limit.

Modules are cached along with the main query that imported them.

Calling Java Methods from XQuery

eXist-db supports calls to arbitrary Java methods from within XQuery. The binding mechanism follows the short-cut technique introduced by Saxon.

The class of the external function is identified by the namespace URI of the function call. This namespace URI must start with the prefix java:, followed by the fully qualified class name of the class. For example, the following code snippet calls the static method sqrt (square-root) of class java.lang.Math:

declare namespace math="java:java.lang.Math";

math:sqrt(2)

If the function name contains a hyphen, the letter following the hyphen is converted to upper-case and the hyphen is removed (to conform to the Java CamelCase naming convention). So for example, to-string() will call the Java method toString().
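A minimal sketch of this naming rule, assuming the Java binding has been enabled in the configuration. The prefix system is an arbitrary choice; the call is expected to resolve to the static Java method System.currentTimeMillis().

```xquery
xquery version "3.1";

declare namespace system = "java:java.lang.System";

(: current-time-millis maps to the Java method name currentTimeMillis :)
system:current-time-millis()
```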

If more than one method in the class matches the given name and parameter count, eXist-db tries to select the method that best fits the passed parameter types at runtime.

The result of the method call can be assigned to an XQuery variable. If possible, it will be mapped to the corresponding XML schema type. Otherwise, its type is the eXist-db built-in type object.

Java constructors are called using the function new. Again, a matching constructor is selected by looking at the parameter count and types. The returned value is a new Java object with the built-in type object.

Instance methods are called by supplying a valid Java object as first parameter. The Java object has to be an instance of the given class. For example, the following snippet lists all files and directories in the current directory:

declare namespace file="java:java.io.File";

<files>
{
    for $f in file:list-files( file:new(".") )
    let $n := file:get-name($f)
    order by $n
    return
        if (file:is-directory($f)) then
            <directory name="{ $n }"/>
        else
            <file name="{ $n }" size="{ file:length($f) }"/>
}
</files>

For security reasons, the Java binding is disabled by default. To enable it, the attribute enable-java-binding in the central configuration file has to be set to yes:

<xquery enable-java-binding="yes">

Enabling the Java binding bears some risks: if you allow users to directly pass XQuery code to the database, for instance through the sandbox application, they can use Java methods to inspect your system or execute potentially destructive code on the server.

Creating XQuery Modules

eXist-db supports XQuery library modules, collections of function definitions and global variable declarations. eXist-db supports two types: External Modules, written in XQuery, and Internal Modules, implemented in Java. The standard XPath/XQuery functions and all standard eXist-db extension functions are implemented as internal modules. This section describes how to create XQuery modules using XQuery and Java.

Creating Modules in XQuery

You can declare an XQuery file as a module and import it using the standard import module directive. The XQuery engine imports each module only once during compilation. The compiled module is made available through the static XQuery context.
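As an illustration, a library module might look like the following sketch. The namespace URI, the prefix greet, the file name greet.xqm and the function itself are all hypothetical.

```xquery
xquery version "3.1";

(: Hypothetical library module, stored as greet.xqm. :)
module namespace greet = "http://example.org/xquery/greet";

declare variable $greet:default-name as xs:string := "World";

(: Falls back to the default name when no name is supplied. :)
declare function greet:hello($name as xs:string?) as xs:string {
    "Hello, " || ($name, $greet:default-name)[1] || "!"
};
```

A main query stored next to the module could then import it with a relative location, which is resolved against the module load path described earlier:

```xquery
xquery version "3.1";

import module namespace greet = "http://example.org/xquery/greet" at "greet.xqm";

greet:hello(())
```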

Creating Modules in Java

To register a Java-based XQuery module, eXist-db requires a namespace URI that identifies the module and the list of functions it supplies. You provide both by passing a driver class to the XQuery engine. This class must implement the interface org.exist.xquery.InternalModule.

Besides the basic methods for creating a Java-based XQuery module described here, eXist-db provides a pluggable module interface that allows extension modules to be easily developed in Java. See XQuery Extension Modules for the full documentation on this eXist-db development best practice. Also, see the Package Repository documentation for information on distributing and deploying XQuery and Java libraries for eXist-db.

The class org.exist.xquery.AbstractInternalModule already provides an implementation skeleton. The class constructor expects an array of function definitions for all functions to be registered. A function definition (class FunctionDef) has two properties: the static signature of the function (as an instance of FunctionSignature) and the Java Class that implements the function.

A function is a class extending org.exist.xquery.Function or org.exist.xquery.BasicFunction. Functions without special requirements (like overloading) should subclass BasicFunction. The following simple function definition illustrates this:

public class EchoFunction extends BasicFunction {

    public final static FunctionSignature signature =
        new FunctionSignature(
            new QName("echo", ExampleModule.NAMESPACE_URI, ExampleModule.PREFIX),
            "A useless example function. It just echoes the input parameters.",
            new SequenceType[] {
                new FunctionParameterSequenceType("text", Type.STRING, Cardinality.ZERO_OR_MORE, "The text to echo")
            },
            new FunctionReturnSequenceType(Type.STRING, Cardinality.ZERO_OR_MORE, "the echoed text"));

    public EchoFunction(XQueryContext context) {
        super(context, signature);
    }

    public Sequence eval(Sequence[] args, Sequence contextSequence) throws XPathException {
        // is argument the empty sequence?
        if (args[0].getLength() == 0)
            return Sequence.EMPTY_SEQUENCE;
        // iterate through the argument sequence and echo each item
        ValueSequence result = new ValueSequence();
        for (SequenceIterator i = args[0].iterate(); i.hasNext();) {
            String str = i.nextItem().getStringValue();
            result.add(new StringValue("echo: " + str));
        }
        return result;
    }
}

  • Note that every function class has to provide a function signature. This defines the QName by which the function is identified, a documentation string, the sequence types of all arguments and the sequence type of the returned value.

    In the example above, we accept a single argument named "text" of type xs:string and a cardinality of ZERO_OR_MORE with the description "The text to echo". In other words, we accept any sequence of strings containing zero or more items. The return value is of type xs:string and a cardinality of ZERO_OR_MORE with the description "the echoed text".

    A parameter description should be a normal sentence starting with a capital letter. The return value description is always prefixed with "Returns ", so make sure the text matches.

  • The subclass overrides the eval method, which has two arguments: the first contains the values of all arguments passed to the function, the second passes the current context sequence (which might be null).

    Note that the argument values in the array args have already been checked to match the sequence types defined in the function signature. We therefore do not have to recheck the length of the array: if more or fewer than one argument had been passed to the function, an exception would have been thrown before eval gets called.

    In XQuery, all values are passed as sequences. A sequence consists of zero or more items, and every item is either an atomic value or a node. Furthermore, a single item is also a sequence. The function signature specifies that any sequence containing zero or more strings is acceptable for our method. We therefore have to check whether the empty sequence has been passed; in that case the function returns immediately. Otherwise, we iterate through each item in the sequence, prefix echo: to its string value, and add it to the result sequence.

Now we want to add the function to a new module, so we have to provide a driver class. The driver class defines the namespace URI and default prefix for the module. Functions are registered by passing an array of FunctionDef to the constructor. The following is an example driver class definition:

public class ExampleModule extends AbstractInternalModule {

    public final static String NAMESPACE_URI = "http://exist-db.org/xquery/examples";

    public final static String PREFIX = "example";

    private final static FunctionDef[] functions = {
        new FunctionDef(EchoFunction.signature, EchoFunction.class)
    };

    public ExampleModule() {
        super(functions);
    }

    public String getNamespaceURI() {
        return NAMESPACE_URI;
    }

    public String getDefaultPrefix() {
        return PREFIX;
    }
}

To use this in XQuery:

xquery version "1.0";

import module namespace example="http://exist-db.org/xquery/examples"
    at "java:org.exist.examples.xquery.ExampleModule";

example:echo(("Hello", "World!"))

The query engine recognizes the java: prefix in the location URI and treats the remaining part org.exist.examples.xquery.ExampleModule as a fully qualified class name leading to the driver class of the module.

Collations

Collations are used to compare strings in a locale-sensitive fashion. XQuery lets you specify a collation by means of a collation URI. For example, a collation can be specified in the order by clause of an XQuery FLWOR expression as well as in string-related functions.

eXist-db recognizes the following standard collation URIs:

http://www.w3.org/2005/xpath-functions/collation/codepoint

This URI selects the Unicode Codepoint Collation. This is the default if no collation is specified. eXist-db implements this using standard Java comparison and string search functions.

http://www.w3.org/2013/collation/UCA

This URI selects the standard Unicode Collation Algorithm (UCA), which accepts all standard parameters. eXist implements this using the ICU4J library.

http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive

This URI selects the standard HTML ASCII Case-Insensitive Collation.

In addition, eXist-db supports an implementation-specific URI:

http://exist-db.org/collation?lang=xxx&strength=xxx&decomposition=xxx

The lang parameter selects a locale, and should have the same form as in xml:lang. For example, "de" or "de-DE" to select the German locale.

The optional strength parameter must be one of primary, secondary, tertiary or identical.

The optional decomposition parameter must be one of none, full or standard.

Instead of the full URI, an abbreviated URI can be supplied: ?lang=xxx&strength=xxx&decomposition=xxx
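When a collation URI with more than one parameter is written as a string literal inside a query, the ampersand has to be escaped as &amp;amp;, as in any XQuery string literal. A minimal sketch, with locale and strength values chosen purely for illustration:

```xquery
xquery version "3.1";

(: The "&" separating the parameters is written as "&amp;" in the literal. :)
let $collation := "http://exist-db.org/collation?lang=de-DE&amp;strength=primary"
return
    compare("Bäuerin", "BAUERIN", $collation)
```

At primary strength, differences in case and accents are typically ignored, so a comparison like this one would be expected to treat the two strings as equal.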

The following example selects the German locale for sorting:

for $w in ("das", "daß", "Buch", "Bücher", "Bauer", "Bäuerin", "Jagen", "Jäger")
order by $w collation "?lang=de-DE"
return $w

It returns the following:

Bauer, Bäuerin, Buch, Bücher, das, daß, Jagen, Jäger

You can change the default collation in the XQuery prolog:

declare default collation "?lang=de-DE";

"Bäuerin" < "Bier"

This now returns true (the default collation would have returned false).

Finally, you can also specify a collator by its Java class name. The class should be a subclass of java.text.RuleBasedCollator. For example:

for $w in ("das", "daß", "Buch", "Bücher", "Bauer", "Bäuerin", "Jagen", "Jäger")
order by $w collation "java:f.c.d.n.MyRuleBasedGermanCollator"
return $w

The .jar file containing the collator's .class file(s) needs to be in ${EXIST_HOME}/lib/user.

Serialization

Serialization is the transformation of query results into a binary stream. eXist-db implements the "XSLT and XQuery Serialization 3.1" specification. For example, the following serialization declaration instructs eXist-db to serialize query results as JSON and to supply a JSON media-type:

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

declare option output:method "json";
declare option output:media-type "application/json";

As this example shows, serialization parameters can be set within the query itself. However, the interpretation of serialization parameters depends on the context in which the query is called. For instance, the JSON media-type declaration in the example above is only relevant in the context of an HTTP response header. Most output parameters are applicable only if the query is executed using the XQueryGenerator/XQueryServlet servlet or the REST server.

Before serialization parameters were standardized in XQuery 3.0, eXist-db developed its own set of parameters, as described below. These legacy parameters remain supported for backwards compatibility. For example, the eXist-db legacy approach to the JSON serialization declaration above is:

declare option exist:serialize "method=json media-type=application/json";

Most of the legacy serialization parameters have an equivalent in the standard serialization parameters, but others are truly specific to eXist-db, and are thus accessible only using the legacy serialization approach.

eXist-db legacy serialization parameters

Legacy serialization parameters can be set with a declare option statement in the query prolog. For instance:

declare option exist:serialize "method=xhtml media-type=application/xhtml+xml";

The option's QName must be exist:serialize. The exist prefix is bound to the namespace http://exist.sourceforge.net/NS/exist. This is declared by default and need not be specified explicitly.

Options are specified within the option's string literal, separated by whitespace.

General options:

method= xml | xhtml | json | text

Determines the serialization method. Should be one of xml, xhtml, json or text.

The xhtml method makes sure that XHTML elements with an empty content model are serialized in minimized form. For instance <img> will be output as <img/>.

Unlike the xml method, the xhtml setting uses only the short form for the elements declared empty in the XHTML DTD. For example, the <br> tag is always returned as <br/>.

The text method returns the contents of elements only. For instance, <A>Content</A> is returned as Content. Attribute values, processing instructions and comments are ignored.

jsonp= myFunctionName

Only relevant when the serialization method is set to json. Causes the JSON output to be wrapped in the named JSONP function.

media-type

The MIME content type of the output. It will be used to set the HTTP Content-Type header if the query is running in an HTTP context.

encoding

Specifies the character encoding to be used for outputting the instance of the data model.

doctype-public and doctype-system

A doctype declaration will be output if doctype-public and/or doctype-system are set. The corresponding identifier is taken from the value of the parameter.

indent= yes | no

Indent the document to make it easier to read. Indenting adds whitespace characters to element nodes, restricted by the rules given in the XQuery serialization specification.

omit-xml-declaration= yes | no

Outputs an XML declaration if the parameter is set to no.

expand-xincludes= yes | no

Determines whether the serializer expands XInclude elements.

highlight-matches= both | elements | attributes | none

When querying text with the full text or n-gram extensions, the query engine tracks the exact position of all matches inside text content. The serializer can later use this information to mark those matches by wrapping them into an <exist:match> element.

Setting highlight-matches=both will enable this feature for every kind of indexable node.

process-xsl-pi= yes | no

If a document is serialized and it has an XSL processing instruction, eXist-db can try to load the referenced stylesheet and apply it to the document.

add-exist-id= element | all | none

Shows the internal node ids of an element by adding an attribute exist:id="internal-node-id". Setting add-exist-id to element will only show the node id of the top-level element, all will show the ids of all elements.

json-ignore-whitespace-text-nodes= yes | no

Discards whitespace-only text nodes when serializing XML/XDM to JSON.

For example, to disable XInclude expansion and indent the output, use the following syntax:

declare option exist:serialize "expand-xincludes=no indent=yes";
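Several options can be combined in the same way for JSON output. The following sketch wraps the serialized result in a JSONP callback; the callback name handleResult, the media type and the sample element are assumptions made for illustration.

```xquery
xquery version "3.1";

(: Hypothetical callback name and sample data. :)
declare option exist:serialize "method=json jsonp=handleResult media-type=application/javascript";

<result>
    <status>ok</status>
</result>
```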

Options

eXist-db provides a number of useful options for controlling the query environment.

Query Blocking

To prevent the server from being blocked by a badly formulated query, eXist-db watches all query threads. A blocking query can be killed if it takes longer than a specified amount of time or consumes too many memory resources on the server. There are two options to control this behaviour:

declare option exist:timeout "time-in-ms";

Specifies the maximum amount of query processing time (in milliseconds) before it is cancelled by the XQuery engine.

declare option exist:output-size-limit "size-hint";

Defines a limit for the maximum size of a document fragment created within an XQuery. The limit is just an estimation, specified in terms of the accumulated number of nodes contained in all generated fragments. This can be used to prevent users from consuming too much memory if they are allowed to pass in their own XQueries.
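A minimal sketch of both options in a query prolog. The concrete limits (five seconds, 10000 nodes) are arbitrary values chosen for illustration.

```xquery
xquery version "3.1";

(: Cancel the query after 5000 ms. :)
declare option exist:timeout "5000";
(: Limit generated fragments to roughly 10000 nodes. :)
declare option exist:output-size-limit "10000";

for $i in 1 to 100
return
    <item n="{ $i }"/>
```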

Overriding the default XQuery Context

eXist-db provides the following options to override the default XQuery context:

declare option exist:implicit-timezone "duration";

Specifies the implicit timezone for the XQuery context.

declare option exist:current-dateTime "dateTime";

Specifies the current dateTime for the XQuery context.

declare option exist:optimize "enable=yes|no";

Enables or disables the query rewriting optimizer for the current query. Setting enable=no temporarily disables it, which is useful for testing and debugging.
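The context overrides might be used as in the following sketch, for example to make date-dependent queries reproducible in a test. It assumes that the values are written in the lexical forms of xs:dayTimeDuration and xs:dateTime; the values themselves are arbitrary.

```xquery
xquery version "3.1";

(: Assumed lexical forms: xs:dayTimeDuration and xs:dateTime. :)
declare option exist:implicit-timezone "PT2H";
declare option exist:current-dateTime "2018-01-01T12:00:00Z";

(implicit-timezone(), current-dateTime())
```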

Pragmas

XQuery pragmas are a way to pass implementation-specific information to the query engine. Pragmas can be wrapped around an arbitrary XQuery expression.

eXist-db recognizes the following pragmas:

exist:timer

Provides a simple way to measure the time executing a given expression. For example:

(# exist:timer #) { //some/path/expression }

This creates a timer for the expression enclosed in curly braces and prints timing information to the trace logger. Tracing needs to be enabled in log4j.xml:

<root>
    <priority value="trace"/>
    <appender-ref ref="console"/>
</root>

exist:batch-transaction

Currently for XQuery Update Extensions only. Provides a method for batching updates on the database into a single transaction, allowing a set of updates to be atomically guaranteed.

For each affected document or collection, any configured Triggers will only be called once: the prepare() method is fired before the first update to the configured resource, and the finish() method is fired after the last update to the configured resource.

(# exist:batch-transaction #) {
  update value //some/path/expressionA with "valueA",
  update value //some/path/expressionB with "valueB"
}

Uses a single Transaction and Trigger events for the expressions enclosed in curly braces.

exist:force-index-use

For debugging purposes. Apply it to an expression that uses indexes: comparisons, fn:matches(), etc. It will raise an error if, for any reason, the index cannot be used. This helps to check whether indexes are correctly defined.

(# exist:force-index-use #) {
  //group[. = "dba"]
}

Raises an error if the general comparison doesn't use a range or a QName index (XPDYxxxx).

exist:no-index

Prevents the query engine from using any index. Useful if the searched value isn't very selective or if it is cheaper to traverse the previous step of a path expression than to query the index.

(# exist:no-index #) {
  //group[. = "dba"]
}

exist:optimize

This pragma is inserted automatically by the query rewriter to optimize an expression that implements the org.exist.xquery.Optimizable interface.

//((#exist:optimize#) { item[stock = 10] })

Community Specifications

In addition to providing robust support for the XPath and XQuery family of specifications, eXist-db has also implemented a number of EXPath and EXQuery community specifications and modules that extend XPath and XQuery with cross-platform functions and capabilities. (Only those built into eXist are listed below; see eXist-db's Package Repository for others.)

HTTP Client

eXist-db implements the HTTP Client Module, an EXPath specification for interacting with HTTP servers. This is in addition to eXist-db's native function libraries for performing HTTP requests.
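A minimal sketch of a GET request with the EXPath HTTP Client Module. The target URL is a placeholder, and the example assumes that the module can be imported by its namespace without a location hint.

```xquery
xquery version "3.1";

import module namespace http = "http://expath.org/ns/http-client";

(: Placeholder URL. :)
let $request := <http:request method="get" href="https://example.org/"/>
let $response := http:send-request($request)
return
    (: the first item describes the response; any body follows as further items :)
    string($response[1]/@status)
```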

Packaging System

eXist-db implements the Packaging System, an EXPath community specification for library and application packages. See the Package Repository documentation.

RESTXQ

eXist-db implements RESTXQ, an EXQuery community effort to develop a clean approach for the deployment of RESTful services based on XQuery code annotations. This is in addition to eXist-db's native function libraries for performing HTTP requests and responses.
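The annotation-based approach can be sketched as follows. The module namespace, path template and function name are hypothetical, and the module is assumed to be stored where the RESTXQ registry can find it.

```xquery
xquery version "3.1";

(: Hypothetical RESTXQ resource module. :)
module namespace demo = "http://example.org/restxq/demo";

declare namespace rest = "http://exquery.org/ns/restxq";

(: Answers GET requests such as /hello/World with a plain-text greeting. :)
declare
    %rest:GET
    %rest:path("/hello/{$name}")
    %rest:produces("text/plain")
function demo:hello($name) {
    "Hello, " || $name || "!"
};
```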

ZIP

eXist-db implements the ZIP Module, an EXPath community specification and module for performing compression. This is in addition to eXist-db's native function libraries for performing compression.

Legacy Features Maintained for Backward Compatibility

The following legacy features are maintained for backwards compatibility reasons:

  • Collations: Some functions in eXist-db still accept a collation argument that is no longer part of the specification.

  • Map notation: During the drafting of XQuery 3.0, the specification changed slightly and so did the implementation in eXist-db. To keep backwards compatibility, eXist allows the older notation in map constructors: key:=value in addition to the new one, key: value.

  • Serialization: Certain Legacy Serialization Parameters.
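The two map notations can be compared in a short sketch. The keys and values are sample data; the := form is specific to eXist-db and should not be expected to work in other XQuery processors.

```xquery
xquery version "3.1";

(: Current notation, as standardized in XQuery 3.1. :)
let $current := map { "title": "XQuery", "year": 2018 }
(: Legacy notation, accepted by eXist-db for backwards compatibility. :)
let $legacy := map { "title" := "XQuery", "year" := 2018 }
return
    ($current("title"), $legacy("year"))
```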