The xmldb module

(2Q19)


The xmldb module (function namespace http://exist-db.org/xquery/xmldb) contains functions for manipulating database contents. The full list of functions and their documentation can be found in the Function Documentation Library. This article covers the highlights and main uses of this module.

Manipulating Database Contents

The xmldb functions can be used to create new database collections or documents.

To illustrate this, suppose we have a large file containing several RDF metadata records. Our application expects each record to have its own document, so we do not want to store all the records in a single file. Instead, we have to divide the document into smaller units. The following XQuery does this:

xquery version "3.0";

declare namespace rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";

import module namespace xmldb="http://exist-db.org/xquery/xmldb";

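(: Log in as admin; this example uses an empty password :)
let $log-in := xmldb:login("/db", "admin", "")
(: Create the target collection /db/output :)
let $create-collection := xmldb:create-collection("/db", "output")
(: Process each record below the top-level rdf:RDF element :)
for $record in doc('/db/records.rdf')/rdf:RDF/*
(: Wrap the record in its own rdf:RDF root element :)
let $split-record :=
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        {$record}
    </rdf:RDF>
(: Derive a unique document name from the MD5 hash of rdf:about :)
let $about := $record/@rdf:about
let $filename := util:hash($about/string(), "md5") || ".xml"
return
    xmldb:store("/db/output", $filename, $split-record)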

Let's look at this example in some detail:

  • First, since we are using functions xmldb:create-collection() and xmldb:store(), which require the user to be logged in as a member of the dba group, we must log in using xmldb:login().

  • Once logged in, we can create a new sub-collection output using xmldb:create-collection().

  • The for-loop iterates over all child elements of the top RDF element.

  • In each iteration, we wrap the current child node in a new rdf:RDF element and use xmldb:store() to write the result to a new document.

  • Since a unique document name is required, we need a way to generate unique names. The URI contained in the rdf:about attribute is unique, so we compute an MD5 hash from it, append .xml, and use this as the document's name.
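
The naming and collection steps above can also be sketched in isolation. This is a minimal sketch, not part of the original example: the record URI is a hypothetical value, and xmldb:collection-available() is used to skip the creation step when /db/output already exists. It assumes the query runs as a user who is allowed to create collections.

```xquery
xquery version "3.0";

import module namespace xmldb="http://exist-db.org/xquery/xmldb";
import module namespace util="http://exist-db.org/xquery/util";

(: Hypothetical rdf:about value, used only for illustration :)
let $about := "http://example.org/records/1"

(: MD5 hash of the URI plus ".xml" gives a stable document name :)
let $filename := util:hash($about, "md5") || ".xml"

(: Create /db/output only if it does not exist yet :)
let $target :=
    if (xmldb:collection-available("/db/output"))
    then "/db/output"
    else xmldb:create-collection("/db", "output")
return
    (: Full path the record would be stored under :)
    $target || "/" || $filename
```

Because the name is derived from the URI alone, running the split a second time would store each record under the same document name again.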

Specifying the Input Document Set

A database can contain a virtually unlimited set of collections and documents. Four functions are available to restrict the input document set to a user-defined set of documents or collections: doc(), collection(), xmldb:document(), and xmldb:xcollection(). The first two are standard XPath functions; the other two are eXist-db-specific extensions.

The differences between the XPath and the eXist-db specific functions are:

doc() vs. xmldb:document()

While doc() is restricted to a single document-URI argument, xmldb:document() accepts multiple document paths to be included in the input node set.

Calling xmldb:document() without an argument includes every document in the database.

Some examples:

doc("/db/apps/demo/data/hamlet.xml")//SPEAKER
xmldb:document('/db/test/abc.xml', '/db/test/def.xml')//title
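
Using only the standard functions, a comparable multi-document input set can be built by applying doc() to each path in a sequence. A minimal sketch, reusing the paths from the example above (both documents are assumed to be stored in the database):

```xquery
xquery version "3.0";

(: Paths taken from the example above; both are assumed to exist :)
let $paths := ('/db/test/abc.xml', '/db/test/def.xml')
return
    (: doc() takes one URI at a time, so apply it to each path in turn :)
    (for $path in $paths return doc($path))//title
```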

collection() vs. xmldb:xcollection()

The collection() function specifies the collection of documents to be included in the query evaluation. By default, documents found in sub-collections of the specified collection are also included.

For example, suppose we have a collection /db/test that contains two sub-collections /db/test/abc and /db/test/def. In this case, the function call collection('/db/test') will include all of the resources found in /db/test, /db/test/abc and /db/test/def.

The function xmldb:xcollection() does not include sub-collections.
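
The difference can be made visible by counting the documents each function returns. A minimal sketch, assuming the /db/test layout from the example above; the counts depend on what is actually stored there, and the element names in the result are chosen for illustration only:

```xquery
xquery version "3.0";

import module namespace xmldb="http://exist-db.org/xquery/xmldb";

(: /db/test with sub-collections abc and def, as in the example above :)
let $recursive := collection('/db/test')
let $top-only := xmldb:xcollection('/db/test')
return
    <counts>
        <!-- documents in /db/test and all of its sub-collections -->
        <with-sub-collections>{count($recursive)}</with-sub-collections>
        <!-- documents stored directly in /db/test -->
        <top-level-only>{count($top-only)}</top-level-only>
    </counts>
```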

Without a URI scheme in front (like file: or http:), eXist-db interprets the arguments to collection() and doc() as absolute or relative paths that point to a collection or document within the database. For example:

  • doc("/db/collection1/collection2/resource.xml")

    This refers to a resource called resource.xml stored in /db/collection1/collection2.

  • doc("resource.xml")

    This references a resource relative to the base URI property defined in the static XQuery context. This property contains an XML:DB URI pointing to the base collection (see below) for the current query context, for instance xmldb:exist:///db.

The base collection depends on how the query context was initialized. If you call a query via the XML:DB API, the base collection is the collection from which the query service was obtained. All relative URLs will be resolved relative to that collection. If a stored query is executed via REST, the base collection is the collection in which the XQuery source resides. In most other cases, the base collection will point to the database root /db.

Because it is not always clear what the base collection is, we recommend always using absolute paths. This allows the same query to be used through different interfaces.
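
One way to follow this recommendation is to keep the absolute collection path in a single variable and build every document path from it. A minimal sketch, assuming a hypothetical collection /db/apps/myapp/data; static-base-uri() is included only to show what relative paths would be resolved against in the current context:

```xquery
xquery version "3.0";

(: Hypothetical collection path; replace with your own :)
let $data-root := "/db/apps/myapp/data"
return (
    (: Base URI that relative paths would be resolved against :)
    static-base-uri(),
    (: Absolute path: does not depend on the base collection :)
    doc($data-root || "/resource.xml")
)
```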

You can also pass a full URI to the doc() function:

doc("http://localhost:8080/exist/servlet/db/test.xml")

The data at that URI will be retrieved and stored in a temporary document in the database.
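
Since a remote resource may be unreachable, it can help to guard the call. A minimal sketch using the standard doc-available() function with the URI from the example above; the fallback element is a hypothetical placeholder, and how eXist-db handles network failures here should be treated as an assumption to verify:

```xquery
xquery version "3.0";

(: URI taken from the example above :)
let $uri := "http://localhost:8080/exist/servlet/db/test.xml"
return
    if (doc-available($uri))
    then doc($uri)
    (: Hypothetical placeholder returned when nothing can be loaded :)
    else <unavailable uri="{$uri}"/>
```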