Writing Java Applications with the XML:DB API
This article explains how to work with eXist-db from Java code using the XML:DB API. This API provides a common interface to native or XML-enabled databases and supports the development of portable, reusable applications.
The preferred way to work with eXist-db when developing Java applications is to use the XML:DB API. eXist-db's implementation of the XML:DB standards follows the Xindice implementation and conforms to the latest working drafts put forth by the XML:DB Initiative. For more information, refer to the Javadocs for this API.
The basic components employed by the XML:DB API are drivers, collections, resources and services.
Drivers are implementations of the database interface that encapsulate the database access logic for specific XML database products. They are provided by the product vendor and must be registered with the database manager.
A collection is a hierarchical container for resources and further sub-collections. Currently two different resources are defined by the API: XMLResource and BinaryResource. An XMLResource represents an XML document or a document fragment, selected by a previously executed XPath query.
Finally, services are requested for special tasks such as querying a collection with XPath, or managing a collection.
There are several XML:DB examples provided in eXist's samples directory. To start an example, use the start.jar file and pass the name of the example class as the first parameter, for instance:
java -jar start.jar org.exist.examples.xmldb.Retrieve [- other options]
Programming with the XML:DB API is straightforward. You will find some code examples in the samples directory.
In the following simple example, a document is retrieved from the eXist server and printed to standard output.
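A minimal sketch of such a retrieval, assuming the eXist-db and XML:DB API jars are on the classpath and an API version in which getService and getCollection have the string-based signatures described in this article. The collection path /db/shakespeare/plays and the document name hamlet.xml are hypothetical and only serve as illustration.

```java
import javax.xml.transform.OutputKeys;

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;
import org.xmldb.api.modules.XMLResource;

public class RetrieveExample {
    public static void main(String[] args) throws Exception {
        // Register eXist's driver class with the database manager.
        Class<?> cl = Class.forName("org.exist.xmldb.DatabaseImpl");
        Database database = (Database) cl.getDeclaredConstructor().newInstance();
        // Ask the driver to start an embedded instance if none is running.
        database.setProperty("create-database", "true");
        DatabaseManager.registerDatabase(database);

        // Hypothetical collection path; no host address, so the local driver is used.
        Collection col = DatabaseManager.getCollection("xmldb:exist:///db/shakespeare/plays");
        if (col == null) {
            System.err.println("collection not found");
            return;
        }
        try {
            // Turn on pretty-printing for resources read from this collection.
            col.setProperty(OutputKeys.INDENT, "yes");
            // Hypothetical document name.
            XMLResource res = (XMLResource) col.getResource("hamlet.xml");
            if (res == null) {
                System.err.println("document not found");
            } else {
                System.out.println((String) res.getContent());
            }
        } finally {
            col.close();
        }
    }
}
```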
With this example, the database driver class for eXist (org.exist.xmldb.DatabaseImpl) is registered first with the DatabaseManager. Next we obtain a Collection object from the database manager by calling the static method DatabaseManager.getCollection(). This method expects a fully qualified URI as its parameter, which identifies the desired collection. The format of this URI must be:
xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection
Because more than one database driver can be registered with the database manager, the first part of the URI (xmldb:exist) is required to determine which driver class to use. The database-id is used by the database manager to select the correct driver from its list of available drivers. To use eXist-db, this ID should always be "exist" (unless you have set up multiple database instances; additional instances may have other names).
The final part of the URI identifies the collection path, and optionally the host address of the database server on the network. Internally, eXist uses two different driver implementations: the first talks to a remote database engine using XML-RPC calls; the second has direct access to a local instance of eXist-db. The root collection is always identified by /db. A URI that names a host address therefore references a collection, for example the Shakespeare collection, on a remote server running the XML-RPC interface as a servlet at that address.
If we leave out the host address, the XML:DB driver will try to connect to a locally attached database instance. For instance, xmldb:exist:///db addresses the root collection of a local instance.
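The two URI forms can be illustrated with a small helper that uses only plain string handling. The class XmldbUris, its build method, the host address db.example.org:8080/exist/xmlrpc and the collection path /db/shakespeare are all hypothetical names chosen for this sketch; they are not part of eXist-db or the XML:DB API.

```java
class XmldbUris {
    /**
     * Assembles xmldb:[database-id]://[host-address][collection-path].
     * A null host address yields the local (embedded) form.
     */
    static String build(String databaseId, String hostAddress, String collectionPath) {
        if (!collectionPath.startsWith("/db")) {
            throw new IllegalArgumentException("collection path must start with /db");
        }
        String host = (hostAddress == null) ? "" : hostAddress;
        return "xmldb:" + databaseId + "://" + host + collectionPath;
    }

    public static void main(String[] args) {
        // Remote form: the host part names the (hypothetical) XML-RPC servlet address.
        System.out.println(build("exist", "db.example.org:8080/exist/xmlrpc", "/db/shakespeare"));
        // Local form: the host part is left empty.
        System.out.println(build("exist", null, "/db/shakespeare"));
    }
}
```

The only visible difference between the two results is the empty host part, which leaves three consecutive slashes in the local form.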
In this case, we have to tell the XML:DB driver that it should create a new database instance if none has been started. This is done by setting the create-database property of class Database to true (more information on embedded use of eXist-db can be found in the deployment guide).
The setProperty calls are used to set database-specific parameters. In this case, pretty-printing of XML output is turned on for the collection. eXist uses the property keys defined in the standard Java package javax.xml.transform. Thus, in Java you can simply use class OutputKeys to get the correct keys.
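As a small illustration of the keys involved, the sketch below collects the pretty-printing setting in a map using the OutputKeys constant from the standard library. The class OutputKeyProperties and its method are hypothetical helpers; in real code each key and value pair would be passed to the collection's setProperty method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.transform.OutputKeys;

class OutputKeyProperties {
    /** Returns the serialization settings that turn on pretty-printing. */
    static Map<String, String> prettyPrintProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // OutputKeys.INDENT is the string constant "indent".
        props.put(OutputKeys.INDENT, "yes");
        return props;
    }

    public static void main(String[] args) {
        prettyPrintProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Using the constant rather than a literal string avoids typos in the key name.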
Calling col.getResource() finally retrieves the document, which is returned as an XMLResource. All resources have a method getContent(), which returns the resource's content, depending on its type. In this case we retrieve the content as type String.
To query the repository, we may either use the standard XPathQueryService or eXist's XQueryService class. The XML:DB API defines different kinds of services, which may or may not be provided by the database. The getService method of class Collection returns a service if it is available. The method expects the service name as the first parameter, and its version (as a string) as the second.
The following is an example of using the XML:DB API to execute a database query:
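A minimal sketch of such a query, assuming the eXist-db and XML:DB API jars are on the classpath and the string-based getService signature described above. The collection path and the XPath expression (which assumes SPEECH and SPEAKER elements in the stored documents) are hypothetical.

```java
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;
import org.xmldb.api.base.Resource;
import org.xmldb.api.base.ResourceIterator;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.modules.XPathQueryService;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        // Register the eXist driver and allow it to start an embedded instance.
        Database database = (Database) Class.forName("org.exist.xmldb.DatabaseImpl")
                .getDeclaredConstructor().newInstance();
        database.setProperty("create-database", "true");
        DatabaseManager.registerDatabase(database);

        // Hypothetical collection path.
        Collection col = DatabaseManager.getCollection("xmldb:exist:///db/shakespeare/plays");
        if (col == null) {
            System.err.println("collection not found");
            return;
        }
        try {
            // Request the service by name and version.
            XPathQueryService service =
                    (XPathQueryService) col.getService("XPathQueryService", "1.0");

            // Hypothetical XPath expression.
            String xpath = "//SPEECH[SPEAKER = 'HAMLET']";
            ResourceSet result = service.query(xpath);

            // Each resource holds one fragment or value selected by the expression.
            ResourceIterator i = result.getIterator();
            while (i.hasMoreResources()) {
                Resource r = i.nextResource();
                System.out.println(r.getContent());
            }
        } finally {
            col.close();
        }
    }
}
```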
To execute the query, method service.query(xpath) is called. This method returns a ResourceSet containing the Resources found by the query. ResourceSet.getIterator() gives us an iterator over these resources. Every Resource contains a single document fragment or value, selected by the XPath expression.
Internally, eXist does not distinguish between XPath and XQuery expressions. XQueryService thus maps to the same implementation class as XPathQueryService. However, it provides a few additional methods. Most importantly, when talking to an embedded database, XQueryService allows the XQuery expression to be compiled to an internal representation, which can then be reused. With compilation, the previous example code would look as follows:
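A hedged sketch of the compile-then-execute pattern. It assumes the XQueryService and CompiledExpression types as shipped in eXist's XML:DB API jar (package names have varied between releases, so the imports below are an assumption), and that the collection was obtained as described earlier. The class and method names are hypothetical.

```java
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.CompiledExpression;
import org.xmldb.api.base.ResourceIterator;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.modules.XQueryService;

public class CompiledQueryExample {
    /** Compiles the query once and executes the compiled form the given number of times. */
    static void runCompiled(Collection col, String query, int times) throws Exception {
        XQueryService service = (XQueryService) col.getService("XQueryService", "1.0");

        // Compile once to the internal representation.
        CompiledExpression compiled = service.compile(query);

        for (int n = 0; n < times; n++) {
            // Reuse the compiled expression instead of parsing the query again.
            ResourceSet result = service.execute(compiled);
            ResourceIterator i = result.getIterator();
            while (i.hasMoreResources()) {
                System.out.println(i.nextResource().getContent());
            }
        }
    }
}
```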
The XML-RPC server automatically caches compiled expressions, and so calling compile through the remote driver has no effect if the expression is already cached.
Next, we would like to store a new document into the repository. This is done by creating a new XMLResource, assigning it the content of the new document, and calling the storeResource method of class Collection.
First, a new Resource is created by method Collection.createResource(), which expects two parameters: the id and type of the resource being created. If the id parameter is null, a unique resource-id will be automatically generated.
In some cases, the collection may not yet exist, and so we must create it. To create a new collection, call the createCollection method of the CollectionManagementService service. In the following example, we simply start at the root collection object to get the CollectionManagementService service.
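A minimal sketch of storing a document, assuming the eXist-db and XML:DB API jars are on the classpath and the string-based createResource and getService signatures described in this article. The command-line arguments, and the admin account with an empty password, are assumptions made for illustration; adjust them to your installation.

```java
import java.io.File;

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;
import org.xmldb.api.modules.CollectionManagementService;
import org.xmldb.api.modules.XMLResource;

public class StoreExample {
    public static void main(String[] args) throws Exception {
        // args[0]: name of a child collection of /db, args[1]: path to a local XML file.
        String name = args[0];
        File file = new File(args[1]);

        Database database = (Database) Class.forName("org.exist.xmldb.DatabaseImpl")
                .getDeclaredConstructor().newInstance();
        database.setProperty("create-database", "true");
        DatabaseManager.registerDatabase(database);

        // Assumed credentials; writing usually requires an authenticated user.
        String user = "admin";
        String password = "";

        Collection col = DatabaseManager.getCollection("xmldb:exist:///db/" + name, user, password);
        if (col == null) {
            // The collection does not exist yet: create it below the root collection.
            Collection root = DatabaseManager.getCollection("xmldb:exist:///db", user, password);
            CollectionManagementService mgt = (CollectionManagementService)
                    root.getService("CollectionManagementService", "1.0");
            col = mgt.createCollection(name);
            root.close();
        }
        try {
            // Create the resource, hand over the File, and store it.
            XMLResource document = (XMLResource) col.createResource(file.getName(), "XMLResource");
            document.setContent(file);
            col.storeResource(document);
        } finally {
            col.close();
        }
    }
}
```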
The XMLResource.setContent() method takes a Java object as its parameter. The eXist driver checks if the object is a File. Otherwise, the object is transformed into a String by calling the object's toString() method. Passing a File has one big advantage: if the database is running in embedded mode, the file is passed directly to the indexer. Thus, the file's content does not have to be loaded into memory. This is handy if your files are very large.
eXist provides extensions on top of the XML:DB specification.
The UserManagementService service contains methods to manage users and handle permissions. These methods resemble common Unix commands such as chmod. As with other services, UserManagementService can be retrieved from a collection object:
UserManagementService service = (UserManagementService)collection.getService("UserManagementService", "1.0");
Another service called DatabaseInstanceManager provides a single method to shut down the database instance accessed by the driver. You have to be a member of the dba user group to use this method; otherwise an exception will be thrown.
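A hedged sketch of shutting down an instance. It assumes that DatabaseInstanceManager lives in the org.exist.xmldb package, that its single method is named shutdown(), and that the driver has already been registered with the DatabaseManager. The class name and the parameters of the helper method are hypothetical.

```java
import org.exist.xmldb.DatabaseInstanceManager;
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;

public class ShutdownExample {
    /** Shuts down the instance behind the given URI; the user must belong to the dba group. */
    static void shutdown(String uri, String user, String password) throws Exception {
        Collection root = DatabaseManager.getCollection(uri, user, password);
        // Retrieve the service from the collection, as with any other service.
        DatabaseInstanceManager manager =
                (DatabaseInstanceManager) root.getService("DatabaseInstanceManager", "1.0");
        manager.shutdown();
    }
}
```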
IndexQueryService supports access to the terms and elements contained in eXist's internal index. Method getIndexedElements() returns a list of element occurrences for the current collection. For each occurring element, the element's name and a frequency count are returned.
Method scanIndexTerms() retrieves the list of words occurring in the current collection. This can be useful to provide users with a list of searchable terms together with their frequency.
As explained above, passing a local XML:DB URI to the DatabaseManager means that the driver will try to start or access an embedded database instance. You can configure more than one database instance by setting the location of the central configuration file.
The configuration file is set through the configuration property of the DatabaseImpl driver class. If you would like to use different drivers for different database instances, specify a name for the created instance through the database-id property. You may later use this name in the URI to refer to a database instance. The following fragment sets up two instances:
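A minimal sketch of such a setup, assuming the eXist-db and XML:DB API jars are on the classpath. The two configuration file paths are hypothetical; the property names create-database, configuration and database-id are the ones described in this article.

```java
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Database;

public class MultipleInstancesExample {
    /** Creates and registers one driver instance bound to a configuration file and a database-id. */
    static void register(String configFile, String databaseId) throws Exception {
        Database database = (Database) Class.forName("org.exist.xmldb.DatabaseImpl")
                .getDeclaredConstructor().newInstance();
        database.setProperty("create-database", "true");
        database.setProperty("configuration", configFile);
        database.setProperty("database-id", databaseId);
        DatabaseManager.registerDatabase(database);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical configuration file locations, one per instance.
        register("/home/exist/test/conf.xml", "test");
        register("/home/exist/production/conf.xml", "exist");
    }
}
```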
With this example, the URI xmldb:test:///db selects the test database instance. Both instances should have their own data and log directory as specified in the configuration files.