XML-RPC API Developer's Guide
(2Q19)
This article explains how to interface with eXist-db using the XML-RPC API. This API can be used to access eXIst-db from multiple languages and environments.
Introduction
XML-RPC (XML Remote Procedural Call) provides a simple way to access eXist-db by calling remote procedures from a wide variety of programming languages and environments, like CGI scripts, PHP, JSP and more.
For a Java server, eXist uses the XML-RPC library created by Hannes Wallnoefer which recently has moved to Apache (see: http://xml.apache.org/xmlrpc).
Perl examples use the RPC::XML package, which should be available at every CPAN mirror (see CPAN).
The following is a small example, which shows how to talk to eXist-db from Java using the Apache XML-RPC library. This example can be found in
samples/org/exist/examples/xmldb/Retrieve.java
.
public class Retrieve {
protected final static String uri =
"http://localhost:8080/exist/xmlrpc";
protected static void usage() {
System.out.println( "usage: org.exist.examples.xmlrpc.Retrieve " +
"path-to-document" );
System.exit( 0 );
}
public static void main( String args[] ) throws Exception {
if ( args.length < 1 ) {
usage();
}
XmlRpc.setEncoding("UTF-8");
XmlRpcClient xmlrpc = new XmlRpcClient( uri );
Hashtable options = new Hashtable();
options.put("indent", "yes");
options.put("encoding", "UTF-8");
options.put("expand-xincludes", "yes");
options.put("highlight-matches", "elements");
Vector params = new Vector();
params.addElement( args[0] );
params.addElement( options );
String xml = (String)
xmlrpc.execute( "getDocumentAsString", params );
System.out.println( xml );
}
}
As shown above, the
execute
method of
XmlRpcClient
expects as its parameters a method (passed as a string) to call on the server and a
Vector
of parameters to pass to this executed method. In this example, the method
getDocumentAsString
is called as the first parameter, and a
Vector params
. Various output properties can also be set through the
hashtable
argument (see the method description below). Since all parameters are passed in a
Vector
, they are necessarily Java objects.
XML-RPC messages (requests and responses sent between the server and client) are themselves XML documents. In some cases, these documents may use a character encoding which is in conflict with the encoding of the document we would like to
receive. It is therefore important to set the
transport
encoding to
UTF-8
as shown in the example above. However, conflicts may persist depending on which client library is used. To avoid such conflicts, eXist provides alternative declarations for selected methods, which expect string parameters as byte arrays. The XML-RPC
library will send them as binary data (using Base64 encoding for transport). With this approach, document encodings are preserved regardless of the character encoding used by the XML-RPC transport layer.
The XML-RPC API uses
int
to encode booleans. This is because some clients do not correctly pass boolean parameters.
Querying is easy using XML-RPC:
#!/usr/bin/perl
use RPC::XML;
use RPC::XML::Client;
$query = <<'END';
for $speech in //SPEECH[LINE &= 'tear*']
order by $speech/SPEAKER[1]
return
$speech
END
$URL = "http://guest:guest\@localhost:8080/exist/xmlrpc";
print "connecting to $URL...\n";
$client = new RPC::XML::Client $URL;
# Output options
$options = RPC::XML::struct->new(
'indent' => 'yes',
'encoding' => 'UTF-8',
'highlight-matches' => 'none');
$req = RPC::XML::request->new("query", $query, 20, 1, $options);
$response = $client->send_request($req);
if($response->is_fault) {
die "An error occurred: " . $response->string . "\n";
}
print $response->value;
You will find the source code of this example in
samples/xmlrpc/search2.pl
. It uses the simple query method, which executes the query and returns a document containing the specified number of results. The result set is
not
cached on the server.
The following example calls the
executeQuery
method, which returns a unique session id. In this case, the actual results are cached on the server and can be retrieved using the
retrieve
method.
use RPC::XML;
#!/usr/bin/perl
use RPC::XML;
use RPC::XML::Client;
# Execute an XQuery through XML-RPC. The query is passed
# to the "executeQuery" method, which returns a handle to
# the created result set. The handle can then be used to
# retrieve results.
$query = <<'END';
for $speech in //SPEECH[LINE &= 'corrupt*']
order by $speech/SPEAKER[1]
return
$speech
END
$URL = "http://guest:guest\@localhost:8080/exist/xmlrpc";
print "connecting to $URL...\n";
$client = new RPC::XML::Client $URL;
# Execute the query. The method call returns a handle
# to the created result set.
$req = RPC::XML::request->new("executeQuery",
RPC::XML::base64->new($query),
"UTF-8", {});
$resp = process($req);
$result_id = $resp->value;
# Get the number of hits in the result set
$req = RPC::XML::request->new("getHits", $result_id);
$resp = process($req);
$hits = $resp->value;
print "Found $hits hits.\n";
# Output options
$options = RPC::XML::struct->new(
'indent' => 'no',
'encoding' => 'UTF-8');
# Retrieve query results 1 to 10
for($i = 1; $i < 10 && $i < $hits; $i++) {
$req = RPC::XML::request->new("retrieve", $result_id, $i, $options);
$resp = process($req);
print $resp->value . "\n";
}
# Send the request and check for errors
sub process {
my($request) = @_;
$response = $client->send_request($request);
if($response->is_fault) {
die "An error occurred: " . $response->string . "\n";
}
return $response;
}
Available Methods
This section gives you an overview of the methods implemented by eXist-db's XML-RPC server. Only the most common methods are presented here. For a complete list see the Java interface
RpcAPI.java
.
Method signatures are presented using Java data types. Some methods like
getDocument()
and
retrieve()
accept a
struct
to specify optional output properties.
The following general fields for methods are supported:
-
indent
-
Returns indented pretty-print XML. [
yes | no
] -
encoding
-
Specifies the character encoding used for the output. If the method returns a string, only the XML declaration will be modified accordingly.
-
omit-xml-declaration
-
Add XML declaration to the head of the document. [
yes | no
] -
expand-xincludes
-
Expand XInclude elements. [
yes | no
] -
process-xsl-pi
-
Specifying
yes
XSL processing instructions in the document will be processed and the corresponding stylesheet applied to the output. [yes | no
] -
highlight-matches
-
The database will add special tags to highlight the strings in the text that have triggered a fulltext match. Set to
elements
to highlight matches in element values,attributes
for attribute values orboth
for both elements and attributes. -
stylesheet
-
Use this parameter to specify an XSL stylesheet which should be applied to the output. If the parameter contains a relative path, the stylesheet will be loaded from the database.
-
stylesheet-param.key1 ... stylesheet-param.key2
-
If a stylesheet has been specified with
stylesheet
, you can pass it parameters. Stylesheet parameters are recognized if they start with the prefixstylesheet-param.
, followed by the name of the parameter. The leadingstylesheet-param.
string will be removed before the parameter is passed to the stylesheet.
Retrieving documents
-
byte[] getDocument(String name, Hashtable parameters)
String getDocumentAsString(String name, Hashtable parameters)
Retrieves a document from the database.
-
name
-
Path of the document to be retrieved (e.g.
/db/shakespeare/plays/r_and_j.xml
). -
parameters
-
A
struct
containingkey=value
pairs for configuring the output.
-
-
Hashtable getDocumentData(String name, Hashtable parameters)
Hashtable getNextChunk(String handle, Int offset)
Hashtable getNextExtendedChunk(String handle, String offset)
These methods retrieve a document from the database, but limit the number of bytes transmitted in one chunk to avoid memory shortage on the server.
getDocumentData()
returns astruct
containing the following fields:data
,handle
,offset
,supports-long-offset
.data
contains the document's data (asbyte[]
) or the first chunk of data if the document size exceeds the predefined internal limit.handle
andoffset
can be passed togetNextChunk()
orgetNextExtendedChunk()
to retrieve the remaining data chunks.If
offset
is0
, no more chunks are available and all of the data is already contained in thedata
field. Otherwise, further chunks can be retrieved by passing the handle and the offset (as returned by the last call) togetNextChunk()
orgetNextExtendedChunk()
. Once the last chunk is read,offset
will be0
and the handle becomes invalid.supports-long-offset
, when available, tells whether the server understandsgetNextExtendedChunk()
method:getNextChunk()
andgetNextExtendedChunk()
do more or less the same, but with the difference thatgetNextExtendedChunk()
does not have the 2GB limitation inoffset
., which previous eXist servers could not handle. Use thesupports-long-offset
parameter fromgetDocumentData()
returned structure to verify this.-
name
-
Path of the document to be retrieved (e.g.
/db/shakespeare/plays/r_and_j.xml
). -
parameters
-
A
struct
containingkey=value
pairs to configure the output. -
handle
-
The handle returned by the call to
getDocumentData()
(this identifies a temporary file on the server). - offset
-
The data offset in the document at which the next chunk in the sequence will be read.
-
Storing Documents
-
boolean parse(byte[] xml, String docName, int overwrite)
boolean parse(byte[] xml, String docName)
Insert a new document into the database or replace an existing one.
-
xml
-
XML content of this document as a UTF-8 encoded byte array.
-
docName
-
Path to the database location where the new document is to be stored.
-
overwrite
-
Set this value to >
0
to automatically replaces an existing document at the same location.
-
-
String upload(byte[] chunk, int length)
String upload(String file, byte[] chunk, int length)
boolean parseLocal(String localFile, String docName, boolean replace)
Uploads an entire document on to the database before parsing it.
The
parse
method receives the document as a large single chunk, but theupload
method allows you to upload the whole document to the server before parsing. Doing it this way out-of-memory exceptions can be avoided since the document is not entirely kept in memory. To identify the file on the server, upload returns an identifier string. After uploading all chunks, you can callparseLocal
and pass it this identifier string as its first argument.-
file
-
The name of the file to which the uploaded chunk is appended. This is the name of a temporary file on the server. Use the two-argument version of upload for the first chunk. The method creates a temporary file and returns its name.
-
chunk
-
A byte array containing the data to be appended.
-
length
-
Defines the number of bytes to be read from chunk.
-
localFile
-
The name of the local file on the server that is to be stored in the database. This should be the same as the name returned by
upload
. -
docName
-
The full path specifying the location where the document should be stored in the database.
-
replace
-
Set this to
true
if an existing document with the same name should be automatically overwritten.
-
Creating a Collection
-
boolean createCollection(String name)
Creates a new collection.
-
name
-
Path to the new collection.
-
Removing Documents or Collections
-
boolean remove(String docName)
Removes a document from the database.
-
docName
-
The full path to the database document.
-
-
boolean removeCollection( String collection)
Removes a collection from the database (including all of its documents and sub-collections).
-
collection
-
The full path to the collection.
-
Querying
-
int executeQuery(String xquery, HashMap parameters)
int executeQuery(byte[] xquery, HashMap parameters)
int executeQuery(byte[] xquery, String encoding, HashMap parameters)
Executes an XQuery and returns a reference identifier to the generated result set. This reference can be used to retrieve the results.
-
xquery
-
A valid XQuery expression.
-
parameters
-
The parameters as HashMap values (
sort-expr
,namespaces
,variables
,base-uri
,static-documents
,protected
). -
encoding
-
The character encoding used for the query string.
-
-
Hashtable querySummary(int result-Id)
Returns a summary of query results for the result-set referenced by
result-Id
.The
result-Id
value is taken from a previous call toexecuteQuery
(see above). ThequerySummary
method returns astruct
with the following fields:queryTime
,hits
,documents
,doctype
.queryTime
andhits
are integer values that describe the processing time in milliseconds for the query execution and the number of hits in the result-set respectively. The fielddocuments
is an array of an array (i.e.Object[][3]
) that represents a table in which each row identifies one document. The first field in each row contains thedocument-id
(integer value). The second has the document's name as a string value. The third contains the number of hits found in this document (integer value).The
doctype
field is also an array of an array (Object[][2]) that contains thedoctype
public identifier and the number of hits found for thisdoctype
in each row.-
resultId
-
Reference to a result-set as returned by a previous call to
executeQuery
.
-
-
byte[] retrieve(int resultId, int pos, Hashtable parameters)
Retrieves a single result-fragment from the result-set referenced by
resultId
. The result-fragment is identified by its position in the result-set.-
resultId
-
Reference to a result-set as returned by a previous call to
executeQuery
. -
pos
-
The position of the item in the result-sequence, starting at 0.
-
parameters
-
A
struct
containingkey=value
pairs to configure the output.
-
-
Hashtable retrieveFirstChunk(int resultId, int pos, Hashtable parameters)
Retrieves a single result-fragment from the result-set referenced by
resultId
, but limiting the number of bytes transmitted in one chunk to avoid memory shortage on the server. The result-fragment is identified by its position in the result-set, which is passed in the parameterpos
. It returns the same structure asgetDocumentData()
, and its fields behaves the same, so next chunks must be fetched using eithergetNextChunk()
orgetNextExtendedChunk()
(seegetDocumentData()
documentation for further details).-
resultId
-
Reference to a result-set as returned by a previous call to
executeQuery
. -
pos
-
The position of the item in the result-sequence, starting at 0.
-
parameters
-
A
struct
containingkey=value
pairs to configure the output.
-
-
int getHits(int resultId)
Get the number of hits in the result-set identified by
resultId
.-
resultId
-
Reference to a result-set as returned by a previous call to
executeQuery
.
-
-
String query(byte[] xquery, int howmany, int start, Hashtable parameters)
Executes an XQuery expression and returns a specified subset of the results. This method will directly return a subset of the result-sequence, starting at
start
, as a new XML document. The number of results returned is determined by parameterhowmany
. The result-set will be deleted on the server, so later calls to this method will again execute the query.-
xquery
-
An XQuery expression.
-
start
-
The position of the first item to be retrieved from the result-sequence.
-
howmany
-
The maximum number of items to retrieve.
-
parameters
-
A struct containing
key=value
pairs to configure the output.
-
-
void releaseQueryResult(int resultId)
Forces the result-set identified by its result id to be released on the server.
Retrieving Information on Collections and Documents
-
Hashtable describeCollection(String collection)
Returns a
struct
describing a specified collection.The returned
struct
has the following fields:name
(the collection path),owner
(identifies the collection owner),group
(identifies the group that owns the collection),created
(the creation date of the collection expressed as a long value),permissions
(the active permissions that apply to the collection as an integer value).collections
is an array listing the names of available sub-collections in this collection.-
collection
-
The full path to the collection.
-
-
Hashtable describeResource(String resource)
Returns a
struct
describing a specified resource.The returned
struct
has the following fields:name
(the collection path),owner
(identifies the collection owner),group
(identifies the group that owns the collection),created
(the creation date of the collection expressed as a long value),permissions
(the active permissions that apply to the collection as an integer value),type
(eitherXMLResource
for XML documents orBinaryResource
for binary files),content-length
(the estimated size of the resource in bytes). Thecontent-length
is based on the number of pages occupied by the resource in the DOM storage. For binary resources, the value will always be0
. -
Hashtable getCollectionDesc(String collection)
Returns a struct describing a collection.
The returned struct has the following fields:
name
(the collection path),owner
(identifies the collection owner),group
(identifies the group that owns the collection),created
(the creation date of the collection expressed as a long value),permissions
(the active permissions that apply to the collection as an integer value).collections
is an array listing the names of available sub-collections in this collection.documents
is an array listing information on all of the documents in this collection. Each item in the array is a struct with the following fields: name, owner, group, permissions, type. The type field contains a string describing the type of the resource: eitherXMLResource
orBinaryResource
.-
collection
-
The full path to the collection.
-
XUpdate
-
int xupdate(String collectionName, byte[] xupdate)
int xupdateResource(String documentName, byte[] xupdate)
Applies a set of XUpdate modifications to a collection or document.
-
collectionName
-
The full path to the collection to which the XUpdate modifications should be applied.
-
documentName
-
The full path to the document to which the XUpdate modifications should be applied.
-
xupdate
-
The XUpdate document containing the modifications. This should be send as an
UTF-8
encoded binary array.
-
Managing Accounts and Permissions
-
boolean addAccount(String name, String passwd, String digestPasswd, Vector groups)
boolean updateAccount(String name, String passwd, String digestPasswd, Vector groups, String home)
Modifies or creates a user account.
-
name
-
Username .
-
passwd
-
The plain-text (!) password for the user.
-
digestPasswd
-
The md5 encoded password for the user.
-
groups
-
A vector of groups assigned to the user. The first group in the vector will become the user's primary group.
-
home
-
An optional setting for the accounts's home collection path. The collection will be created if it does not exist, and provides the user with full access.
-
-
boolean setPermissions(String resource, String permissions)
boolean setPermissions(String resource, int permissions)
boolean setPermissions(String resource, String owner, String ownerGroup, String permissions)
boolean setPermissions(String resource, String owner, String ownerGroup, int permissions)
Sets the permissions assigned to a given collection or document.
-
resource
-
The full path to the collection or document on which the specified permissions will be set. The method first checks if the specified path points to a collection or resource.
-
owner
-
The name of the user owning this resource.
-
ownerGroup
-
The name of the group owning this resource.
-
permissions
-
The permissions assigned to the resource, which can be specified either as an integer value constructed using the
Permission
class, or using a modification string. The bit encoding of the integer value corresponds to Unix conventions. The modification string has the following syntax:[user|group|other]=[+|-][read|write|update][, ...]
-
-
Hashtable getPermissions(String resource)
Returns the active permissions for the specified document or collection.
The returned
struct
has the following fields:name
(the collection path),owner
(identifies the collection owner),group
(identifies the group that owns the collection),created
(the creation date of the collection expressed as a long value),permissions
(the active permissions that apply to the collection as an integer value). -
boolean removeAccount(String name)
Removes the identified account.
-
Hashtable getAccount(String name)
Returns a struct describing the account identified by its name.
The returned struct has the following fields:
name
(the collection path),home
(identifies the user's home directory),groups
(an array specifying all groups to which the user belongs). -
Vector getAccount()
Returns a list of all accounts currently known to the system.
Each account in the list is described by the same struct returned by the
getAccount()
method. -
Vector getGroups()
Returns a list of all group names (as string values) currently defined.
Access to the Index Contents
The following methods provide access to eXist's internal index structure.
-
Vector getIndexedElements(String collectionName, boolean inclusive)
Returns a list (i.e.
array[][4]
) of all indexed element names for the specified collection.For each element, an array of four items is returned:
-
name of the element
-
optional namespace URI
-
optional namespace prefix
-
number of occurrences of this element as an integer value
-
collectionName
-
The full path to the collection.
-
inclusive
-
If set to
true
, the sub-collections of the specified collection will be included into the result.
-
-
Vector scanIndexTerms(String collectionName, String start, String end, boolean inclusive)
Return a list (
array[][2]
) of all index terms contained in the specified collection.For each term, an array with two items is returned:
-
the term itself
-
number occurrences of the term in the specified collection
-
collectionName
-
The full path to the collection.
-
start
-
The start position for the returned range expressed as a string value. Returned index terms are positioned after the start position in ascending, alphabetical order.
-
end
-
The end position for the returned range expressed as a string value. Returned index terms are positioned before the end position in ascending, alphabetical order.
-
inclusive
-
If set to
true
, sub-collections of the specified collection will be included into the result.
-
Other Methods
-
boolean shutdown()
Shuts down the database engine. All dirty pages are written to disk.
-
boolean sync()
Causes the database to write all dirty pages to disk.