Developer's Guide

REST-Style Web API

eXist-db provides a REST-style (or RESTful) API through HTTP, which provides the simplest and quickest way to access the database. To implement this API, all one needs is an HTTP client, which is provided by nearly all programming languages and environments. However, not all of the database features are available using this approach.

When running eXist-db as a stand-alone server - i.e. when the database has been started using the shell-script bin/server.sh (Unix) or batch file bin/server.bat (Windows/DOS) - HTTP access is supported through a simple, built-in web server. This web server however has limited capabilities restricted to the basic operations defined by eXist's REST API (e.g. GET, POST, PUT and , DELETE).

When running in a servlet-context, this same server functionality is provided by the EXistServlet. In the standard eXist distribution, this servlet is configured to have a listen address at:

http://localhost:8080/exist/rest/

Both the stand-alone server and the servlet rely on Java class org.exist.http.RESTServer to do the actual work.

The server treats all HTTP request paths as paths to a database collection, i.e. all resources are read from the database instead of the file system. Relative paths are therefore resolved relative to the database root collection. For example, if you enter the following URL into your web-browser:

http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml

the server will receive an HTTP GET request for the resource hamlet.xml in the collection /db/shakespeare/plays in the database. The server will look for this collection, and check if the resource is available, and if so, retrieve its contents and send them back to the client. If the document does not exist, an HTTP 404 (Not Found) status response will be returned.

To keep the interface simple, the basic database operations are directly mapped to HTTP request methods wherever possible. The following request methods are supported:

GET

Retrieves a representation of the resource or collection from the database. XQuery and XPath queries may also be specified using GET's optional parameters applied to the selected resource.

PUT

Uploads a resource onto the database. If required, collections are automatically created, and existing resources are overwritten.

DELETE

Removes a resource (document or collection) from the database.

POST

Submits data in the form of an XML fragment in the content of the request which specifies the action to take. The fragment can be either an XUpdate document or a query request. Query requests are used to pass complex XQuery expressions too large to be URL-encoded.

HTTP Authentication

The REST server and servlet support basic HTTP authentication, and only valid users can access the database. If no username and password are specified, the server assumes a "guest" user identity, which has limited capabilities. If the username submitted is not known, or an incorrect password is submitted, an error page (Status code 403 - Forbidden) is returned.

GET Requests

If the server receives an HTTP GET request, it first tries to locate known parameters. If no parameters are given or known, it will try to locate the collection or document specified in the URI database path, and return a representation of this resource the client. Note that when the located resource is XML, the returned content-type attribute value will be application/xml, and for binary resources application/octet-stream.

If the path resolves to a database collection, the retrieved results are returned as an XML fragment. An example fragment is shown below:

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist"> <exist:collection name="/db/xinclude" owner="guest" group="guest" permissions="rwur-ur-u"> <exist:resource name="disclaimer.xml" owner="guest" group="guest" permissions="rwur-ur--"/> <exist:resource name="sidebar.xml" owner="guest" group="guest" permissions="rwur-ur--"/> <exist:resource name="xinclude.xml" owner="guest" group="guest" permissions="rwur-ur--"/> </exist:collection> </exist:result>
XML Results for GET Request for a Collection

If an xml-stylesheet processing instruction is found in an XML document being requested, the database will try to apply the stylesheet before returning the document. Note that in this case, any relative path in a hypertext link will be resolved relative to the location of the source document. For example, if the document hamlet.xml, which is stored in collection /db/shakespeare/plays contains the XSLT processing instruction:

<?xml-stylesheet type="application/xml" href="shakes.xsl"?>

then the database will try to load the stylesheet from /db/shakespeare/plays/shakes.xsl and apply it to the document.

Optionally, GET accepts the following request parameters, which must be URL-encoded:

_xsl=XSL Stylesheet

Applies an XSL stylesheet to the requested resource. If the _xsl parameter contains an external URI, the corresponding external resource is retrieved. Otherwise, the path is treated as relative to the database root collection and the stylesheet is loaded from the database. This option will override any XSL stylesheet processing instructions found in the source XML file.

Setting _xsl to no disables any stylesheet processing. This is useful for retrieving the unprocessed XML from documents that have a stylesheet declaration.

If your document has a valid XSL stylesheet declaration, the web browser may still decide to apply the XSL. In this case, passing _xsl=no has no visible effect, though the XSL is now rendered by the browser, not eXist.

_query=XPath/XQuery Expression

Executes a query specified by the request. The collection or resource referenced in the request path is added to the set of statically known documents for the query.

_indent=yes | no

Returns indented pretty-print XML. The default value is yes.

_encoding=Character Encoding Type

Sets the character encoding for the resultant XML. The default value is UTF-8.

_howmany=Number of Items

Specifies the number of items to return from the resultant sequence. The default value is 10.

_start=Starting Position in Sequence

Specifies the index position of the first item in the result sequence to be returned. The default value is 1.

_wrap=yes | no

Specifies whether the returned query results are to be wrapped into a surrounding <exist:result> element. The default value is yes.

_source=yes | no

Specifies whether the query should display its source code instead of being executed. The default value is no, but see the <allow-source> section in descriptor.xml to explicitely allow such a behaviour.

_cache=yes | no

If set to "yes", the query results of the current query are stored into a session on the server. A session id will be returned with the response. Subsequent requests can pass this session id via the _session parameter. If the server finds a valid session id, it will return the cached results instead of re-evaluating the query. For more info see below.

_session=session id

Specifies a session id returned by a previous query request. If the session is valid, query results will be read from the cached session.

_release=session id

Release the session identified by the session id.

EXAMPLE: The following URI will find all <SPEECH> elements in the collection /db/shakespeare with "Juliet" as the <SPEAKER>. As specified, it will return 5 items from the result sequence, starting at position 3:

http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=3&_howmany=5

PUT Requests

Documents can be stored or updated using an HTTP PUT request. The request URI points to the location where the document will be stored. As defined by the HTTP specifications, an existing document at the specified path will be updated, i.e. removed, before storing the new resource. As well, any collections defined in the path that do not exist will be created automatically.

For example, the following Python script stores a document (the name of which is specified on the command-line) in the database collection /db/test, which will be created if this collection does not exist. Note that the HTTP header field content-type is specified as application/xml, since otherwise the document is stored as a binary resource.

import httplib import sys from string import rfind collection = sys.argv[1] file = sys.argv[2] f = open(file, 'r') print "reading file %s ..." % file xml = f.read() f.close() p = rfind(file, '/') if p > -1: doc = file[p+1:] else: doc = file print doc print "storing document to collection %s ..." % collection con = httplib.HTTP('localhost:8080') con.putrequest('PUT', '/exist/rest/%s/%s' % (collection, doc)) con.putheader('Content-Type', 'application/xml') clen = len(xml) con.putheader('Content-Length', `clen`) con.endheaders() con.send(xml) errcode, errmsg, headers = con.getreply() if errcode != 200: f = con.getfile() print 'An error occurred: %s' % errmsg f.close() else: print "Ok."
PUT Example using Python (See: samples/http/put.py)

DELETE Requests

DELETE removes a collection or resource from the database. For this, the server first checks if the request path points to an existing database collection or resource, and once found, removes it.

POST Requests

POST requests require an XML fragment in the content of the request, which specifies the action to take.

If the root node of the fragment uses the XUpdate namespace (http://www.xmldb.org/xupdate), the fragment is sent to the XUpdateProcessor to be processed. Otherwise, the root node will have the namespace for eXist requests (http://exist.sourceforge.net/NS/exist), in which case the fragment is interpreted as an extended query request. Extended query requests can be used to post complex XQuery scripts that are too large to be encoded in a GET request.

The structure of the POST XML request is as follows:

<query xmlns="http://exist.sourceforge.net/NS/exist" start="[first item to be returned]" max="[maximum number of items to be returned]" cache="[yes|no: create a session and cache results]" session-id="[session id as returned by previous request]"> <text>[XQuery expression]</text> <properties> <property name="[name1]" value="[value1]"/> </properties> </query>
Extended Query Request

The root element query identifies the fragment as an extended query request, and the XQuery expression for this request is enclosed in the text element. The start, max, cache and session-id attributes have the same meaning as the corresponding GET parameters. Optional output properties, such as pretty-print, may be passed in the properties element. An example of POST for Perl is provided below:

require LWP::UserAgent; $URL = 'http://localhost:8080/exist/rest/db/'; $QUERY = <<END; <?xml version="1.0" encoding="UTF-8"?> <query xmlns="http://exist.sourceforge.net/NS/exist" start="1" max="20"> <text> for \$speech in //SPEECH[LINE &= 'corrupt*'] order by \$speech/SPEAKER[1] return <hit>{\$speech}</hit> </text> <properties> <property name="indent" value="yes"/> </properties> </query> END $ua = LWP::UserAgent->new(); $req = HTTP::Request->new(POST => $URL); $req->content_type('application/xml'); $req->content($QUERY); $res = $ua->request($req); if($res->is_success) { print $res->content . "\n"; } else { print "Error:\n\n" . $res->status_line . "\n"; }
POST Example using Perl (See: samples/http/search.pl)

Please note that you may have to enclose the XQuery expression in a CDATA section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors (this is not shown above).

The returned query results are enclosed in the <exist:result> element, which are shown below for the above example:

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" hits="2628" start="1" count="10"> <SPEECH xmlns:exist="http://exist.sourceforge.net/NS/exist"> <SPEAKER>BERNARDO</SPEAKER> <LINE>Who's there?</LINE> </SPEECH> ... more items follow ... </exist:result>
Returned Results for POST Request

Calling Stored XQueries

The REST interface supports stored XQueries on the server: if the target resource of a GET or POST request is a binary resource with the mime-type application/xquery, the REST server will try to compile and execute it as an XQuery. The XQuery has access to the entire HTTP context, including parameters and session attributes.

Stored XQueries are a good way to provide dynamic views on the data or create small services. However, they can do more: since you can also store binary resources like images, CSS stylesheets or Javascript files into a database collection, it is easily possible to serve a complex application entirely out of the database.

Please have a look at the example Using XQuery for Web Applications on the demo server.

Cached Query Results

When executing queries using GET or POST, the server can cache query results in a server-side session. The results are cached in memory. In general, memory consumption will be low for query results which reference nodes stored in the database. It is high for nodes constructed within the XQuery itself.

To create a session and store query results into it, pass parameter _cache=yes with a GET request or set attribute cache="yes" within the XML payload of a POST query request. The server will execute the query as usual. If the result sequence contains more than one item, the entire sequence will be stored into a newly created session.

The id of the created session is included with the response. For requests which return a <exist:result> wrapper element, the session id will be specified in the exist:session attribute. The session id is also available in the HTTP header X-Session-Id. The following example shows the header and <exist:result> tag returned by the server:

HTTP/1.1 200 OK
Date: Thu, 01 May 2008 16:28:16 GMT
Server: Jetty/5.1.12 (Linux/2.6.22-14-generic i386 java/1.6.0_03
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Tue, 29 Apr 2008 20:34:33 GMT
X-Session-Id: 2
Content-Type: application/xml; charset=UTF-8
Content-Length: 4699

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" 
    exist:hits="3406" exist:start="1" exist:count="10" 
    exist:session="2">
...
</exist:result>
Sample Response

The session id can then be passed with subsequent requests to retrieve further chunks of data without re-evaluating the query. For a GET request, pass the session id with parameter _session. For a POST request, add an attribute session="sessionId" to the XML content of the request.

If the session does not exist or has timed out, the server will simply re-evaluate the query. The timeout is set to 2 minutes.

A session can be deleted by sending a GET request to an arbitrary collection url. The session id is passed in a parameter _release:

http://localhost:8080/exist/rest/db?_release=0