Keyword Search Tutorial Featuring KWIC Display

Joe Wicentowski has added another excellent tutorial to the XQuery wikibook. It shows how to develop a keyword search on multiple document types and how to display search results with highlighted keywords in context (KWIC).

The KWIC display should be of particular interest. It is based on XQuery code originally developed for the documentation search facility on the main eXist page and the wiki (see the "Quick Search" box to the right). Unlike earlier solutions, it no longer needs complicated callback functions to extract the matches with their surrounding text. Instead, all the processing is done in XQuery. This became possible thanks to recent improvements in the query engine, which is why you need an eXist version built from SVN.
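
The idea behind a KWIC display can be illustrated independently of eXist. The following is a minimal Python sketch, not the XQuery code used in the tutorial: the names `kwic` and `format_hit` and the `width` parameter are made up for illustration, and matching is assumed to be whole-word and case-insensitive.

```python
import re


def kwic(text, keyword, width=40):
    """Return one (left, match, right) triple per occurrence of keyword.

    `width` is the maximum number of context characters kept on each side.
    """
    # re.escape guards against regex metacharacters in the keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


def format_hit(hit):
    """Render one hit with the keyword marked by square brackets."""
    left, match, right = hit
    return f"...{left}[{match}]{right}..."
```

Calling `kwic(text, "fox", width=6)` on a sentence returns the keyword together with up to six characters of context on either side; `format_hit` then turns each triple into a display line. In a real application the brackets would typically be replaced by markup that highlights the match.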

Joe's tutorial guides you through a complete example. The XQuery wikibook is a great resource for XQuery in general. Don't miss it.

Introducing XQueryURLRewrite

There have been some major changes to the general setup of eXist's web application in the current SVN trunk. Access to the documentation and most of the XQuery examples no longer goes through Cocoon. The Cocoon pipelines have been replaced by a single controller XQuery, which handles all the request routing and output transformations. This doesn't mean we are dropping support for Cocoon. We just think that separating the Cocoon-dependent features from the eXist-only parts will make it easier for new users to get started with eXist. Advanced users can always check out the eXist-as-Cocoon-block distribution to have all features available.

New Full Text Index is Based on Lucene

The current SVN trunk version of eXist features a new full text indexing module, which could become the foundation for a faster, more configurable and feature-rich alternative to eXist's built-in full text index. The new search facility in AtomicWiki (check the Quick Search box to the right) is based on it, so you can immediately see the index in action by executing a search here.

Eclipse Plugin for eXist

Eclipse is one of the most popular development platforms, so we were looking for a convenient way to access the eXist database directly from Eclipse. The result is the eXist Eclipse Plugin.

eXist 1.2.4 Released

Besides fixing critical bugs in the storage backend, the 1.2.4 release mainly reduces the memory consumption of queries on large document sets. Major changes include:

  • new node set implementation, which is much more memory efficient than previous approaches. The old implementation consumed a lot of memory when used with larger sets of documents, which had a negative effect on overall performance.
  • reduced memory consumption of documents constructed during a query: if you have a query which creates thousands of small XML fragments, each of those fragments used to have its own document context with its own name pool and various fields which may never have been needed. Large parts of the document context are now shared between fragments and we make more use of lazy initialization, thus reducing the memory consumption of in-memory fragments dramatically (in my tests, I could save up to 100 MB of memory when creating a few thousand XML fragments in one query).
  • fixed fatal B-tree bugs leading to index corruption (which usually caused an ArrayIndexOutOfBoundsException). The bugs were more likely to occur when indexing large string keys, but they may also have happened in other situations. The failure damaged the index and rendered the db unusable (though it could be repaired).
  • fixed concurrency issues leading to an ArrayIndexOutOfBoundsException or NoSuchElementException when querying for attributes
  • fixed a memory leak: we observed that the Xerces XML parser builds some internal data structures when validating a document, which are unfortunately not properly cleared afterwards. This is a major problem since eXist pools the XML parser instances. To work around this issue, eXist no longer pools XML parsers which were used on larger documents.
  • fixed an endless loop which caused eXist to hang when full text and ngram indexes were used at the same time
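
The parser pooling workaround described in the list above can be sketched in a few lines. This is an illustration in Python, not eXist's Java code: the class `ParserPool`, its methods and the `max_reusable_size` threshold are hypothetical, and the release notes do not say how "larger" is measured.

```python
class ParserPool:
    """Toy pool that refuses to take back parsers used on large documents."""

    def __init__(self, factory, max_reusable_size):
        self._factory = factory                      # callable creating a fresh parser
        self._max_reusable_size = max_reusable_size  # hypothetical threshold, in bytes
        self._idle = []

    def borrow(self):
        # Reuse an idle parser if one is available, otherwise create a new one.
        return self._idle.pop() if self._idle else self._factory()

    def give_back(self, parser, document_size):
        # A parser that handled a large document may still hold large internal
        # structures, so it is dropped instead of being returned to the pool.
        if document_size <= self._max_reusable_size:
            self._idle.append(parser)

    def idle_count(self):
        return len(self._idle)
```

The trade-off is that parsers used on large documents have to be created anew each time, in exchange for letting the garbage collector reclaim whatever they retained.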

The release is now available for download.

Note: all releases in the 1.2 branch are bug fix releases and can be considered stable. They only contain hand-selected changes which were ported back from the main development version.