KWIC (Keywords in Context) Output
Keywords In Context (KWIC) helps users quickly scan search results by listing each hit surrounded by its context. eXist provides a KWIC module that is not bound to a specific index or query operation, but can be applied to query results from all indexes that support match highlighting. This includes the Lucene-based index and the ngram index.
The documentation search function on eXist's home page is a good example. It queries documents written in DocBook format. However, the KWIC module has also been successfully used with different schemas (e.g. TEI) and languages (e.g. Chinese).
The KWIC module is entirely written in XQuery. To use the module, import its namespace into your query (you don't need to specify a location):
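For reference, the import might look like the sketch below. The namespace URI is the one the module is registered under in standard eXist distributions; treat it as an assumption if your installation differs. The query body is only a sanity check and is not part of the module's documentation.

```xquery
xquery version "3.1";

(: Import by namespace URI only; no "at" location hint is given. :)
import module namespace kwic = "http://exist-db.org/xquery/kwic";

(: Sanity check: resolves the two-argument kwic:summarize and reports its arity. :)
function-arity(kwic:summarize#2)
```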
The easiest way to get KWIC output is to call the kwic:summarize function on an element node returned from a full text or ngram query:
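A minimal sketch of such a call follows. The document path and the search term "nation" are assumptions chosen for illustration, and the query presumes a Lucene index is configured on SPEECH and that the ft prefix is predeclared, as it usually is in eXist.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

(: Assumed data: a play stored at this path, with a Lucene index on SPEECH. :)
for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
order by ft:score($hit) descending
return
    (: The second argument is the <config> element controlling the output. :)
    kwic:summarize($hit, <config width="40"/>)
```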
Every call to kwic:summarize will return an HTML paragraph containing 3 span elements with the text before and after each match, as well as the match text itself.
The <config> element, passed to kwic:summarize as the second parameter, determines the appearance of the generated HTML. It recognizes 3 attributes:
width: The maximum number of characters to be printed before and after the match.
table: By default, kwic:summarize returns an HTML paragraph with spans. If you set table="yes", it will return an HTML table row (<tr> element) instead, with each text chunk enclosed in a table cell.
link: If present, each match will be enclosed within a link, using the URI in the link attribute as target.
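The three attributes can be combined, as in the sketch below. The document path, the search term, and the link target view.xql?play=hamlet are hypothetical; the surrounding <table> element is added here only so that the generated rows have a parent.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

<table>{
    (: Assumed data: a play at this path, with a Lucene index on SPEECH. :)
    for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
    order by ft:score($hit) descending
    return
        (: table="yes" requests a table row per match; link wraps each match in a link. :)
        kwic:summarize($hit,
            <config width="60" table="yes" link="view.xql?play=hamlet"/>)
}</table>
```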
If you look at the output of the query above, you may notice that a space is missing between words when the previous or following chunk extends into a different <LINE> element. It would also be nicer to display text from <LINE> elements only and to ignore <STAGEDIR> elements. This can be achieved with the help of a callback function:
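One way such a callback could be written is sketched here. It assumes the direction string is "before" or "after", that the callback is passed as an XQuery 3 named function reference (local:filter#2), and the same hypothetical document path and search term as before.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

(: Called for each text node that is about to be added to the context. :)
declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
    if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
        (: Skip speaker names and stage directions entirely. :)
        ()
    else if ($mode eq "before") then
        (: Text preceding the match: pad on the right. :)
        concat($node, " ")
    else
        (: Text following the match: pad on the left. :)
        concat(" ", $node)
};

for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
order by ft:score($hit) descending
return
    kwic:summarize($hit, <config width="40"/>, local:filter#2)
```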
The third parameter to kwic:summarize here is a reference to a function accepting 2 arguments:
1. A single text node which should be appended or prepended to the current text chunk.
2. A string indicating the current direction in which text is appended: before or after.
The function can return the empty sequence if the current node should be ignored (for instance if it belongs to a footnote which should not be displayed). Otherwise it must return a single string.
The local:filter function above first checks if the passed node has a SPEAKER or STAGEDIR parent. If so, it ignores that node by returning the empty sequence. If not, the function adds a single whitespace before or after the string, so adjacent lines will be properly separated.
If you call kwic:summarize, you will get one KWIC-formatted item for every match, even if the matches are in the same paragraph. Also, the context from which the text is taken is always the same: the element you queried. To get more control over the output, you can directly call kwic:get-summary, which is the module's core function.
kwic:get-summary expects 3 or 4 parameters:
1. The current context root.
2. The match object to process.
Parameters 3 and 4 are the same as parameters 2 and 3 of kwic:summarize: the <config> element and the optional callback function.
Before passing nodes to kwic:get-summary you have to expand them, which means creating an in-memory copy in which all matches are properly marked up with <exist:match> tags. The main part of the query should look as follows:
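A sketch of that main part is given below. It expands each hit with util:expand and assumes the util, exist, and ft prefixes are predeclared by eXist; the document path and search term are again hypothetical. The optional fourth parameter (the callback) is omitted.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
(: In-memory copy of the hit with matches wrapped in <exist:match>. :)
let $expanded := util:expand($hit)
order by ft:score($hit) descending
return
    (: Context root, then the single match to summarize, then the config. :)
    kwic:get-summary($expanded, ($expanded//exist:match)[1], <config width="40"/>)
```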
In this example, we select the first <exist:match> only, thus ignoring all other matches within the same expanded node.
Sometimes you may also want to change the context to restrict the KWIC display to certain elements within the larger query context, for instance paragraphs within sections. The following example still queries <SPEECH> but displays a KWIC entry for each <LINE> with a match, grouped by speech:
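Such a query could be sketched as follows. The wrapping <div class="speech"> element, the document path, and the search term are illustrative assumptions, and the util, exist, and ft prefixes are assumed to be predeclared.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
let $expanded := util:expand($hit)
order by ft:score($hit) descending
return
    <div class="speech">{
        (: Narrow the context: one KWIC entry per LINE containing a match. :)
        for $line in $expanded//LINE[.//exist:match]
        return
            kwic:get-summary($line, ($line//exist:match)[1], <config width="40"/>)
    }</div>
```

The speeches are still ordered by the score of the SPEECH hit, while each summary draws its context from a single line.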
You might wonder why we don't query <LINE> directly to get a different context, as in:
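The alternative in question would look roughly like this sketch, which assumes a Lucene index is also defined on LINE and uses the same hypothetical path and search term:

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

(: Queries LINE directly, so each line is scored on its own. :)
for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH//LINE[ft:query(., "nation")]
order by ft:score($hit) descending
return
    kwic:summarize($hit, <config width="40"/>)
```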
This is because Lucene computes the relevance of each match with respect to the SPEECH context, not LINE. If we queried LINE, each single line would get a match score and the matches would end up in a completely different order.
Sometimes you don't want to use the KWIC module, but would still like an indication of where matches were found in the text. eXist's XML serializer can automatically highlight matches when it writes out a piece of XML. All the matches will be surrounded by an <exist:match> tag.
You can achieve the same within an XQuery by calling the extension function util:expand. It returns a copy of the XML fragment it received in its first parameter, which, unless configured otherwise, has all matches wrapped into <exist:match> tags.
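As a minimal sketch, assuming the util and ft prefixes are predeclared and using the same hypothetical document path and search term as elsewhere:

```xquery
xquery version "3.1";

for $hit in doc("/db/shakespeare/plays/hamlet.xml")//SPEECH[ft:query(., "nation")]
return
    (: Returns a copy of the SPEECH with every match wrapped in <exist:match>. :)
    util:expand($hit)
```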