Akismet in XQuery

After receiving a lot of comment spam on my personal blog, I switched from reCaptcha to Asirra. I implemented both as small XQuery modules.

I had assumed that the spam was the work of a robot that was brute-force cracking the reCaptcha Captchas via image transformation and OCR. I therefore expected that moving from reCaptcha to Asirra would solve the issue, as Asirra is much tougher for a robot to solve.

Unfortunately the move from reCaptcha to Asirra did not completely stop the spam, although the quantity is now much lower. From this I conclude that the spammers are actually human, and that because Asirra is more time-consuming than reCaptcha, it has merely slowed them down.

Now, I am well versed in email spam filtering, as in the past I have configured plenty of Postfix mail servers with SpamAssassin and various DNS black/white lists. It occurred to me that there must be a similar service for blog comments, and a quick Google search revealed both Akismet and TypePad AntiSpam.

Akismet appears to be the more established player; however, its terms of use are quite limiting. For example, whilst personal use is free, you have to pay for commercial use. TypePad AntiSpam, on the other hand, is the young upstart and has very liberal terms of use. The good news is that TypePad AntiSpam implements exactly the same API as Akismet, so by just changing the hostname of the server you are contacting, you can choose to use either Akismet or TypePad AntiSpam.
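Since only the hostname differs between the two services, switching provider comes down to building a different URI. A minimal sketch of that idea, assuming the key-prefixed host layout and the two host names used by the module later in this post; `local:endpoint`, `local:service-uri` and the provider labels are hypothetical names used only for illustration:

```xquery
xquery version "1.0";

(: Hypothetical helper: choose the API host by provider label.
   Anything other than "akismet" falls back to TypePad AntiSpam. :)
declare function local:endpoint($provider as xs:string) as xs:string {
    if ($provider eq "akismet") then
        "rest.akismet.com"
    else
        "api.antispam.typepad.com"
};

(: Build a URI of the form http://{api-key}.{host}/{service} :)
declare function local:service-uri($provider as xs:string, $api-key as xs:string, $service as xs:string) as xs:string {
    fn:concat("http://", $api-key, ".", local:endpoint($provider), "/", $service)
};

(: "my-key" is a placeholder, not a real API key :)
local:service-uri("typepad", "my-key", "1.1/comment-check")
```

With the placeholder key this evaluates to `http://my-key.api.antispam.typepad.com/1.1/comment-check`; passing "akismet" instead swaps in the Akismet host and leaves everything else unchanged.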

So I decided to implement TypePad AntiSpam filtering of comments submitted to my blog, and guess what? I implemented it as a reusable XQuery Module (downloadable from here), which makes use of the EXPath HttpClient functions. So whilst it works on eXist-db, it should also be usable on any XQuery processor that supports EXPath.

Example (X)HTML Page (example.html)

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Akismet Example</title>
    </head>
    <body>
        <form action="example.xql" method="post" id="commentform">
            <fieldset>
                <label for="comment_name">Name</label>
                <br/>
                <input id="comment_name" name="name" type="text" size="40"/>
                <br/>
                <label for="comment_email">email address</label> (will not be shown)<br/>
                <input id="comment_email" name="email" type="text" size="40"/>
                <br/>
                <label for="comment_website">Website</label>
                <br/>
                <input id="comment_website" name="website" type="text" size="60"/>
                <br/>
                <label for="comment_comments">Comments</label>
                <br/>
                <textarea id="comment_comments" name="comments" rows="12" cols="55"> </textarea>
            </fieldset>
            <input type="submit"/>
        </form>
    </body>
</html>

Example XQuery handler (example.xql)

xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace akismet = "http://akismet.com/xquery/api" at "xmldb:exist:///db/akismet.xqm";

declare variable $local:akismet-api-key := "your-akismet-or-typepad-api-key-goes-here";

declare function local:is-comment-spam() as xs:boolean {
    akismet:comment-check(
        $local:akismet-api-key,
        <akismet:comment>
            <akismet:blog>http://www.adamretter.org.uk/blog.xql</akismet:blog>
            <akismet:user_ip>{request:get-header("X-Real-IP")}</akismet:user_ip>
            <akismet:user_agent>{request:get-header("User-Agent")}</akismet:user_agent>
            <akismet:referrer>{request:get-header("Referer")}</akismet:referrer>
            <akismet:permalink>http://www.adamretter.org.uk/{request:get-parameter("comment",())}</akismet:permalink>
            <akismet:comment_type>comment</akismet:comment_type>
            <akismet:comment_author>{request:get-parameter("name", ())}</akismet:comment_author>
            {
                if(request:get-parameter("email",()))then
                    <akismet:comment_author_email>{request:get-parameter("email", ())}</akismet:comment_author_email>
                else(),

                if(request:get-parameter("website",()))then
                    <akismet:comment_author_url>{
                        request:get-parameter("website", ())
                    }</akismet:comment_author_url>
                else()
            }
            <akismet:comment_content>{request:get-parameter("comments", ())}</akismet:comment_content>
        </akismet:comment>
    )
};

if(local:is-comment-spam())then
    <result>
        <it-was-spam/>
    </result>
else
    <result>
        <not-spam/>
    </result>

Akismet XQuery Module (akismet.xqm)

xquery version "1.0";

(:~
 : XQuery Module implementation for the Akismet API - http://akismet.com/development/api/
 :
 : Can be used with either Akismet or the TypePad AntiSpam service
 :
 : @author Adam Retter <adam@exist-db.org>
 : @date 2011-06-24T21:26:00+02:00
 :)
module namespace akismet = "http://akismet.com/xquery/api";

import module namespace http = "http://expath.org/ns/http-client";

declare variable $akismet:HTTP-OK := 200;
declare variable $akismet:endpoint := "api.antispam.typepad.com"; (: for TypePad :)
(: declare variable $akismet:endpoint := "rest.akismet.com"; :) (: for Akismet :)
declare variable $akismet:comment-check-service := "1.1/comment-check";
declare variable $akismet:submit-spam-service := "1.1/submit-spam";
declare variable $akismet:submit-ham-service := "1.1/submit-ham";

(:~
 : Calls the Akismet comment check service
 :
 : @param api-key Your Akismet API key
 : @param comment
 : <comment xmlns="http://akismet.com/xquery/api">
 :     <blog> The front page or home URL of the instance making the request. For a blog or wiki this would be the front page. Note: Must be a full URI, including http://. </blog> (required)
 :     <user_ip> IP address of the comment submitter. </user_ip> (required)
 :     <user_agent> User agent string of the web browser submitting the comment - typically the HTTP_USER_AGENT cgi variable. Not to be confused with the user agent of your Akismet library. </user_agent> (required)
 :     <referrer> The content of the HTTP_REFERER header should be sent here. </referrer> (note spelling)
 :     <permalink> The permanent location of the entry the comment was submitted to. </permalink>
 :     <comment_type> May be blank, comment, trackback, pingback, or a made up value like "registration". </comment_type>
 :     <comment_author> Name submitted with the comment </comment_author>
 :     <comment_author_email> Email address submitted with the comment </comment_author_email>
 :     <comment_author_url> URL submitted with comment </comment_author_url>
 :     <comment_content> The content that was submitted. </comment_content>
 : </comment>
 :
 : @return true() or false() indicating if the comment is spam or not
 :)
declare function akismet:comment-check($api-key as xs:string, $comment as element(akismet:comment)) as xs:boolean? {
    let $http-request :=
        <http:request href="{akismet:_get-service-uri($api-key, $akismet:comment-check-service)}" method="post" http="1.0" override-media-type="text/plain">
            <http:header name="User-Agent" value="eXist-db/1.5 | Hermes/0.2"/>
            <http:body media-type="application/x-www-form-urlencoded">{
                akismet:_params-xml-to-form-urlencoded($comment)
            }</http:body>
        </http:request>
    return
        let $http-result := http:send-request($http-request)
        return
            if(xs:integer($http-result[1]/http:response/@status) eq $akismet:HTTP-OK)then
                (: the second item of the result is the response body: "true" or "false" :)
                let $akismet-result := $http-result[2]
                return
                    $akismet-result eq "true"
            else
                (: only the first item holds the response metadata; the body item is not a node :)
                fn:error(xs:QName("akismet:error"), fn:concat("Akismet service responded with http code: ", $http-result[1]/http:response/@status))
};

(:~
 : Calls the Akismet submit spam service
 :
 : @param api-key Your Akismet API key
 : @param spam-comment
 : <comment xmlns="http://akismet.com/xquery/api">
 :     <blog> The front page or home URL of the instance making the request. For a blog or wiki this would be the front page. Note: Must be a full URI, including http://. </blog> (required)
 :     <user_ip> IP address of the comment submitter. </user_ip> (required)
 :     <user_agent> User agent string of the web browser submitting the comment - typically the HTTP_USER_AGENT cgi variable. Not to be confused with the user agent of your Akismet library. </user_agent> (required)
 :     <referrer> The content of the HTTP_REFERER header should be sent here. </referrer> (note spelling)
 :     <permalink> The permanent location of the entry the comment was submitted to. </permalink>
 :     <comment_type> May be blank, comment, trackback, pingback, or a made up value like "registration". </comment_type>
 :     <comment_author> Name submitted with the comment </comment_author>
 :     <comment_author_email> Email address submitted with the comment </comment_author_email>
 :     <comment_author_url> URL submitted with comment </comment_author_url>
 :     <comment_content> The content that was submitted. </comment_content>
 : </comment>
 :
 : @return true() or false() indicating if the spam was submitted or not
 :)
declare function akismet:submit-spam($api-key as xs:string, $spam-comment as element(akismet:comment)) as xs:boolean {
    let $http-request :=
        <http:request href="{akismet:_get-service-uri($api-key, $akismet:submit-spam-service)}" method="post" http="1.0" override-media-type="text/plain">
            <http:header name="User-Agent" value="eXist-db/1.5 | Hermes/0.2"/>
            <http:body media-type="application/x-www-form-urlencoded">{
                akismet:_params-xml-to-form-urlencoded($spam-comment)
            }</http:body>
        </http:request>
    return
        let $http-result := http:send-request($http-request)
        return
            (: cast the status to an integer before comparing, as in comment-check :)
            xs:integer($http-result[1]/http:response/@status) eq $akismet:HTTP-OK
};

(:~
 : Calls the Akismet submit ham service
 :
 : @param api-key Your Akismet API key
 : @param ham-comment
 : <comment xmlns="http://akismet.com/xquery/api">
 :     <blog> The front page or home URL of the instance making the request. For a blog or wiki this would be the front page. Note: Must be a full URI, including http://. </blog> (required)
 :     <user_ip> IP address of the comment submitter. </user_ip> (required)
 :     <user_agent> User agent string of the web browser submitting the comment - typically the HTTP_USER_AGENT cgi variable. Not to be confused with the user agent of your Akismet library. </user_agent> (required)
 :     <referrer> The content of the HTTP_REFERER header should be sent here. </referrer> (note spelling)
 :     <permalink> The permanent location of the entry the comment was submitted to. </permalink>
 :     <comment_type> May be blank, comment, trackback, pingback, or a made up value like "registration". </comment_type>
 :     <comment_author> Name submitted with the comment </comment_author>
 :     <comment_author_email> Email address submitted with the comment </comment_author_email>
 :     <comment_author_url> URL submitted with comment </comment_author_url>
 :     <comment_content> The content that was submitted. </comment_content>
 : </comment>
 :
 : @return true() or false() indicating if the ham was submitted or not
 :)
declare function akismet:submit-ham($api-key as xs:string, $ham-comment as element(akismet:comment)) as xs:boolean {
    let $http-request :=
        <http:request href="{akismet:_get-service-uri($api-key, $akismet:submit-ham-service)}" method="post" http="1.0" override-media-type="text/plain">
            <http:header name="User-Agent" value="eXist-db/1.5 | Hermes/0.2"/>
            <http:body media-type="application/x-www-form-urlencoded">{
                akismet:_params-xml-to-form-urlencoded($ham-comment)
            }</http:body>
        </http:request>
    return
        let $http-result := http:send-request($http-request)
        return
            xs:integer($http-result[1]/http:response/@status) eq $akismet:HTTP-OK
};

(: Builds the service URI: http://{api-key}.{endpoint}/{service} :)
declare function akismet:_get-service-uri($api-key as xs:string, $service as xs:string) as xs:string {
    fn:concat("http://", $api-key, ".", $akismet:endpoint, "/", $service)
};

(: Serialises the child elements of the comment as name=value pairs joined with an ampersand :)
declare function akismet:_params-xml-to-form-urlencoded($params as element()) as xs:string {
    fn:string-join(
        for $param in $params/child::element()
        return
            fn:concat(fn:local-name($param), "=", fn:encode-for-uri($param/text()))
        ,
        "&amp;"
    )
};
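To make the request body concrete, here is a small standalone sketch of the same form-encoding idea applied to a made-up comment. The function name `local:to-form-urlencoded` and the sample values are assumptions for illustration. It uses `fn:string()` rather than `text()` so that an element with mixed or empty content still yields exactly one string.

```xquery
xquery version "1.0";

declare namespace akismet = "http://akismet.com/xquery/api";

(: Turn each child element into name=value, percent-encoding the value,
   and join the pairs with an ampersand. :)
declare function local:to-form-urlencoded($params as element()) as xs:string {
    fn:string-join(
        for $param in $params/*
        return
            fn:concat(fn:local-name($param), "=", fn:encode-for-uri(fn:string($param))),
        "&amp;"
    )
};

(: Sample data only; example.org and Jane Doe are placeholders :)
local:to-form-urlencoded(
    <akismet:comment>
        <akismet:blog>http://example.org/</akismet:blog>
        <akismet:comment_author>Jane Doe</akismet:comment_author>
    </akismet:comment>
)
```

As a string, the result is `blog=http%3A%2F%2Fexample.org%2F&comment_author=Jane%20Doe`. Note that `fn:encode-for-uri` encodes a space as `%20` rather than `+`.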

And so far so good: since switching reCaptcha for Asirra and adding TypePad AntiSpam filtering, I haven't received any spam comments. But, now that I have written this...

Asirra in XQuery

Previously I wrote an XQuery Module for handling reCaptcha Captchas, as I wanted to protect my personal blog from being spammed.

Unfortunately, in the long term reCaptcha did not really work out, as the spammers were still posting to the comments section of my blog. It's a shame really, as I agree with reCaptcha's efforts to digitise books.

I have read several articles about reCaptcha Captchas being cracked, so I decided to try and find a more robot-proof approach. After a little Googling, I found Asirra.

Asirra is another Captcha system, but rather than asking you to compute a sum or enter the words that appear in a deformed image, it shows you 12 pictures, some of cats and some of dogs. You have to correctly select all the cats. This seems to me like a harder problem to solve with a robot, and so I decided to replace my reCaptcha with Asirra.

I wrote a small reusable XQuery module (downloadable from here), which makes use of the EXPath HttpClient functions. So whilst it works on eXist-db, it should also be usable on any XQuery processor that supports EXPath.

Example (X)HTML Page (example.html)

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Asirra Example</title>
    </head>
    <body>
        <form action="example.xql" method="post" id="commentform" onsubmit="return MySubmitForm();">

            <!-- start Client API Asirra code -->
            <div id="asirra_auth">
                <a id="asirra_logo" href="http://research.microsoft.com/en-us/um/redmond/projects/asirra/">
                    <img src="http://research.microsoft.com/en-us/um/redmond/projects/asirra/AsirraLogoWithName-Medium.png"/>
                </a>
                <script type="text/javascript" src="http://challenge.asirra.com/js/AsirraClientSide.js"/>
                <script type="text/javascript">
                    <![CDATA[
                    // You can control where the big version of the photos appear by
                    // changing this to top, bottom, left, or right
                    asirraState.SetEnlargedPosition("top");

                    // You can control the aspect ratio of the box by changing this constant
                    asirraState.SetCellsPerRow(6);
                    ]]>
                </script>
                <script type="text/javascript">
                    <![CDATA[
                    var passThroughFormSubmit = false;

                    function MySubmitForm() {
                        if(passThroughFormSubmit) {
                            return true;
                        }

                        // Do site-specific form validation here, then...
                        Asirra_CheckIfHuman(HumanCheckComplete);
                        return false;
                    }

                    function HumanCheckComplete(isHuman) {
                        if(!isHuman) {
                            alert("Please correctly identify the cats.");
                        } else {
                            passThroughFormSubmit = true;
                            var formElt = document.getElementById("commentform");
                            formElt.submit();
                        }
                    }
                    ]]>
                </script>
            </div>
            <!-- end Client API Asirra code -->

            <input type="submit"/>
        </form>
    </body>
</html>

Example XQuery handler (example.xql)

xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace asirra = "http://asirra.com/xquery/api" at "xmldb:exist:///db/asirra.xqm";

asirra:validate-ticket(request:get-parameter("Asirra_Ticket",()))
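The handler passes the request parameter straight to `asirra:validate-ticket`, which is declared to take an `xs:string`. If the form is posted without an `Asirra_Ticket` parameter, `request:get-parameter` returns the empty sequence and the call would presumably fail with a type error. A hedged sketch of a more defensive handler follows; the `<result>` child element names are hypothetical, and the module location is the one from the example above:

```xquery
xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace asirra = "http://asirra.com/xquery/api" at "xmldb:exist:///db/asirra.xqm";

let $ticket := request:get-parameter("Asirra_Ticket", ())
return
    (: No ticket, or an empty one: do not call the validation service at all :)
    if (fn:empty($ticket) or $ticket[1] eq "") then
        <result>
            <missing-ticket/>
        </result>
    else if (asirra:validate-ticket($ticket[1])) then
        <result>
            <human/>
        </result>
    else
        <result>
            <not-validated/>
        </result>
```

Checking the ticket on the server matters because the JavaScript check in the page can be skipped by anything that posts to the form directly.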

Asirra XQuery Module (asirra.xqm)

xquery version "1.0";

(:~
 : XQuery Module implementation for the Asirra API - http://research.microsoft.com/en-us/um/redmond/projects/asirra/
 :
 : @author Adam Retter <adam@exist-db.org>
 : @date 2011-06-24T21:26:00+02:00
 :)
module namespace asirra = "http://asirra.com/xquery/api";

import module namespace http = "http://expath.org/ns/http-client";

declare variable $asirra:HTTP-OK := 200;
declare variable $asirra:validation-endpoint := "http://challenge.asirra.com/cgi/Asirra?action=ValidateTicket&amp;ticket=";

(:~
 : Validate an Asirra Ticket
 :
 : @param $asirra-ticket The Asirra ticket to validate
 :
 : @return true() or false() indicating whether the ticket was valid
 :)
declare function asirra:validate-ticket($asirra-ticket as xs:string) as xs:boolean {
    let $url := fn:concat($asirra:validation-endpoint, $asirra-ticket)
    return
        let $http-result := http:send-request(<http:request href="{$url}" method="get"/>)
        return
            if(xs:integer($http-result/http:response/@status) eq $asirra:HTTP-OK)then
                let $asirra-result := $http-result[2]
                return
                    $asirra-result/AsirraValidation/Result eq "Pass"
            else
                false()
};
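One detail of the module is worth illustrating: the comparison `$asirra-result/AsirraValidation/Result eq "Pass"` yields the empty sequence, not `false()`, when the response contains no `Result` element. A minimal sketch of a stricter check that always returns a boolean is shown below. The response shape is inferred from the path used in the module, and `local:is-pass` and the "Fail" value are hypothetical:

```xquery
xquery version "1.0";

(: Returns true() only when a Result element with the value "Pass" is present;
   a missing response or missing element gives false() rather than (). :)
declare function local:is-pass($response as node()?) as xs:boolean {
    fn:exists($response/AsirraValidation/Result[fn:normalize-space(.) eq "Pass"])
};

(
    local:is-pass(document { <AsirraValidation><Result>Pass</Result></AsirraValidation> }),
    local:is-pass(document { <AsirraValidation><Result>Fail</Result></AsirraValidation> }),
    local:is-pass(())
)
```

The three calls evaluate to `true`, `false` and `false` respectively.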

Pre-release 1.4.1 rev14769

Today the development team released another pre-release version of eXist-db, rev14769. It contains a number of backports from trunk. Highlights:

  • bugfix: NPE when serialization options param was zero length string. Port of rev 14690
  • performance: Faster sequence constructors in XQuery: old code parsed (1, 2, 3) into (1, (2, 3)). Processing this recursively eventually caused a stack overflow and was slow. Port of rev 13874, rev 13875
  • bugfix: Local XMLDB API set permissions on the wrong collection - looks like this is an old bug. Port of rev 14735
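The sequence constructor change listed above is purely internal, because XQuery sequences never nest: the flat and the nested form denote the same three items. A small sketch of that equivalence:

```xquery
xquery version "1.0";

(: (1, (2, 3)) is flattened to the same sequence as (1, 2, 3) :)
(
    fn:count((1, (2, 3))),
    fn:deep-equal((1, 2, 3), (1, (2, 3)))
)
```

This evaluates to `3` followed by `true`, so queries should behave the same after the change apart from speed and stack usage.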

The revision can be downloaded as an installer jar, as an exe and as a war file. Please share your experiences (bug reports, general feedback) on the exist-open mailing list so we can release a final version soon!

In my first screencast ever, I'd like to present a little "fun project" I've been working on secretly during the past months. eXide is a replacement for the old XQuery Sandbox in eXist, offering features you normally do not find in a web-based editor.

Many JavaScript libraries expect data to be served as JSON (JavaScript Object Notation). For a long time, eXist has provided an XQuery library to convert XML to JSON. This worked well for smaller fragments, but was rather inefficient for larger chunks of data. We have thus developed a faster JSON output method, which plugs directly into eXist's XML serializer.