AppSuite:Cross folder fulltext search with Dovecot

This information is valid from 7.6.0

Cross-folder fulltext search with Dovecot

With 7.6.0, Open-Xchange introduces a new search API and a corresponding extension for OX App Suite. The new mail search can use the fulltext and cross-folder search capabilities provided by Dovecot. This article is a short walkthrough for setting up Dovecot and the OX App Suite backend accordingly. I assume that you already have a working Dovecot installation that serves as the primary mail backend for an existing, basically configured OX App Suite installation. I further assume that you are running Dovecot 2.2.9 and OX App Suite 7.6.0 on Debian Wheezy. We will use Solr as the mail index.

Dovecot

Both search features are implemented as Dovecot plugins. Fulltext search relies on the FTS plugin and its Solr backend (fts and fts_solr). Cross-folder search is implemented through a virtual folder that appears to contain all mails from all other folders. Debian Wheezy ships Dovecot 2.1.7. As this version is quite old, we take the packages from the backports repository, which contains version 2.2.9. Most of this should also work with 2.1.7, with the exception of automatic indexing (fts_autoindex), which requires 2.2.9 or newer. After adding the backports entries, my /etc/apt/sources.list looks like this:

deb http://ftp2.de.debian.org/debian/ wheezy main contrib non-free
deb-src http://ftp2.de.debian.org/debian/ wheezy main contrib non-free
deb http://ftp2.de.debian.org/debian/ wheezy-updates main contrib non-free
deb-src http://ftp2.de.debian.org/debian/ wheezy-updates main contrib non-free
deb http://ftp2.de.debian.org/debian/ wheezy-backports main contrib non-free
deb-src http://ftp2.de.debian.org/debian/ wheezy-backports main contrib non-free

Installing and configuring the fulltext index

We start by installing Solr. I use solr-jetty here because it is easy to configure and more lightweight than a full Tomcat installation. Later we will use the Solr admin panel to check that everything works. The admin panel uses JavaServer Pages (JSP), so we also need a JDK. We install both via

# aptitude install solr-jetty openjdk-6-jdk

We also need Dovecot's FTS Solr plugin. When installing Dovecot packages, we have to select the backports repository explicitly; otherwise we get the original 2.1.7 packages. Install the plugin via

# aptitude -t wheezy-backports install dovecot-solr

Before we can start Jetty, which in turn starts Solr as a web app, we have to configure it. The Solr admin panel refers to a shared jQuery library that is symlinked into the web app, so we have to allow Jetty to serve symlinks. Open /etc/jetty/webdefault.xml and add the following to the <servlet> section:

<init-param>
     <param-name>aliases</param-name>
     <param-value>true</param-value>
</init-param>

Then open /etc/default/jetty and change NO_START to 0. To access Solr's admin panel, Jetty must accept connections from outside, so adjust JETTY_HOST accordingly. There is also a problem with the symlink that links the Solr webapp into Jetty: /var/lib/jetty/webapps/solr points to /usr/share/solr/webapp, which is wrong. The correct path is /usr/share/solr/web. We fix this with

# rm /var/lib/jetty/webapps/solr
# ln -s /usr/share/solr/web /var/lib/jetty/webapps/solr
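
If the setup is scripted, the two commands above can be wrapped in a small check that only touches the link when it is actually wrong. The following is merely a sketch: the function name fix_symlink is made up for illustration, and it assumes that readlink (GNU coreutils, present on Debian) is available.

```shell
#!/bin/sh
# Hypothetical helper: repoint a symlink only if it does not already
# lead to the wanted target.
# Usage: fix_symlink <link> <target>
fix_symlink() {
    link=$1
    target=$2
    # readlink prints the stored target of a symlink (nothing if it is not one).
    current=$(readlink "$link" 2>/dev/null || true)
    if [ "$current" = "$target" ]; then
        echo "ok: $link -> $target"
        return 0
    fi
    # Refuse to delete a real file or directory that sits at the link path.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing: $link exists and is not a symlink" >&2
        return 1
    fi
    rm -f "$link"
    ln -s "$target" "$link"
    echo "fixed: $link -> $target (was: ${current:-missing})"
}

# Example call for the paths used in this article (run as root):
# fix_symlink /var/lib/jetty/webapps/solr /usr/share/solr/web
```

Running it a second time changes nothing and reports the link as ok, so it is safe to keep in a provisioning script.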

You may also need to check the permissions of the jQuery library. By default it is a symlink at /usr/share/solr/web/admin/jquery-1.4.3.min.js, which can result in a 404 when visiting the Solr admin panel. Examine /var/log/jetty/*stdout.log for any Jetty-related issues.

Production use has shown that the following values in /etc/jetty/jetty.xml should be changed to make Jetty work with big mailboxes:

<Set name="maxIdleTime">100000</Set>
<Set name="headerBufferSize">65536</Set>

Now you can start Jetty with

# service jetty start

If everything was done correctly, you should reach Solr's admin panel at http://<ip or host>:8080/solr/admin.

Now it's time to activate the FTS Solr plugin in Dovecot and make use of our freshly set up indexing server. Open /etc/dovecot/conf.d/10-mail.conf and change

#mail_plugins

to

mail_plugins = fts fts_solr

To configure the plugins, open /etc/dovecot/conf.d/90-plugin.conf and change the plugin section to

plugin {
 fts = solr
 fts_autoindex = yes
 fts_solr = url=http://localhost:8080/solr/
}

fts_autoindex = yes enables automatic indexing of new mails. This feature requires Dovecot 2.2.9 or newer.

Solr uses a so-called schema that defines the structure of the index. The schema is defined in an XML file. The schema for Dovecot is provided by the dovecot-solr package and can be found at /usr/share/doc/dovecot-core/dovecot/solr-schema.xml. We have to replace Solr's default schema with this one:

# cp /usr/share/doc/dovecot-core/dovecot/solr-schema.xml /etc/solr/conf/schema.xml

Afterwards we restart Jetty and Dovecot to apply the changes.

# service jetty restart
# service dovecot restart

Now we can index some mails to see if everything works. According to the FTS Solr manual, we can index a mailbox with

# doveadm fts rescan -u <user>
# doveadm index -u <user> '*'

The first call instructs Dovecot to remember that a rebuild of all indexes for the given user is necessary. The second one then forces the rebuild and blocks until all mails are indexed. We use * as a wildcard for all mailboxes of the given user here. http://<ip or host>:8080/solr/admin/schema.jsp should now show a numDocs value equal to the number of mails contained in the mailbox.
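
As an alternative to reading the number from the admin panel, the document count can be queried from the command line. The sketch below assumes that Solr's standard /select handler is reachable under the URL configured above and that curl is installed; solr_num_found is a hypothetical helper name. Keep in mind that a match-all query counts every document in the index, across all users.

```shell
#!/bin/sh
# Hypothetical helper: extract the numFound attribute from a Solr XML
# response read on stdin.
solr_num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Example (assumes the Solr URL from the plugin configuration above):
# curl -s 'http://localhost:8080/solr/select?q=*:*&rows=0' | solr_num_found
```

With rows=0 Solr returns no documents, only the result header carrying the total, so the request stays cheap even for a large index.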

Congratulations, searching within mail bodies now utilizes Solr and is blazing fast!

Configuring the all-messages folder

To enable cross-folder search, we configure Dovecot to add a special folder to every user's account. From the outside, the folder looks like it contains all mails from all other folders. More information can be found in the Dovecot Wiki.

We start by creating a separate namespace for the virtual folder. Open /etc/dovecot/conf.d/10-mail.conf and add the following section:

namespace virtual {
 prefix = virtual.
 separator = .
 location = virtual:/etc/dovecot/virtual:INDEX=~/virtual
}

Additionally, you have to extend the mail_plugins directive to mail_plugins = fts fts_solr virtual. Now we can create arbitrary virtual folders that appear in every user's account by creating a directory below /etc/dovecot/virtual and configuring the folder with a file named dovecot-virtual inside it. We do this for our all-messages folder:

# mkdir -p /etc/dovecot/virtual/all

and create a file /etc/dovecot/virtual/all/dovecot-virtual with the following content:

INBOX/*
 all
INBOX
 all

This means that the folder aggregates all messages from all mailboxes: the * is a wildcard that selects the mailboxes, and the search key all matches every message in them. You will likely want to change this if you have additional namespaces for shared and public folders. Please refer to the Dovecot Wiki in this case.
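
The file layout is simple enough to generate: each mailbox pattern is followed by an indented line holding the IMAP search keys. As a rough sketch (write_virtual_folder is a hypothetical helper, not a Dovecot tool), the directory and file from this section could be created like this:

```shell
#!/bin/sh
# Hypothetical helper: create a virtual folder definition.
# Usage: write_virtual_folder <dir> <search-keys> <pattern>...
write_virtual_folder() {
    dir=$1
    search=$2
    shift 2
    mkdir -p "$dir" || return 1
    {
        for pattern in "$@"; do
            # Mailbox pattern at the start of the line ...
            printf '%s\n' "$pattern"
            # ... followed by the indented search keys for it.
            printf ' %s\n' "$search"
        done
    } > "$dir/dovecot-virtual"
}

# Example reproducing the file shown above (run as root):
# write_virtual_folder /etc/dovecot/virtual/all all 'INBOX/*' INBOX
```

Quote patterns containing * so that the shell does not expand them before they reach the function.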

For App Suite to be able to display the original folders within the search results, Dovecot needs to announce an additional capability, which is added via

imap_capability = +XDOVECOT

The folder is now visible in every user's account. In the last step, we configure the folder to carry the \All flag. Edit /etc/dovecot/conf.d/15-mailboxes.conf and add the following section:

namespace virtual {
 mailbox all {
   special_use = \All
 }
}

Then restart Dovecot again. A short test shows that everything works as expected:

# telnet <imap_host> 143
Trying <ip>...
Connected to <ip>.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS AUTH=PLAIN] Dovecot  ready.
. LOGIN <username> <password>
. OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY   THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS SPECIAL-USE BINARY MOVE SEARCH=FUZZY] Logged in
. NAMESPACE
* NAMESPACE (("" ".")("virtual." ".")) NIL NIL
. OK Namespace completed.
. LIST "" "*"
* LIST (\HasNoChildren \Trash) "." Trash
* LIST (\HasNoChildren) "." "Sent Items"
* LIST (\HasNoChildren \Drafts) "." Drafts
* LIST (\HasNoChildren) "." Spam
* LIST (\Noselect \HasChildren) "." virtual
* LIST (\HasNoChildren \All) "." virtual.all
* LIST (\HasNoChildren) "." INBOX
. OK List completed.
. SELECT virtual.all
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $cl_0)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $cl_0 \*)] Flags permitted.
* 6 EXISTS
* 0 RECENT
* OK [UNSEEN 4] First unseen.
* OK [UIDVALIDITY 1395244998] UIDs valid
* OK [UIDNEXT 7] Predicted next UID
* OK [NOMODSEQ] No permanent modsequences
. OK [READ-WRITE] Select completed (0.004 secs).
. LOGOUT
* BYE Logging out
. OK Logout completed.
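
When this check is automated, the interesting part is whether some mailbox carries the \All flag. The following is a minimal sketch, assuming the LIST responses have been saved or piped in as plain text in the format shown above; find_all_mailbox is a hypothetical helper and not part of Dovecot.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first mailbox flagged \All
# in IMAP LIST responses read on stdin.
find_all_mailbox() {
    # Strip carriage returns, since IMAP lines end in CRLF.
    tr -d '\r' |
        sed -n 's/^\* LIST ([^)]*\\All[^)]*) "[^"]*" "\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' |
        head -n 1
}

# Example with a saved session transcript (file name is an assumption):
# find_all_mailbox < imap-session.log
```

For the session above, the expected result is virtual.all, which is also the value App Suite needs for its all-messages folder setting.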

OX App Suite

The last step is to configure OX App Suite. First, open /opt/open-xchange/etc/imap.properties and change the value of com.openexchange.imap.imapSearch to force-imap. Then open /opt/open-xchange/etc/findbasic.properties and change it to

# Some mail backends provide a virtual folder that contains all messages of
# a user to enable cross-folder mail search. Open-Xchange can make use of
# this feature to improve the search experience.
#
# Set the value to the name of the virtual mail folder containing all messages.
# Leave blank if no such folder exists.
com.openexchange.find.basic.mail.allMessagesFolder = virtual.all
# Denotes if mail search queries should be matched against the mail body.
# This improves the search experience within the mail module, if your mail
# backend supports fast full text search. Otherwise it will slow down the
# search requests significantly.
#
# Change the value to 'true', if fast full text search is supported. Default
# is 'false'.
com.openexchange.find.basic.mail.searchmailbody = true
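
If you prefer to script these property changes rather than edit the files by hand, something along the following lines may help. This is only a sketch: set_property is a hypothetical helper, and it assumes simple values that contain no |, & or backslash characters.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a Java properties file,
# replacing an existing (uncommented) assignment or appending a new one.
# Usage: set_property <file> <key> <value>
set_property() {
    file=$1
    key=$2
    value=$3
    # Escape the dots in the key so they match literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        # Rewrite via a temporary copy, then write back into the original
        # file so that its owner and permissions stay untouched.
        sed "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example calls for the settings from this section (run as root):
# set_property /opt/open-xchange/etc/imap.properties com.openexchange.imap.imapSearch force-imap
# set_property /opt/open-xchange/etc/findbasic.properties com.openexchange.find.basic.mail.allMessagesFolder virtual.all
# set_property /opt/open-xchange/etc/findbasic.properties com.openexchange.find.basic.mail.searchmailbody true
```

Commented-out lines are left alone, so the explanatory comments in the properties files survive.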

Restart the server with

# /etc/init.d/open-xchange restart

In the App Suite UI, the mail search now offers an "All Folder" switch, and queries are matched against mail bodies in addition to subject and address headers.