AppSuite:DocumentConverter Installation Guide


Product Description

Open-Xchange Inc. (“Open-Xchange”) has created a proprietary software program called the Open-Xchange Document Converter (the “Software”), which converts Microsoft (OOXML) and OpenOffice (ODF) office documents to PDF or to HTML5 with embedded SVG (Scalable Vector Graphics) files. In addition, the OX Document Converter converts PDF files to SVG files, and Microsoft files to OpenOffice files and vice versa. Users can therefore view and print documents in the existing infrastructure without any additional plugins.

Introduction

When document preview functionality is offered within OX App Suite, users expect to be able to open as many different document formats as possible. Put differently, a user should not need to care about the format of a document she just received. It should just work, without her knowing anything about document formats at all.

To offer such transparent behavior to the user, OX App Suite needs to take care of converting a lot of document formats into the display formats needed by OX Drive. OX Drive is extended with document preview functionality by the module OX Document Viewer.
DocumentConverter version 7.10.2-rev5 and higher supports preview functionality in OX App Suite 7.6.3, 7.8.4 and 7.10.x.

The conversion functionality is also available as the stand-alone product OX Document Converter. The OX Document Converter Service lets customers flexibly integrate document conversion into their offering.

The API reference describes the available actions with request parameters and results.

Requirements

OX Document Converter requires a 64-bit system; 32-bit systems are not supported.
See the Open-Xchange software requirements page for details.

See the section about Configuration and Hardware for recommendations on how to adapt settings to CPU and memory resources.

Update to 7.8.2

Note: For the update from earlier versions to 7.8.2 some manual adjustments are necessary. Please follow the hints described in this article.

Details for deployments from 7.8.2 on can be found within the articles themselves.

Download and Installation

The OX Document Converter deployment consists of two functional modules that need to be installed separately: the readerengine component and the Document Converter Server component.

ReaderEngine

See Readerengine installation instructions

Server

See Document converter installation instructions

See Document converter API installation instructions

Configuration

After installation of both components, readerengine and server, the administrator needs to make some adjustments to the configuration. First, if this is a new installation, make sure that the node allows its services to listen and be available for traffic. You can configure this by defining the "host to listen" in /opt/open-xchange/etc/overwrite.properties:

com.openexchange.connector.networkListenerHost=0.0.0.0

The value 0.0.0.0 is used here as a wildcard address.

In order to enable users to use Preview functionality, they must be given the appropriate permission in OX App Suite middleware properties, i.e. use com.openexchange.capability.document_preview=true in the file /opt/open-xchange/etc/permissions.properties to set this globally, or use ConfigCascade.

The component readerengine works with the default configuration. The settings are in the file documentconverter.properties located in the directory "/opt/open-xchange/documentconverter/etc" as described below.

A summary of all configuration items, together with each default value, is given below. Although the defaults have been carefully chosen for a real life deployment, the admin should take a closer look at each of them and adjust them accordingly, if necessary.

com.openexchange.documentconverter.installDir=/opt/readerengine

This item contains the directory of the libreaderengine installation. The libreaderengine installation directory in general contains the ./program directory, which itself contains the engine executables.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/opt/readerengine"

com.openexchange.documentconverter.cacheDir=/var/spool/open-xchange/documentconverter/readerengine.cache

This item contains the directory that will make up the cache for persistent job data. The directory itself does not need to exist at startup, but the parent directory needs to exist and needs to have write permissions for the user running the servlet, in order for the servlet to create this cache directory at runtime.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/var/spool/open-xchange/documentconverter/readerengine.cache"

com.openexchange.documentconverter.scratchDir=/var/spool/open-xchange/documentconverter/readerengine.scratch

This item contains the directory that will make up the runtime environment for the readerengine. The directory itself does not need to exist at startup, but the parent directory needs to exist and needs to have write permissions for the user running the servlet, in order for the servlet to create this scratch directory at runtime.
VERY IMPORTANT: If not set correctly, the complete web service will be nonfunctional.
Default value: "/var/spool/open-xchange/documentconverter/readerengine.scratch"
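The directory requirements for cacheDir and scratchDir can be checked before the service is started. The following is a minimal Python sketch and not part of the product; the helper name can_create_at_runtime is hypothetical, and the check is only meaningful when it is run as the user that runs the servlet.

```python
import os


def can_create_at_runtime(directory: str) -> bool:
    """Return True if `directory` already exists and is writable, or if its
    parent exists and is writable so that it can be created at runtime."""
    directory = os.path.abspath(directory)
    if os.path.isdir(directory):
        return os.access(directory, os.W_OK | os.X_OK)
    parent = os.path.dirname(directory)
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Default values of cacheDir and scratchDir as listed in this section.
    for path in (
        "/var/spool/open-xchange/documentconverter/readerengine.cache",
        "/var/spool/open-xchange/documentconverter/readerengine.scratch",
    ):
        status = "OK" if can_create_at_runtime(path) else "NOT USABLE"
        print(f"{path}: {status}")
```

If one of the paths is reported as not usable, create the parent directory or adjust its permissions before starting the service.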

com.openexchange.documentconverter.errorDir=

This item specifies a directory for files that could not be loaded due to an error condition or due to a timeout.
Note: The used disk space will grow with retained files. Files have to be removed manually.
Default value: n/a

com.openexchange.documentconverter.blacklistFile=/opt/open-xchange/etc/readerengine.blacklist

The list of external document content URLs that are not allowed to be loaded by the readerengine after loading a document. The file itself contains a list of newline-separated regular expressions. Each external URL is first checked against the list of blacklist URL regular expressions. If the external URL matches one blacklist entry, it is then checked against the list of whitelist URL regular expressions. The behavior in summary is as follows:
  • If the URL is not blacklisted and not whitelisted, it is resolved at runtime.
  • If the URL is blacklisted but not whitelisted, it is not resolved at runtime.
  • If the URL is not blacklisted but whitelisted, it is resolved at runtime.
  • If the URL is blacklisted and whitelisted, it is resolved at runtime.
In boolean notation: valid = (!blacklisted) || whitelisted
Please note that the regular expressions need to fully qualify the patterns that the URL should be checked against. Upper and lower case need to be handled by the regular expression as well. The file itself needs to be UTF-8 encoded to be read correctly.
Default value: "/opt/open-xchange/etc/readerengine.blacklist"

com.openexchange.documentconverter.whitelistFile=/opt/open-xchange/etc/readerengine.whitelist

The list of external document content URLs that are allowed to be loaded by the readerengine after an external URL matched a blacklist pattern. The file itself contains a list of newline-separated regular expressions. Each external URL is only checked against the list of whitelist URL regular expressions if it previously matched a pattern in the blacklist file. The behavior in summary is as follows:
  • If the URL is not blacklisted and not whitelisted, it is resolved at runtime.
  • If the URL is blacklisted but not whitelisted, it is not resolved at runtime.
  • If the URL is not blacklisted but whitelisted, it is resolved at runtime.
  • If the URL is blacklisted and whitelisted, it is resolved at runtime.
In boolean notation: valid = (!blacklisted) || whitelisted
Please note that the regular expressions need to fully qualify the patterns that the URL should be checked against. Upper and lower case need to be handled by the regular expression as well. The file itself needs to be UTF-8 encoded to be read correctly.
Default value: "/opt/open-xchange/etc/readerengine.whitelist"
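The blacklist/whitelist decision described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, Python's re module stands in for whatever regular expression engine the product uses, and full matching (re.fullmatch) is an assumption derived from the note that patterns need to fully qualify the URL.

```python
import re


def load_patterns(path):
    """Read newline-separated regular expressions from a UTF-8 encoded file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def is_url_resolved(url, blacklist_patterns, whitelist_patterns):
    """Implements: valid = (!blacklisted) || whitelisted."""
    blacklisted = any(re.fullmatch(p, url) for p in blacklist_patterns)
    if not blacklisted:
        # The whitelist is only consulted after a blacklist match.
        return True
    return any(re.fullmatch(p, url) for p in whitelist_patterns)


if __name__ == "__main__":
    # Hypothetical example patterns; (?i) handles upper/lower case.
    blacklist = [r"(?i)https?://.*"]
    whitelist = [r"(?i)https://cdn\.example\.com/.*"]
    for url in (
        "https://cdn.example.com/logo.png",
        "http://tracker.example.org/pixel.gif",
        "ftp://files.example.org/a.png",
    ):
        print(url, is_url_resolved(url, blacklist, whitelist))
```

Note that a blacklist pattern without case handling, such as https?://.*, would not match HTTP://EXAMPLE.ORG/x in this sketch, so that URL would be resolved. This is why the patterns themselves need to handle upper and lower case.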

com.openexchange.documentconverter.urlLinkLimit=200

The external URL link limit specifies the maximum number of valid external internet URLs (after filtering by blacklist and whitelist) that the engine tries to resolve when loading a document. When this limit is reached, no more external internet URLs are resolved for the current document.
Important: Please note that one externally linked object within the document does not automatically correspond to one external URL call. In general, at least two URL calls are necessary to display one externally linked object. Such additional calls are in most cases caused by format detection, which happens prior to resolving the object data itself.
Set to -1 for no upper limit or to 0 to disable the resolving of internet URLs completely.
Default value: 200

com.openexchange.documentconverter.urlLinkProxy =

The external URL link proxy entry specifies a proxy server that is used by the readerengine to resolve external links contained within a document. Such links are e.g. external http:// graphic links that are resolved during the filtering process of a readerengine instance.
Set this entry to the address of the proxy server: host:port
Recognized protocols are http://, https:// and ftp://
Leave empty if no proxy server should be used by the readerengine.
Default value: n/a

com.openexchange.documentconverter.RemoteBaseUrl =

Use a remote document conversion webservice to do the actual conversion.
Set this entry to the base URL of the remote host: http://host[:port]/documentconverterPath
Leave empty if conversion should happen on the local machine.
Default value: n/a

From 7.8.2 on: The com.openexchange.documentconverter.RemoteBaseUrl entry is no longer valid in the documentconverter.properties file. The corresponding documentconverter server needs to be set on the OX backend node where the documentconverter-client package has been installed. The name of the new entry is com.openexchange.documentconverter.client.remoteDocumentConverterUrl. The entry itself is located within the documentconverter-client.properties configuration file.

com.openexchange.documentconverter.RemoteCacheUrls =

Use one or more remote converter caches to speed up the conversion. The first entry, if set, is treated as the remote master cache, receiving cache updates from the local cache. Additional entries are treated as remote slave caches for read purposes only.
Set the (whitespace separated) entries to the base URL(s) of the appropriate remote host(s): http://host[:port]/documentconverterCachePath
Leave empty if only the local filesystem cache should be used.
Default value: n/a

com.openexchange.documentconverter.RemoteSharePointUrl =

Use a remote SharePoint service to do MSO to PDF conversions.
Set this entry to the URL of the SharePoint host: http://host[:port]/_vti_bin/oxconvert.svc/mex?wsdl
If left empty, the corresponding conversion job always returns false.
Default value: n/a

com.openexchange.documentconverter.RemoteSharePointUsername =

The login user name to be used for calls to the SharePoint service
Default value: n/a

com.openexchange.documentconverter.RemoteSharePointPassword =

The password to be used for calls to the SharePoint service
Default value: n/a

com.openexchange.documentconverter.jobProcessorCount=3

This item determines the number of engines working in parallel for job execution. The value needs to be greater than or equal to 1. Best performance is achieved at about (n-1), where n is the number of available CPU cores of the machine the service is running on.
Default value: 3
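As a small illustration of the (n-1) rule of thumb, the following Python sketch derives a candidate value for jobProcessorCount from the number of CPU cores. The function name is hypothetical, and the result is only a starting point that still needs to be weighed against other services running on the same machine.

```python
import os


def recommended_job_processor_count(cpu_cores=None):
    """Apply the (n-1) rule of thumb, never returning less than 1."""
    if cpu_cores is None:
        # os.cpu_count() may return None if the count cannot be determined.
        cpu_cores = os.cpu_count() or 1
    return max(1, cpu_cores - 1)


if __name__ == "__main__":
    count = recommended_job_processor_count()
    print(f"com.openexchange.documentconverter.jobProcessorCount={count}")
```

For a machine with four cores this yields 3, which matches the default value.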

com.openexchange.documentconverter.jobRestartCount=50

This item determines the maximum number of executed jobs after which a single engine is automatically restarted in order to avoid memory fragmentation and possible memory leaks within one libreaderengine instance.
Default value: 50

com.openexchange.documentconverter.jobExecutionTimeoutMilliseconds=60000

This item determines the timeout in milliseconds, after which the execution of a single job is terminated.
Default value: 60000

com.openexchange.documentconverter.maxVMemMB=2048

This item determines the maximum size in megabytes (MB) of virtual memory that each started readerengine process is allowed to consume. If a job tries to consume more VMem than set via this config item, the processing of the current job for the appropriate readerengine process will be aborted and the underlying process is restarted to avoid memory corruption.
Set this value to -1 for no upper limit.
Default value: 2048

com.openexchange.documentconverter.maxCacheSizeMB=-1

This item determines the maximum size in megabytes (MB) of all persistently cached converter job entries at runtime. A larger value may drastically reduce the time for conversion jobs, e.g. in case of a repeated creation of document previews.
Set this value to -1 for no upper limit.
Default value: -1

com.openexchange.documentconverter.maxCacheEntries=-1

This item determines the maximum number of converter jobs cached at runtime. The value affects the amount of runtime job information to be cached as well as the number of file entries within the cache directory.
Set this value to -1 for no upper limit.
Default value: -1

com.openexchange.documentconverter.cacheEntryTimeoutSeconds=2592000

This item determines the timeout in seconds, after which a cached job result is automatically removed from the cache.
Set this value to 0 to disable the timeout based removal of cached job results.
Default value: 2592000
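To put the default into perspective, 2592000 seconds correspond to 30 days. The following Python sketch (helper names are hypothetical) converts between days and the seconds value expected by this property.

```python
from datetime import timedelta


def seconds_to_days(seconds):
    """Convert a timeout in seconds to (possibly fractional) days."""
    return timedelta(seconds=seconds) / timedelta(days=1)


def days_to_seconds(days):
    """Convert a number of days to the seconds value used in the properties file."""
    return int(timedelta(days=days).total_seconds())


if __name__ == "__main__":
    print(seconds_to_days(2592000))
    print(days_to_seconds(7))
```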

com.openexchange.documentconverter.enableCacheLookup=false

Setting this flag to true enables the caller of the RemoteInternalPreviewService#getCachedPreviewFor implementation (OfficePreviewService) to retrieve only the cached result of a previous conversion call. If no cache entry exists, no new job is scheduled, since such a job might run for a long period of time, up to the given job timeout.
Set to false to disable the cache lookup within the RemoteInternalPreviewService#getCachedPreviewFor implementation.
Default value: false

com.openexchange.documentconverter.errorCacheTimeoutSeconds=600

This value determines how long an error, associated with a job hash value, is held within the error cache. As long as the timeout has not been reached, additional RemoteInternalPreviewService#getPreviewFor calls with the same job hash will instantly return the cached error code instead of processing the job again.
Set to 0 to disable the error cache handling.
Default value: 600

com.openexchange.documentconverter.errorCacheMaxCycleCount=5

This value determines the number of cycles for which a job, associated with a job hash value, is added to the error cache. One cycle starts when a job is added to the error cache and ends when the errorCacheTimeout has been reached. After the given maximum cycle count has been reached, the job is no longer removed from the error cache and is held there for the rest of the runtime of the current backend instance. Since the error cache is not persistent, the cycle counter for each job hash is reset after a restart of the backend instance.
Set to 0 to disable the error cache handling.
Default value: 5
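The interplay of errorCacheTimeoutSeconds and errorCacheMaxCycleCount can be sketched as follows. This is one possible reading of the description above, written in Python for illustration; the class and method names are hypothetical and do not reflect the actual implementation.

```python
import time


class ErrorCache:
    """In-memory error cache keyed by job hash (illustrative sketch only)."""

    def __init__(self, timeout_seconds, max_cycle_count, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.max_cycles = max_cycle_count
        self.clock = clock
        self._entries = {}  # job_hash -> (error_code, added_at, cycles)

    @property
    def enabled(self):
        # Setting either value to 0 disables the error cache handling.
        return self.timeout > 0 and self.max_cycles > 0

    def add(self, job_hash, error_code):
        """Record a failed job; each call starts a new cycle for that hash."""
        if not self.enabled:
            return
        previous = self._entries.get(job_hash)
        cycles = previous[2] + 1 if previous else 1
        self._entries[job_hash] = (error_code, self.clock(), cycles)

    def lookup(self, job_hash):
        """Return the cached error code, or None if the job should be processed."""
        if not self.enabled or job_hash not in self._entries:
            return None
        error_code, added_at, cycles = self._entries[job_hash]
        if cycles >= self.max_cycles:
            return error_code  # held for the rest of the runtime
        if self.clock() - added_at < self.timeout:
            return error_code  # still within the current cycle
        return None  # cycle ended; the job is processed again


if __name__ == "__main__":
    cache = ErrorCache(timeout_seconds=600, max_cycle_count=5)
    cache.add("job-hash-1", "GENERAL_ERROR")  # hypothetical error code
    print(cache.lookup("job-hash-1"))
```

The cache lives in memory only, so a restart starts with an empty dictionary, which corresponds to the cycle counters being reset.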

com.openexchange.documentconverter.servletLocalFileUrls=false

This item determines whether the documentconverter servlet is allowed to handle file URLs of the form file://... A file URL locates files that are locally accessible on the machine the documentconverter backend is running on.
Default value: false

com.openexchange.capability.sharepointconversion=false

Capability to enable the usage of a SharePoint conversion server. The capability is only checked if a valid SharePoint remote converter has been configured.
Default value: false

Handling of temporary files

The DocumentConverter server needs to store files at runtime for different purposes at different volume locations:

  • Persistent files (cache): Files that should last longer than the runtime of one converter instance are stored in the configurable com.openexchange.documentconverter.cacheDir directory. As the name of the property implies, such files are result cache entries used by multiple converter instances. This directory is monitored at runtime and all files are managed by the converter. Constraints for this directory are set via the converter properties com.openexchange.documentconverter.minFreeVolumeSizeMB, com.openexchange.documentconverter.maxCacheSizeMB, com.openexchange.documentconverter.maxCacheEntries and com.openexchange.documentconverter.cacheEntryTimeoutSeconds.
  • Medium lasting files: These files are only valid for the runtime of one converter instance (e.g. ReaderEngine related runtime config files for each ReaderEngine instance). They are stored within the configurable com.openexchange.documentconverter.scratchDir directory. This directory is not constantly monitored at runtime, but all files contained in the ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp subdirectory are managed by the converter during the startup and shutdown phases of a converter server instance: the whole ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp directory is cleaned up during converter server shutdown as well as during converter server startup. The initial cleanup during startup is necessary because the last converter instance might have aborted for unknown reasons, e.g. a power outage or a VM abort.
  • Short lasting files: These files are stored within the Java VM specific I/O temporary directory, whose location is configurable via the Java VM system property java.io.tmpdir. In most cases, the converter uses this directory to temporarily store request attachments. The files stored within this directory have a lifetime equal to the duration of the request itself; when the request has finished, the corresponding files are cleaned up. For the converter, this means that e.g. source files that are attached to the request are extracted from the request and stored on disk in order to prevent excessive memory consumption by source file buffers. When the conversion request has finished, the stored temporary file is deleted.

From 7.10.2 on: The directory specified by the java.io.tmpdir Java system property is no longer used by the converter. Instead, even short-lived temporary files are stored at the ${com.openexchange.documentconverter.scratchDir}/oxdc.tmp location, so that a server shutdown/start cleans them up automatically. This change affects all files created by the converter implementation itself. Temporary files from other baseline bundles might still be stored within the configured java.io.tmpdir.
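The cleanup of the oxdc.tmp subdirectory at startup and shutdown can be pictured with the following Python sketch. It only illustrates the behavior described above and is not the converter's code; the function name reset_tmp_dir is hypothetical.

```python
import os
import shutil
import tempfile


def reset_tmp_dir(scratch_dir):
    """Remove everything below <scratch_dir>/oxdc.tmp and recreate it empty."""
    tmp_dir = os.path.join(scratch_dir, "oxdc.tmp")
    # Leftovers may exist if the previous instance was aborted unexpectedly.
    shutil.rmtree(tmp_dir, ignore_errors=True)
    os.makedirs(tmp_dir, exist_ok=True)
    return tmp_dir


if __name__ == "__main__":
    # A throwaway directory stands in for the configured scratchDir.
    scratch = tempfile.mkdtemp()
    tmp_dir = reset_tmp_dir(scratch)                        # "startup"
    open(os.path.join(tmp_dir, "request.bin"), "w").close()
    reset_tmp_dir(scratch)                                  # "shutdown"
    print(os.listdir(tmp_dir))
```

Because the whole subdirectory is removed, no files other than converter-managed temporary data should be placed below oxdc.tmp.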


Processing Overview

In order to make use of the DocumentConverter to convert a document into an appropriate target format, the user sends HTTP(s) requests to the DocumentConverter web service. The possible requests and request parameters are specified within the API reference section.

When a conversion request is received by the DocumentConverter web service, the document to be converted and specified by the URL parameter of the request is first retrieved and temporarily stored. The web service request handler in addition extracts and prepares all necessary parameters, creates a conversion job for the underlying DocumentConverter, makes the conversion call and waits for the result of the conversion to be returned. The temporary, locally stored source document gets deleted automatically before the DocumentConverter web service request returns the response.

A call for document conversion is first scheduled within a queue by the DocumentConverter, so that a large number of conversion requests can be handled even with limited system resources and a limited number of parallel conversion engines. This is necessary since the conversion of one document might consume a lot of processing time (by default up to 60s per configured conversion engine).

As soon as one of the configured parallel conversion engines becomes available, it retrieves the next job from the queue and performs the conversion.

After the conversion by one of the parallel running conversion engines of the DocumentConverter has finished, the result of the conversion is returned to the caller of the conversion request. The whole conversion process in this case is a synchronous process from the point of view of the caller.

Conversion Job Priority

In larger deployments with different document conversion demands, as in the App Suite case, it is often necessary to specify constraints with respect to the response time of conversion requests. For example, if a conversion request results from direct user input within the frontend, the conversion result should be available as soon as possible for a better user experience. In other scenarios there is no need for a fast conversion response, e.g. a background pre-conversion of documents that are not visible to the user at the moment of the conversion request. For such cases, the DocumentConverter provides a mechanism to set a priority for each conversion request via the 'priority' parameter. The currently available priority parameter values in increasing order are:

  • background
  • low
  • medium
  • high
  • instant

In case no priority is set within the request, the default priority 'low' is used internally.

The priority of a conversion job affects the sorting order of the conversion queue of the DocumentConverter. Conversion jobs with a higher priority are always processed before jobs with a lower priority. Conversion jobs with the same priority are queued in time based, sequential order, i.e. in the order in which they were added.

Because of this, and because there may be a large number of other conversion jobs with a higher priority (e.g. due to direct user interaction), the caller of a conversion request should make sure to set the appropriate priority for her current conversion task. For conversion results that are intended to be seen by the user as soon as possible, the highest conversion priority 'instant' should generally be considered.
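The queue ordering described in this section (higher priority first, first-in-first-out within the same priority, 'low' as default) can be sketched with a binary heap. The following Python example is illustrative only; the class JobQueue and its methods are hypothetical and say nothing about how the DocumentConverter implements its queue.

```python
import heapq
import itertools

# Priority values in increasing order, as listed above.
PRIORITIES = ["background", "low", "medium", "high", "instant"]


class JobQueue:
    """Priority queue that keeps insertion order within one priority."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()

    def put(self, job, priority="low"):
        rank = PRIORITIES.index(priority)
        # Negative rank: heapq pops the smallest item first.
        # The sequence number breaks ties in insertion order.
        heapq.heappush(self._heap, (-rank, next(self._sequence), job))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = JobQueue()
    queue.put("preview-a")                  # default priority 'low'
    queue.put("user-click-b", "instant")
    queue.put("prefetch-c", "background")
    queue.put("preview-d", "low")
    queue.put("user-click-e", "instant")
    while queue:
        print(queue.get())
```

In this sketch a steady stream of 'instant' jobs would keep 'background' jobs waiting indefinitely, which illustrates why callers should choose the priority deliberately.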

Setup And Best Practices

The whole documentconverter product consists of 3 different parts: the OX backend documentconverter bundles, written in Java as OSGi plugins, the native readerengine and a native tool called pdf2svg.

Readerengine

As the name implies, the readerengine is the low level backend engine for the converter implementation. It acts as a general purpose document conversion solution that is able to read a vast number of commonly used document formats, such as Microsoft binary and XML formats (MS Word, MS Excel, MS PowerPoint), the whole set of ODF (Open Document Format) document formats and many more, and to export the loaded documents into different output formats like PDF.

The readerengine itself is a stripped down OpenOffice installation containing some code enhancements and optimizations provided by Open-Xchange developers. The packaging is also done in-house to have control of the whole development cycle, versioning and other aspects of the product development.

PDF2SVG

The pdf2svg component is a tool that uses libpoppler for reading and processing PDF source documents and Cairo for rendering the processed PDF into SVG, JPEG and PNG formats, which are finally rendered within the OX App Suite client.

Documentconverter OSGi bundles

The OX App Suite backend bundles of the documentconverter are responsible for receiving and processing client conversion requests, scheduling conversion jobs in multi-threaded queues, managing the correct startup and shutdown of (parallel) readerengine and readerengine-pdf2svg instances, managing the caching of conversion results, communicating with remote converters and remote converter caches, etc.

After installation of the documentconverter and related bundles, only minor changes have to be made by the admin in order to get the full functionality available within App Suite.

Documentconverter system packages

The Java OSGi bundles for the documentconverter functionality are distributed via four different packages in total (revision numbers are omitted here):

Deployments until 7.8.1
  • documentconverter-api: This bundle contains interfaces and other simple classes that are used by other bundles to get access to the documentconverter OSGi service.
  • documentconverter: This bundle contains the core documentconverter implementation as an OSGi service.
  • documentconverter-webservice: This bundle contains the functionality that lets the documentconverter offer its functionality as a webservice. In a remote scenario, this bundle needs to be installed only on those machines that are configured to do the real conversions.
  • documentconverter-jolokia: This optional bundle can be installed to make internal conversion statistics available for monitoring via JMX, Jolokia, Munin et al.
Deployments from 7.8.2 on
  • documentconverter-api: This bundle contains interfaces and other simple classes that are used by the documentconverter client and server bundles.
  • documentconverter-client: This bundle contains the documentconverter client implementation, running on the OX middleware as an OSGi service.
  • documentconverter-server: This bundle contains the documentconverter server implementation, running as a webservice on its own OX node.
  • documentconverter-jolokia: This optional bundle can be installed to make internal conversion statistics available for monitoring via Munin et al. The contained scripts provide access to the statistics data either via the modern Jolokia JMX bridge or via the older showruntimestats functionality provided by the OX App Suite backend. It needs to be installed on the documentconverter server side.

General

Looking at the overview of documentconverter system packages in the chapter above, it can be seen that the two essential packages to be installed on the OX backend (middleware) are the open-xchange-documentconverter-api and the open-xchange-documentconverter (until 7.8.1) / open-xchange-documentconverter-client (from 7.8.2 on) packages.

If the current OX backend (middleware) node is configured to only receive client requests and to make remote conversion calls to another node that is configured as a real converter node, no additional packages need to be installed.

In order to act as real converter node, the readerengine and the readerengine-pdf2svg packages need to be installed as well.

Until 7.8.1: For an OX backend node to act as a conversion webservice, the open-xchange-documentconverter-webservice package needs to be installed in addition to the open-xchange-documentconverter-api, the open-xchange-documentconverter, the readerengine and the readerengine-pdf2svg packages.

The open-xchange-documentconverter-jolokia package only makes sense on an OX backend node that acts as a real converter node. It is an optional package intended for monitoring purposes.

In summary, the following packages must be installed in order to act as a real converter node:

open-xchange-documentconverter-webservice readerengine open-xchange open-xchange-grizzly open-xchange-authentication-database

The package open-xchange-authentication-database is just needed to fulfill the dependencies.

From 7.8.2 on: For an OX backend node to act as a conversion webservice, the open-xchange-documentconverter-server package needs to be installed in addition to the open-xchange-documentconverter-api, the readerengine and the readerengine-pdf2svg packages.

The open-xchange-documentconverter-jolokia package only makes sense on an OX backend node that acts as a real converter node. It is an optional package intended for monitoring purposes.

In summary, the following packages must be installed in order to act as a real converter node. Package dependency management on the system ensures the installation of all depending packages:

open-xchange-documentconverter-server readerengine open-xchange open-xchange-grizzly open-xchange-authentication-database

The package open-xchange-authentication-database is just needed to fulfill the dependencies.

Monitoring

The open-xchange-documentconverter-jolokia package is an optional package that is only useful on an OX DocumentconverterServer node that acts as a real converter node. The monitoring article describes how to access JMX data, how to configure and use Jolokia, and how to integrate with Munin.

Single OX backend with documentconverter / Single converter node

Deployments from 7.8.2 on: this deployment scenario is only supported until release-7.8.1. From release 7.8.2 on, the only available deployment scenario is a client / server scenario with two dedicated nodes acting as documentconverter client and documentconverter server (webservice).

The simplest possible scenario is a single OX App Suite backend installation with the additional installation of the open-xchange-documentconverter-api, open-xchange-documentconverter-server, readerengine and readerengine-pdf2svg packages.

Nevertheless, this configuration can be considered the prototype building block for a real converter node in different setups. Therefore, the description of this building block contains all details that are necessary to set up even a highly complex scenario.

Important: Although it is possible to run a real documentconverter on a standard OX backend node together with all other provided services, this setup is not recommended for production use unless the hardware of each single node is sufficient to handle everything appropriately. This is especially true if the node is also configured to act as a calcengine server, which itself has a rather high resource consumption, in particular of memory.

Document conversion is a process that might run in the background all of the time due to permanent client requests. In such cases, the standard OX backend might be slowed down to a degree that is no longer acceptable. This is particularly important with limited hardware resources.

So, please use this all-in-one scenario only for quick evaluation purposes, demo setups or if there's really no dedicated conversion hardware available.

Configuration and Hardware

After installation of the single documentconverter related packages, the default values for the documentconverter properties can be found in the file /opt/open-xchange/etc/documentconverter.properties. Each entry is documented and has been assigned a default value, that is in most cases sufficient for a first setup. Please take care that all listed directory paths have the appropriate write permissions for the OX backend user.

VERY IMPORTANT: Each real converter node needs its own scratch and cache directories. Don't set these directories to shared directories on NFS shares etc., since file writes to the same cache/scratch directories by different backend nodes are currently not synchronized and will most likely corrupt files over time. It is possible to use a shared network drive for the cache and scratch directories, as long as the directories that are finally used are unique for each node. In this case, the documentconverter.properties file on each node has to be adjusted accordingly to guarantee such a unique, per-node directory structure.
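One way to obtain unique per-node directories on a shared drive is to include the node's host name in the path. The following Python sketch generates the two property lines under that assumption; the function name, the mount point /mnt/shared/documentconverter and the directory layout are hypothetical examples.

```python
import socket


def per_node_dirs(shared_base, node_name=None):
    """Build cacheDir and scratchDir values that are unique per node."""
    node = node_name or socket.gethostname()
    base = shared_base.rstrip("/")
    prefix = "com.openexchange.documentconverter."
    return {
        prefix + "cacheDir": f"{base}/{node}/readerengine.cache",
        prefix + "scratchDir": f"{base}/{node}/readerengine.scratch",
    }


if __name__ == "__main__":
    # Hypothetical mount point of the shared network drive.
    for key, value in per_node_dirs("/mnt/shared/documentconverter").items():
        print(f"{key}={value}")
```

The per-node parent directory still has to exist and be writable for the user running the service, as required for cacheDir and scratchDir in general.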

In addition, you should check the following entries for their validity/sensibility:

  • For the scratch directory, it is recommended to use a very fast volume (e.g. on SSD or even a RAM disk, if memory allows), since temporary files are constantly created and deleted within this directory. The overall documentconverter performance benefits from a high speed volume in any case.
  • The cache sizes configured in the properties file should be checked against the given hardware and volume sizes and adjusted if needed. Cache entries are only read or written once during a conversion, so the medium used does not need to be a high speed one. For the cache, capacity matters more than speed, since cached files accelerate the whole conversion process the most.
  • The number of parallel readerengine instances used is configured via the property com.openexchange.documentconverter.jobProcessorCount. The default value of 3 should fit almost all modern hardware. Please adjust this value to the number of CPU cores actually available. If the node is configured as a conversion-only node, the number of parallel processes should be on the order of (CPU core count - 1).
  • The com.openexchange.documentconverter.maxVMemMB config entry sets the virtual memory assigned to each readerengine process. You don't need (jobProcessorCount * maxVMemMB) of physical memory, since large parts of this virtual memory are shared between all running readerengine instances. There is therefore no simple formula to calculate the optimum total amount of memory. To begin with, a maxVMemMB value of 2048 is sufficient. The node the documentconverter runs on should have at least 8GB RAM; more RAM and more processors are better, of course. A good recommendation is 16GB RAM in total and 4 CPU cores for the node. A dual CPU machine with 4 cores each is an even better choice if hardware cost is not a limiting factor in the setup.
  • The upper limit for the Java VM memory allocation is configured in /opt/open-xchange/documentconverter/etc/process-conf.sh. A recommended value is 2048MB (-Xmx2048m). The minimum value is 512MB, which might be applicable for instances running App Suite Middleware and Documentconverter on a single machine. Please adjust this value according to the available memory, but don't set it too high, since values greater than about 4GB have a significant negative impact on the Java VM garbage collector. During our tests on a dedicated documentconverter machine, 2048MB proved to be a very sensible value for the -Xmx limit.
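The rules of thumb from the list above can be written down as a small helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, the lower bound of one process is an assumption of this sketch, and the 4096MB ceiling only approximates the "about 4GB" mentioned above.

```python
import os


def suggested_job_processor_count(cpu_cores):
    """Rule of thumb for a conversion-only node: (CPU core count - 1).

    The result never drops below 1 (an assumption of this sketch).
    """
    return max(1, cpu_cores - 1)


def xmx_in_recommended_range(xmx_mb):
    """True if -Xmx lies between the 512MB minimum and roughly 4GB."""
    return 512 <= xmx_mb <= 4096


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print("com.openexchange.documentconverter.jobProcessorCount =",
          suggested_job_processor_count(cores))
    print("-Xmx2048m within the recommended range:",
          xmx_in_recommended_range(2048))
```

The helper only suggests values; the entries in documentconverter.properties and process-conf.sh still have to be edited manually.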

OX backend(s) with one remote documentconverter

A remote documentconverter setup uses (at least) one OX backend, configured as a real documentconverter as described in the chapter above. This documentconverter node should not handle standard OX backend requests, but only remote conversion requests from the standard OX backends.

Deployments until 7.8.1: The standard OX backend needs to have the open-xchange-documentconverter-api and the open-xchange-documentconverter packages installed. readerengine and readerengine-pdf2svg packages are not needed on the standard OX backends.

For the remote converter node, the open-xchange-documentconverter-webservice package needs to be installed beside the open-xchange-documentconverter-api, open-xchange-documentconverter, readerengine and readerengine-pdf2svg packages.

In order to request remote conversions, the only entry that needs to be set within the documentconverter.properties file at the standard OX backend(s) is the com.openexchange.documentconverter.RemoteBaseUrl entry. In general, this is an HTTP URL consisting of the remote conversion node's host name or IP address followed by the /documentconverterws path:

com.openexchange.documentconverter.RemoteBaseUrl=http://host[:port]/documentconverterws

No other entry from the documentconverter.properties is used on the standard OX backend if the RemoteBaseUrl is set.

To check whether the remote documentconverter is set up correctly, enter the complete RemoteBaseUrl into a browser. You should see a page stating that the documentconverter is running.


Deployments from 7.8.2 on: The standard OX backend needs to have the open-xchange-documentconverter-api and the open-xchange-documentconverter-client packages installed. readerengine and pdf2svg packages are not needed on the standard OX backends.

For the remote converter node (webservice), the open-xchange-documentconverter-server package needs to be installed beside the open-xchange-documentconverter-api, readerengine and pdf2svg packages.

After installing the documentconverter-client package on the standard OX middleware node, a properties file named documentconverter-client.properties is available.

In order to request remote conversions, the only entry that needs to be set within the documentconverter-client.properties file at the standard OX backend(s) is the com.openexchange.documentconverter.client.remoteDocumentConverterUrl entry. In general, this is an HTTP URL consisting of the remote conversion node's host name or IP address followed by the /documentconverterws path:

com.openexchange.documentconverter.client.remoteDocumentConverterUrl=http://host[:port]/documentconverterws

To check whether the remote documentconverter is set up correctly, enter the complete remoteDocumentConverterUrl into a browser. You should see a page stating that the documentconverter is running.
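The browser check described above can also be scripted. The following is a minimal sketch, not an official tool: the host name converter.example.com and both function names are hypothetical, and treating HTTP status 200 as "running" is an assumption, since the exact content of the status page is not specified here.

```python
import urllib.error
import urllib.request


def build_converter_url(host, port=None):
    """Build a URL of the form http://host[:port]/documentconverterws."""
    authority = host if port is None else f"{host}:{port}"
    return f"http://{authority}/documentconverterws"


def converter_responds(url, timeout=5.0):
    """Return True if an HTTP GET on the URL answers with status 200.

    Assumption: a running documentconverter answers its status page with 200.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    # Hypothetical host; 8008 is the default port for deployments from 7.8.2 on.
    url = build_converter_url("converter.example.com", 8008)
    print(url, "->", "running" if converter_responds(url) else "not reachable")
```

Run from a standard OX backend node, such a check also confirms that the converter node is reachable over the network from that backend, which a browser on an administrator's workstation does not show.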

OX backend cluster with one remote documentconverter

The OX backend cluster is the standard OX backend scenario. Its setup is similar, if not identical, to the OX backend with one remote documentconverter scenario above.

The only thing you need to take care of is to add the appropriate documentconverterws ProxyPass entry to the proxy_http.conf configuration file of, for example, the Apache web server:

<Proxy /documentconverterws>
ProxyPass balancer://oxcluster/documentconverterws
</Proxy>

Sample cluster installation

A typical, basic cluster setup consists of two or more standard OX backend nodes (e.g. samplehost-node-1, samplehost-node-2, ..., samplehost-node-n). Each of these standard backend nodes has its com.openexchange.documentconverter.RemoteBaseUrl config entry set to a dedicated documentconverter cluster node (e.g. samplehost-node-x).

OX backend cluster with two or more documentconverters

To reduce the load on a single documentconverter within a cluster installation, you can add more documentconverter nodes to the cluster. The general setup is the same as in the OX backend cluster with one remote documentconverter setup.


In addition, care needs to be taken to add the appropriate JSESSIONID cookie handling to the cluster setup. This is done for OX clusters in general, but mentioned here for completeness. Setting the JSESSIONID cookie is essential in a multi documentconverter setup, since the Viewer client application uses stateful calls into the OX backend to retrieve the single pages of a document on demand.

A typical Proxy configuration with proper stickysession handling looks as follows:

<Proxy balancer://oxcluster>
Order deny,allow
Allow from all
BalancerMember http://docs-develop-cluster-b1:8009 timeout=100 smax=0 ttl=60 retry=60 loadfactor=50 keepalive=On route=OX1
BalancerMember http://docs-develop-cluster-b2:8009 timeout=100 smax=0 ttl=60 retry=60 loadfactor=50 keepalive=On route=OX2
ProxySet stickysession=JSESSIONID
SetEnv proxy-initial-not-pooled
SetEnv proxy-sendchunked
</Proxy>

Deployments from 7.10.1 on: Communication between OX backend and OX Documentconverter is no longer stateful. Starting with 7.10.1, it is not necessary to take care of session stickiness.

Deployments from 7.8.2 on: Since the OX backend and the dedicated OX Documentconverter backend node run on the same server in many scenarios, the documentconverter node got a default port of 8008 for HTTP communication. The port 8009 in the Proxy configuration above therefore needs to be replaced by 8008 or by the appropriate value from the overwrite.properties file within the documentconverter-server backend deployment.

Remote caching with two or more documentconverters

As mentioned above, using a cache improves performance considerably for subsequent conversion requests. Configuring more than one real documentconverter node makes the remote caching feature available.

In general, one converter acts as a so-called master cache in this scenario. It is set up as usual, as described in one of the chapters above.

The second (third, ...) converters are also set up as usual, but have the com.openexchange.documentconverter.RemoteCacheUrls entry in their documentconverter.properties set so that the first URL (of possibly several URLs) points to the master cache converter.

The URL to be used is the standard RemoteBaseUrl with an additional /cache path appended:

com.openexchange.documentconverter.RemoteCacheUrls=http://mastercachehost[:port]/documentconverterws/cache

In this case, when a conversion request arrives and the local cache holds no matching result, all configured remote cache hosts are queried for a cached result.

If one of the remote cache hosts returns a valid cache result, the cache entry is locally replicated and the cache result is returned.

If none of the remote cache hosts returns a valid cache result, a local conversion is performed, a local cache entry is created and the cache result is transferred to the first configured remote cache host (the master cache host).
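The lookup order described above can be summarised in a short sketch. It is an illustration of the described flow only, not the actual implementation: plain dictionaries stand in for the local and remote caches, and the function name lookup_or_convert and the convert callback are hypothetical.

```python
def lookup_or_convert(key, local_cache, remote_caches, convert):
    """Resolve a conversion request in the order described above.

    local_cache and each entry of remote_caches are dict-like stand-ins
    for real caches; remote_caches[0] plays the master cache role.
    """
    # 1. Local cache hit: return the cached result immediately.
    if key in local_cache:
        return local_cache[key]

    # 2. Query every configured remote cache; replicate a hit locally.
    for remote in remote_caches:
        if key in remote:
            local_cache[key] = remote[key]
            return local_cache[key]

    # 3. Miss everywhere: convert locally, cache locally, push to the master.
    result = convert(key)
    local_cache[key] = result
    if remote_caches:
        remote_caches[0][key] = result
    return result


if __name__ == "__main__":
    local, master = {}, {}
    print(lookup_or_convert("doc-1", local, [master], lambda k: f"pdf({k})"))
    print("master cache now holds:", master)
```

In this sketch only the first remote cache receives new results, mirroring the statement that the cache result is transferred to the master cache host.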

Sharepoint support

It's possible to optionally use a Sharepoint server to perform certain conversions, mainly to convert from Microsoft Word and PowerPoint to PDF so that the documents look better in the OX Document Viewer.

No OX user credentials are needed on the Sharepoint server, as shown in this overview:

OX Document Converter Overview.jpg