Context Preprovisioning

Revision as of 15:17, 1 August 2017 by Dominik.epple

The standard createcontext call (be it via CLT, RMI or SOAP) is designed to work in all possible corner cases, such as concurrent createcontext, createuser and deletecontext calls. This includes the potential creation and removal of DB schemas and the correct counting of contexts on the targets (databases, schemas, filestores), so that each context is allocated according to the rules (weights, etc.).

To achieve this, a number of expensive DB queries and extensive locking are necessary. We optimized this as far as possible while still meeting the "general purpose" requirements outlined above, but it is still costly.

For other use cases, many of these queries and locks are not necessary. In particular, if your implementation project can schedule a "pre-provisioning" phase in which only createcontext calls take place and all schemas can be pre-generated, many of these allocation queries and locks can be skipped.

This article describes the prerequisites of this "fast mode preprovisioning" and how to execute it.

Prerequisites

  • All required DB schemas are pre-generated (see below)
  • Only createcontext calls; no other provisioning calls and no HTTP API calls
  • Only one provisioning host (this is usually not the bottleneck; the database usually is)

Execution

Schema pre-generation

To pre-generate schemas, you can temporarily set CONTEXTS_PER_SCHEMA in /opt/open-xchange/etc/plugin/hosting.properties to 1. Every subsequent createcontext then triggers the creation of a new schema. We therefore recommend generating the schemas with placeholder contexts that are not used for anything else. As long as these placeholder contexts are not deleted, their schemas are not torn down either.

With CONTEXTS_PER_SCHEMA=1 in place, create as many placeholder contexts (and thus schemas) as you need to hold the final number of contexts to be provisioned. Then change the setting back to its original value (as defined in the system's high-level design, usually somewhere between 1000 and 7000).
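As a worked example of the arithmetic, the number of schemas to pre-generate is the final number of contexts divided by the production CONTEXTS_PER_SCHEMA value, rounded up. The Python sketch below computes that number and builds one createcontext command line per placeholder context. The option letters are the usual createcontext options, but the credentials, display names, quota, e-mail domain and the first_context_id parameter are hypothetical placeholders; check the options against createcontext --help on your release before relying on them.

```python
def schemas_needed(total_contexts: int, contexts_per_schema: int) -> int:
    """Schemas needed for total_contexts at the production CONTEXTS_PER_SCHEMA
    value (integer ceiling division, no floating point involved)."""
    if total_contexts < 0 or contexts_per_schema < 1:
        raise ValueError("need total_contexts >= 0 and contexts_per_schema >= 1")
    return (total_contexts + contexts_per_schema - 1) // contexts_per_schema


def placeholder_commands(first_context_id: int, count: int) -> list[list[str]]:
    """One createcontext command line per placeholder context.

    Run these while CONTEXTS_PER_SCHEMA=1, so that every call creates a schema.
    Credentials, names, quota and e-mail domain are hypothetical placeholders.
    """
    return [
        [
            "/opt/open-xchange/sbin/createcontext",
            "-A", "oxadminmaster", "-P", "MASTER_PASSWORD",
            "-c", str(cid), "-q", "1",
            "-u", "oxadmin", "-d", f"Placeholder {cid}",
            "-g", "Placeholder", "-s", "Admin",
            "-p", "ADMIN_PASSWORD", "-e", f"oxadmin@placeholder{cid}.invalid",
        ]
        for cid in range(first_context_id, first_context_id + count)
    ]
```

For example, schemas_needed(1_000_000, 5000) returns 200, so 200 placeholder contexts would be created while the setting is 1. Choose a placeholder context ID range that cannot collide with the contexts you provision later.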

Update: as of 7.8.3, there is also a tool called createschema that bootstraps schemas without the trick described above. However, these schemas are only suitable for the schema-name based fast mode, so the trick is still required to bootstrap schemas for use with the in-memory or automatic mode.

Context Creation in Fast Mode

There are a number of options you need to set to run in "fast mode".

  • To spare OX from counting contexts per filestore, specify the target filestore with the --destination-store-id <id> argument.
  • The same reasoning applies to the target database: specify it with --destination-database-id <destination-database-id>.
  • To use the "fast mode" (skipping locks and some checks), use one of the following two options (discussed below):
    • --schema-name <schema-name>
    • --schema-strategy=in-memory
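A minimal sketch of how a provisioning tool might assemble these options, assuming the usual createcontext options (credentials, context ID, admin user, quota) are added separately. The helper name fast_mode_options is hypothetical; it only combines the options named above and makes sure exactly one of the two fast-mode variants is chosen per call.

```python
from typing import Optional


def fast_mode_options(
    store_id: int, database_id: int, schema_name: Optional[str] = None
) -> list[str]:
    """Fast-mode options for one createcontext call.

    With schema_name, the schema-name variant is used (the schema must live in
    database_id); without it, --schema-strategy=in-memory is requested.
    """
    options = [
        "--destination-store-id", str(store_id),
        "--destination-database-id", str(database_id),
    ]
    if schema_name is not None:
        options += ["--schema-name", schema_name]
    else:
        options.append("--schema-strategy=in-memory")
    return options
```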

The most important point is that context creation parallelizes reasonably well at least up to the number of DB hosts in your setup, and possibly beyond. You should therefore create contexts in parallel, but with fewer parallel streams than the number of schemas you pre-generated. The optimal number of parallel createcontext "streams" depends on your infrastructure and has to be determined empirically. On single-node development machines, up to 10 concurrent streams are reasonable. On larger test platforms (and on production platforms) we found that up to 100 streams make sense and increase throughput. However, if increasing the number of streams further only increases latency but not throughput, reduce the number of streams again.
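The tuning rule above can be expressed as a small helper. This is a hypothetical sketch, assuming you have measured throughput (for example contexts per second) for several stream counts on your own platform: it walks the counts in ascending order and stops at the first step whose relative gain falls to or below a threshold. The 5% default threshold is an arbitrary assumption, not a value derived from our benchmarks.

```python
def choose_stream_count(throughput: dict[int, float], min_gain: float = 0.05) -> int:
    """Pick a stream count from measured throughput (stream count -> contexts/s).

    Counts are visited in ascending order; we stop at the first step whose gain
    over the currently chosen count is at most min_gain (relative), because past
    that point additional streams mostly add latency, not throughput.
    """
    if not throughput:
        raise ValueError("need at least one measurement")
    counts = sorted(throughput)
    best = counts[0]
    for count in counts[1:]:
        previous = throughput[best]
        if throughput[count] - previous > min_gain * previous:
            best = count
        else:
            break
    return best
```

Stopping at the first flat step, rather than taking the global maximum, reflects the advice to back off as soon as more streams stop paying for themselves.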

To implement parallel streams of createcontext, you can do one of the following:

  • The simplest approach (and the easiest to understand) is to create one large .csv file suitable for the CLT createcontext --csv. Split that file into as many chunks as the number of parallel streams you want to run, then invoke the corresponding number of createcontext --csv processes in parallel (for example, from a shell script).
  • A more sophisticated approach is a program that creates the required data (or reads it from some input) and uses a multithreaded worker model to run the required number of streams.
  • You can use our Gatling tools to load-test your system. You will probably need help setting them up, so contact your OX consultant for assistance.
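For the CSV-based approach, a minimal Python sketch could look like this. It assumes the first line of the CSV is a header that must be repeated in every chunk, distributes the rows round-robin over the chunk files, and then starts one createcontext process per chunk. The option is written as --csv following this article; verify the exact option name with createcontext --help. Credentials and any other required options are passed in via extra_args. The thread pool only waits on the child processes, which also makes it a small instance of the worker model from the second bullet.

```python
import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
from pathlib import Path

CREATECONTEXT = "/opt/open-xchange/sbin/createcontext"
CSV_OPTION = "--csv"  # as written in this article; verify with createcontext --help


def split_csv(source: Path, chunks: int, out_dir: Path) -> list[Path]:
    """Split source into `chunks` CSV files, round-robin by row.

    Assumes the first line is a header; it is repeated in every chunk.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{source.stem}.part{i:03d}.csv" for i in range(chunks)]
    with source.open(newline="") as fh, ExitStack() as stack:
        reader = csv.reader(fh)
        header = next(reader)
        writers = []
        for path in paths:
            writer = csv.writer(stack.enter_context(path.open("w", newline="")))
            writer.writerow(header)
            writers.append(writer)
        for n, row in enumerate(reader):
            writers[n % chunks].writerow(row)
    return paths


def run_streams(parts: list[Path], extra_args: list[str]) -> list[int]:
    """Run one createcontext process per chunk concurrently; return exit codes."""
    def run(part: Path) -> int:
        cmd = [CREATECONTEXT, CSV_OPTION, str(part), *extra_args]
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        return list(pool.map(run, parts))
```

For example, calling split_csv(Path("contexts.csv"), 20, Path("chunks")) and passing the result to run_streams with ["-A", "oxadminmaster", "-P", "MASTER_PASSWORD"] would run 20 streams. A non-zero exit code marks a chunk whose output needs checking.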

Schema-Name vs Schema-Strategy

These are two different implementations of the same goal: running in "fast mode" and skipping locks and some checks.

With the --schema-name method, the caller does the book-keeping of how many contexts are allocated where. A simple round-robin approach is usually sufficient. Be aware that the schema names you supply must match the DB ID you supply: the tool that generates the createcontext requests must be written so that every schema name it passes actually resides in the given database.
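A round-robin allocator that keeps schema names and database IDs consistent can be sketched as follows. It assumes you have a mapping from each database ID to the schema names pre-generated in it; the function name and any IDs or schema names you feed it are hypothetical. Databases are interleaved so that consecutive contexts go to different databases, in line with the observation above that creation parallelizes with the number of DB hosts.

```python
from itertools import cycle, zip_longest
from typing import Iterator


def round_robin_targets(schemas_by_db: dict[int, list[str]]) -> Iterator[tuple[int, str]]:
    """Endlessly yield (database_id, schema_name) pairs in round-robin order.

    Each schema name is always paired with the database it was created in, so
    --destination-database-id and --schema-name stay consistent. Databases are
    interleaved so that consecutive contexts go to different databases.
    """
    per_db = [
        [(db_id, schema) for schema in schemas]
        for db_id, schemas in sorted(schemas_by_db.items())
    ]
    pairs = [pair for group in zip_longest(*per_db) for pair in group if pair is not None]
    if not pairs:
        raise ValueError("no pre-generated schemas given")
    return cycle(pairs)
```

For each context, take db_id, schema = next(targets) and pass both values to createcontext. If several worker threads draw from the same iterator, guard the next() call with a lock.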

With --schema-strategy=in-memory, the caller needs no such logic; OX does the required book-keeping in memory.

In our benchmarks, both approaches appear to work and achieve similar throughput. In theory, the in-memory approach is easier to integrate into existing workflows, but the schema-name approach is much simpler inside OX and potentially more robust.