AppSuite:DocumentsMonitoring: Difference between revisions

From Open-Xchange
 
Latest revision as of 16:37, 22 January 2020

OX Documents / OX Documentconverter Monitoring

General

The Open-Xchange JMX interface provides runtime information about the Java virtual machine and about the Open-Xchange Middleware (the groupware backend), including OX Documents, Documentconverter and Imageconverter.

For general information about the JMX interface see: OX_monitoring_interface

Using Jolokia to access monitoring data is described in the article Jolokia.

Plugins are available for integration in the munin monitoring framework: OX_munin_scripts

OX Documentconverter

Monitoring data for OX Documentconverter can be obtained after installation of the package open-xchange-munin-scripts-jolokia. It uses Jolokia if available and configured.

The following table shows all available Documentconverter server metrics, provided via the JMX interface:

Documentconverter server metrics
Metrics Name Description Instance / LongTerm Remarks
CacheEntryCount The current count of entries, stored within the persistent cache Instance
CacheFreeVolumeSize The current free size in bytes of the volume, the cache root directory is located on. The size is calculated on a volume block size basis to reflect the correct size of the free volume. Instance
CacheHitRatio The ratio of successful job results that are retrieved from the cache against the total number of processed jobs as simple quotient Instance
CacheOldestEntrySeconds The age of the oldest entry within the cache in seconds Instance From 7.10.3 on
CachePersistentSize The current size in bytes of all persistent entries, stored within the cache directory. The size is calculated on a volume block size basis to reflect the correct summed up size. Instance
ConversionCount The total number of all physical ReaderEngine and PDFTool conversions Instance
ConversionCountPDFTool The total number of all physical PDFTool conversions Instance
ConversionCountReaderEngine The total number of all physical ReaderEngine conversions Instance
ConversionCountWithMultipleRuns The total number of physical ReaderEngine conversions, that needed a second, new ReaderEngine instance Instance
ConversionErrorsGeneral The total number of physical ReaderEngine conversions, that resulted in a general error Instance
ConversionErrorsTimeout The total number of physical ReaderEngine conversions, that resulted in a timeout error Instance
ConversionErrorsDisposed The total number of physical ReaderEngine conversions, that resulted in a disposed error (ReaderEngine instance vanished during conversion) Instance
ConversionErrorsPassword The total number of physical ReaderEngine conversions, that resulted in a password error Instance
ConversionErrorsNoContent The total number of physical ReaderEngine conversions, that resulted in a no content error (e.g. source document of 0 bytes length) Instance
ConversionErrorsOutOfMemory The total number of physical ReaderEngine conversions, that resulted in an out of memory error Instance
ConversionErrorsPDFTool The total number of physical PDFTool conversions, that resulted in an error Instance
ErrorCacheHitRatio The ratio of job errors that are retrieved from the error cache against the total number of job errors as simple quotient Instance
JobErrorsTimeout The total number of jobs, that returned with a timeout error Instance
JobErrorsTotal The total number of jobs, that returned with an error Instance
JobsProcessed The total number of jobs that have been processed Instance
MedianConversionTimeMillis_Background The median time in milliseconds, a physical ReaderEngine conversion with background priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianConversionTimeMillis_Low The median time in milliseconds, a physical ReaderEngine conversion with low priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianConversionTimeMillis_Medium The median time in milliseconds, a physical ReaderEngine conversion with medium priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianConversionTimeMillis_High The median time in milliseconds, a physical ReaderEngine conversion with high priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianConversionTimeMillis_Instant The median time in milliseconds, a physical ReaderEngine conversion with instant priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianConversionTimeMillis_Total The median time in milliseconds, a physical ReaderEngine conversion lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_Background The median time in milliseconds, a job with background priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_Low The median time in milliseconds, a job with low priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_Medium The median time in milliseconds, a job with medium priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_High The median time in milliseconds, a job with high priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_Instant The median time in milliseconds, a job with instant priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianJobTimeMillis_Total The median time in milliseconds, a job needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_Background The median time in milliseconds, a background job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_Low The median time in milliseconds, a low priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_Medium The median time in milliseconds, a medium priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_High The median time in milliseconds, a high priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_Instant The median time in milliseconds, an instant priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
MedianQueueTimeMillis_Total The median time in milliseconds, a job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_Background The peak number of background priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_Low The peak number of low priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_Medium The peak number of medium priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_High The peak number of high priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_Instant The peak number of instant priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInQueue_Total The peak number of jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PeakJobCountInAsyncQueue_Total The peak number of jobs in the queue for asynchronous jobs, measured from the beginning of the last reset/initialization on (default period: 5 minutes) Instance
PendingStatefulJobCount The number of stateful job conversions for which endConvert has not been called yet Instance From 7.10.3 on
QueueLimitHighReachedCountInstance The accumulated number of times the job queue high limit has been reached for the current DocumentConverter server instance Instance From 7.10.3 on
QueueLimitLowReachedCountInstance The accumulated number of times the job queue low limit has been reached for the current DocumentConverter server instance Instance From 7.10.3 on
QueueLimitHighReachedCountLongTerm The accumulated number of times the job queue high limit has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term From 7.10.3 on
QueueLimitLowReachedCountLongTerm The accumulated number of times the job queue low limit has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term From 7.10.3 on
QueueTimeoutCountInstance The accumulated number of times the job queue timeout has been reached for the current DocumentConverter server instance Instance From 7.10.3 on
QueueTimeoutCountLongTerm The accumulated number of times the job queue timeout has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term From 7.10.3 on
RuntimeSecondsInstance The runtime in seconds for the current DocumentConverter server instance Instance
RuntimeSecondsLongTerm The accumulated runtime in seconds for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
ScheduledJobCountInQueue The current number of all scheduled, synchronous jobs within the queue, waiting to be processed Instance From 7.10.3 on
UserRequestCountInstance_Total The number of all WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstance_Text The number of text document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstance_Spreadsheet The number of spreadsheet document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstance_Presentation The number of presentation document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstance_Image The number of image related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountLongTerm_Total The accumulated number of all WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTerm_Text The accumulated number of text document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTerm_Spreadsheet The accumulated number of spreadsheet document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTerm_Presentation The accumulated number of presentation document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTerm_Image The accumulated number of image related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountInstanceFailed_Total The number of all WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstanceFailed_Text The number of text document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstanceFailed_Spreadsheet The number of spreadsheet document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstanceFailed_Presentation The number of presentation document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountInstanceFailed_Image The number of image related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestCountLongTermFailed_Total The accumulated number of all WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTermFailed_Text The accumulated number of text document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTermFailed_Spreadsheet The accumulated number of spreadsheet document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTermFailed_Presentation The accumulated number of presentation document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestCountLongTermFailed_Image The accumulated number of image related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestSuccessRatioInstance_Total The ratio of successful WebService requests to all WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestSuccessRatioInstance_Text The ratio of successful text document related WebService requests to all text document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestSuccessRatioInstance_Spreadsheet The ratio of successful spreadsheet document related WebService requests to all spreadsheet document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestSuccessRatioInstance_Presentation The ratio of successful presentation document related WebService requests to all presentation document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestSuccessRatioInstance_Image The ratio of successful image related WebService requests to all image related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance Instance
UserRequestSuccessRatioLongTerm_Total The accumulated ratio of successful WebService requests to all WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestSuccessRatioLongTerm_Text The accumulated ratio of successful text document related WebService requests to all text document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestSuccessRatioLongTerm_Spreadsheet The accumulated ratio of successful spreadsheet document related WebService requests to all spreadsheet document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestSuccessRatioLongTerm_Presentation The accumulated ratio of successful presentation document related WebService requests to all presentation document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term
UserRequestSuccessRatioLongTerm_Image The accumulated ratio of successful image related WebService requests to all image related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance Long term

Here is a list of sample data provided by

# /opt/open-xchange/sbin/showruntimestats -y -p 9998|sed -e 's/com.openexchange.documentconverter:name=DocumentConverterInformation,//g' | sort

CacheEntryCount = 10258
CacheFreeVolumeSize = 2709409792
CacheHitRatio = 0.9642857142857143
CacheOldestEntrySeconds = 1998029
CachePersistentSize = 564215808
ConversionCount = 1
ConversionCountPDFTool = 0
ConversionCountReaderEngine = 1
ConversionCountWithMultipleRuns = 0
ConversionErrorsDisposed = 0
ConversionErrorsGeneral = 0
ConversionErrorsNoContent = 0
ConversionErrorsOutOfMemory = 0
ConversionErrorsPassword = 1
ConversionErrorsPDFTool = 0
ConversionErrorsTimeout = 0
ErrorCacheHitRatio = 0.0
JobErrorsTimeout = 0
JobErrorsTotal = 1
JobsProcessed = 28
MedianConversionTimeMillis_Background = 0
MedianConversionTimeMillis_High = 0
MedianConversionTimeMillis_Instant = 0
MedianConversionTimeMillis_Low = 0
MedianConversionTimeMillis_Medium = 0
MedianConversionTimeMillis_Total = 0
MedianJobTimeMillis_Background = 0
MedianJobTimeMillis_High = 0
MedianJobTimeMillis_Instant = 0
MedianJobTimeMillis_Low = 0
MedianJobTimeMillis_Medium = 0
MedianJobTimeMillis_Total = 0
MedianQueueTimeMillis_Background = 0
MedianQueueTimeMillis_High = 0
MedianQueueTimeMillis_Instant = 0
MedianQueueTimeMillis_Low = 0
MedianQueueTimeMillis_Medium = 0
MedianQueueTimeMillis_Total = 0
PeakJobCountInAsyncQueue_Total = 0
PeakJobCountInQueue_Background = 0
PeakJobCountInQueue_High = 0
PeakJobCountInQueue_Instant = 0
PeakJobCountInQueue_Low = 0
PeakJobCountInQueue_Medium = 0
PeakJobCountInQueue_Total = 0
PendingStatefulJobCount = 0
QueueLimitHighReachedCountInstance = 33
QueueLimitHighReachedCountLongTerm = 66
QueueLimitLowReachedCountInstance = 33
QueueLimitLowReachedCountLongTerm = 66
QueueTimeoutCountInstance = 22
QueueTimeoutCountLongTerm = 44
RuntimeSecondsInstance = 6707
RuntimeSecondsLongTerm = 28607051
ScheduledJobCountInQueue = 0
UserRequestCountInstanceFailed_Image = 0
UserRequestCountInstanceFailed_Presentation = 0
UserRequestCountInstanceFailed_Spreadsheet = 0
UserRequestCountInstanceFailed_Text = 0
UserRequestCountInstanceFailed_Total = 0
UserRequestCountInstance_Image = 0
UserRequestCountInstance_Presentation = 0
UserRequestCountInstance_Spreadsheet = 0
UserRequestCountInstance_Text = 0
UserRequestCountInstance_Total = 0
UserRequestCountLongTermFailed_Image = 0
UserRequestCountLongTermFailed_Presentation = 10694
UserRequestCountLongTermFailed_Spreadsheet = 2598
UserRequestCountLongTermFailed_Text = 6369
UserRequestCountLongTermFailed_Total = 19661
UserRequestCountLongTerm_Image = 0
UserRequestCountLongTerm_Presentation = 121394
UserRequestCountLongTerm_Spreadsheet = 33453
UserRequestCountLongTerm_Text = 130801
UserRequestCountLongTerm_Total = 285648
UserRequestSuccessRatioInstance_Image = 1.0
UserRequestSuccessRatioInstance_Presentation = 1.0
UserRequestSuccessRatioInstance_Spreadsheet = 1.0
UserRequestSuccessRatioInstance_Text = 1.0
UserRequestSuccessRatioInstance_Total = 1.0
UserRequestSuccessRatioLongTerm_Image = 1.0
UserRequestSuccessRatioLongTerm_Presentation = 0.911906684020627
UserRequestSuccessRatioLongTerm_Spreadsheet = 0.9223388036947359
UserRequestSuccessRatioLongTerm_Text = 0.9513077117147423
UserRequestSuccessRatioLongTerm_Total = 0.9311705315633227
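
Output in this "name = value" form is easy to post-process. The following is a minimal sketch, not part of the Open-Xchange tooling: the function name parse_runtime_stats is hypothetical, and it assumes the MBean prefix has already been stripped by the sed call shown above. It also checks that, in the sample data, the long-term success ratio is consistent with (total - failed) / total.

```python
def parse_runtime_stats(text):
    """Parse 'Name = value' lines into a dict of floats (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # blank line or anything that is not 'name = value'
        try:
            stats[name.strip()] = float(value)
        except ValueError:
            continue  # non-numeric values are ignored in this sketch
    return stats


# Three lines taken from the sample output above.
sample = """\
UserRequestCountLongTermFailed_Total = 19661
UserRequestCountLongTerm_Total = 285648
UserRequestSuccessRatioLongTerm_Total = 0.9311705315633227
"""

stats = parse_runtime_stats(sample)
total = stats["UserRequestCountLongTerm_Total"]
failed = stats["UserRequestCountLongTermFailed_Total"]

# Assumption being checked: ratio == successful requests / all requests.
derived_ratio = (total - failed) / total
print(derived_ratio, stats["UserRequestSuccessRatioLongTerm_Total"])
```

The same helper could be fed the complete output of showruntimestats, for example to raise an alert when a ratio drops below a chosen threshold.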

OX Documentconverter and munin

The Documentconverter uses a different port to provide monitoring data via Jolokia. The corresponding oxJolokiaUrl has to be configured with an entry in the settings.

[ox_documentconverter*]
env.oxJolokiaUrl http://localhost:8008/monitoring/jolokia

OX Documents

Monitoring data for OX Documents can be obtained after installation of the package open-xchange-documents-monitoring.

Here is a list of sample data provided by

# /opt/open-xchange/sbin/showruntimestats -f
DocumentsSizeMedian_Spreadsheet = 0
DocumentsCreated_Presentation = 2
DocumentsCreated_Spreadsheet = 2
RestoreDocCurrent = 40
RestoreDocRestoredSuccess = 0
RestoreDocRestoredFailure = 0
DocumentsSizeMedian_Text = 0
DocumentsSizeMedian_Presentation = 0
RestoreDocCreated_Total = 40
RestoreDocRemoved_Total = 0
DocumentsSizeMedian_Total = 0
DocumentsCreated_Total = 4
DocumentsCreated_Text = 0
DocumentsClosed_Text_Timeout = 0
DocumentsSaved_Text_Total = 2
DocumentsSaved_Text_Close = 2
DocumentsSaved_Text_100ops = 0
DocumentsSaved_Text_15mins = 0
DocumentsOperations_Text_Incoming = 8
DocumentsOperations_Text_Distributed = 0
DocumentsOpened_Text_Total = 2
DocumentsOpened_Text_OOXML = 1
DocumentsOpened_Text_Binary = 0
DocumentsOpened_Text_ODF = 1
DocumentsClosed_Text_Total = 2
DocumentsOpened_Spreadsheet_Total = 22
DocumentsOpened_Spreadsheet_OOXML = 16
DocumentsOpened_Spreadsheet_Binary = 0
DocumentsOpened_Spreadsheet_ODF = 6
DocumentsClosed_Spreadsheet_Total = 19
DocumentsClosed_Spreadsheet_Timeout = 0
DocumentsSaved_Spreadsheet_Total = 10
DocumentsSaved_Spreadsheet_Close = 10
DocumentsSaved_Spreadsheet_100ops = 0
DocumentsSaved_Spreadsheet_15mins = 0
DocumentsOperations_Spreadsheet_Incoming = 46
DocumentsOperations_Spreadsheet_Distributed = 0

DCS

JMX and access via Jolokia are disabled by default.

For the required settings in dcs.properties see https://documentation.open-xchange.com/components/documents/documents-collaboration/config/7.10.3/

jmx.enabled=true
jmx.username
jmx.password

The default port is 9994, so the JMX connection can be established via:

service:jmx:rmi:///jndi/rmi://10.20.28.108:9994/jmxrmi
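
Only the host and port vary in this service URL. As a small sketch (the helper name jmx_service_url is hypothetical), the URL can be assembled for other hosts like this, with the default port mentioned above:

```python
def jmx_service_url(host, port=9994):
    """Build the JMX-over-RMI service URL for a DCS host.

    9994 is the default port named in this article; pass another
    value if the port was changed in dcs.properties.
    """
    return f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi"


print(jmx_service_url("10.20.28.108"))
```

The resulting string is what a JMX client such as JConsole expects as the remote process address.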

The DCS offers Spring Boot Actuator functionality for management and monitoring. By default endpoints are not accessible via Jolokia (except info and health). Additional endpoints have to be exposed and user credentials have to be configured.

management.endpoints.web.exposure.include=*
spring.security.user.name=ausername
spring.security.user.password=thepass

For details see https://docs.spring.io/spring-boot/docs/2.0.x/reference/html/production-ready-endpoints.html

Some examples for useful endpoints:

# curl  "http://localhost:8004/dcs/management/health"
{"status":"UP"}

Note that the health endpoint returns additional data if the call is authenticated.

# curl "http://ausername:thepass@localhost:8004/dcs/management/health"
{"status":"UP","details":{"db":{"status":"UP","details":...

The Spring Boot framework offers general metrics about system, JVM, ... As an example the metric process.cpu.usage is picked from the list at https://docs.spring.io/spring-boot/docs/2.0.10.BUILD-SNAPSHOT/actuator-api/html/#metrics :

# curl "http://ausername:thepass@localhost:8004/dcs/management/metrics/process.cpu.usage" 
{"name": "process.cpu.usage","measurements":[{"value": 0.03215434083601286, ...

These general metrics can also be retrieved in Prometheus format:

# curl "http://ausername:thepass@localhost:8004/dcs/management/prometheus"
# HELP jvm_threads_peak_threads ...
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 58.0
...
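
The Prometheus text output consists of comment lines starting with # and sample lines of the form "name value". A minimal parsing sketch follows; the function name is hypothetical, and it assumes that samples carry no trailing timestamp. A Prometheus server would normally scrape this endpoint directly, so the sketch is only meant for ad-hoc checks.

```python
def parse_prometheus_text(text):
    """Map each sample name (including any {labels}) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.rsplit(None, 1)  # value is the last whitespace-separated token
        if len(parts) != 2:
            continue  # not a 'name value' line
        name, value = parts
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # ignore lines whose last token is not a number
    return samples


# Lines taken from the sample output above.
sample = """\
# HELP jvm_threads_peak_threads ...
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 58.0
"""

samples = parse_prometheus_text(sample)
print(samples)
```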

Data specific to ActiveMQ is explained in the ActiveMQ MBeans Reference (https://activemq.apache.org/jmx).

Further Reading

Further details about accessing JMX data via Jolokia are described in the Middleware documentation.

A call derived from this for OX Documentconverter looks like this:

$ curl http://yourname:yourpassword@localhost:8008/monitoring/jolokia/read/com.openexchange.documentconverter:name=DocumentConverterInformation/JobsProcessed
{"timestamp":1471868314,"status":200,"request":{"mbean":"com.openexchange.documentconverter:name=DocumentConverterInformation","attribute":"JobsProcessed","type":"read"},"value":6616}