Revision as of 19:13, 21 January 2020
OX Documents / OX Documentconverter Monitoring
General
The Open-Xchange JMX interface provides runtime information about the Java virtual machine and about the Open-Xchange Middleware (aka groupware backend), including OX Documents, Documentconverter and Imageconverter.
For general information about the JMX interface see: OX_monitoring_interface
The usage of Jolokia to access monitoring data is described in the article Jolokia
Plugins are available for integration in the munin monitoring framework: OX_munin_scripts
OX Documentconverter
Monitoring data for OX Documentconverter can be obtained after installation of the package open-xchange-munin-scripts-jolokia. It uses Jolokia if available and configured.
The following table shows all available Documentconverter server metrics, provided via the JMX interface:
Documentconverter server metrics | |||||
---|---|---|---|---|---|
Metrics Name | Description | Instance / LongTerm | Remarks | ||
CacheEntryCount | The current count of entries, stored within the persistent cache | Instance | |||
CacheFreeVolumeSize | The current free size in bytes of the volume, the cache root directory is located on. The size is calculated on a volume block size basis to reflect the correct size of the free volume. | Instance | |||
CacheHitRatio | The ratio of successful job results that are retrieved from the cache against the total number of processed jobs as simple quotient | Instance | |||
CacheOldestEntrySeconds | The age of the oldest entry within the cache in seconds | Instance | From 7.10.3 on | ||
CachePersistentSize | The current size in bytes of all persistent entries, stored within the cache directory. The size is calculated on a volume block size basis to reflect the correct summed up size. | Instance | |||
ConversionCount | The total number of all physical ReaderEngine and PDFTool conversions | Instance | |||
ConversionCountPDFTool | The total number of all physical PDFTool conversions | Instance | |||
ConversionCountReaderEngine | The total number of all physical ReaderEngine conversions | Instance | |||
ConversionCountWithMultipleRuns | The total number of physical ReaderEngine conversions, that needed a second, new ReaderEngine instance | Instance | |||
ConversionErrorsGeneral | The total number of physical ReaderEngine conversions, that resulted in a general error | Instance | |||
ConversionErrorsTimeout | The total number of physical ReaderEngine conversions, that resulted in a timeout error | Instance | |||
ConversionErrorsDisposed | The total number of physical ReaderEngine conversions, that resulted in a disposed error (ReaderEngine instance vanished during conversion) | Instance | |||
ConversionErrorsPassword | The total number of physical ReaderEngine conversions, that resulted in a password error | Instance | |||
ConversionErrorsNoContent | The total number of physical ReaderEngine conversions, that resulted in a no content error (e.g. source document of 0 bytes length) | Instance | |||
ConversionErrorsOutOfMemory | The total number of physical ReaderEngine conversions, that resulted in an out of memory error | Instance | |||
ConversionErrorsPDFTool | The total number of physical PDFTool conversions, that resulted in an error | Instance | |||
ErrorCacheHitRatio | The ratio of job errors that are retrieved from the error cache against the total number of job errors as simple quotient | Instance | |||
JobErrorsTimeout | The total number of jobs, that returned with a timeout error | Instance | |||
JobErrorsTotal | The total number of jobs, that returned with an error | Instance | |||
JobsProcessed | The total number of jobs that have been processed | Instance | |||
MedianConversionTimeMillis_Background | The median time in milliseconds, a physical ReaderEngine conversion with background priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianConversionTimeMillis_Low | The median time in milliseconds, a physical ReaderEngine conversion with low priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianConversionTimeMillis_Medium | The median time in milliseconds, a physical ReaderEngine conversion with medium priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianConversionTimeMillis_High | The median time in milliseconds, a physical ReaderEngine conversion with high priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianConversionTimeMillis_Instant | The median time in milliseconds, a physical ReaderEngine conversion with instant priority lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianConversionTimeMillis_Total | The median time in milliseconds, a physical ReaderEngine conversion lasted, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_Background | The median time in milliseconds, a job with background priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_Low | The median time in milliseconds, a job with low priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_Medium | The median time in milliseconds, a job with medium priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_High | The median time in milliseconds, a job with high priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_Instant | The median time in milliseconds, a job with instant priority needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianJobTimeMillis_Total | The median time in milliseconds, a job needed to complete, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_Background | The median time in milliseconds, a background job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_Low | The median time in milliseconds, a low priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_Medium | The median time in milliseconds, a medium priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_High | The median time in milliseconds, a high priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_Instant | The median time in milliseconds, an instant priority job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
MedianQueueTimeMillis_Total | The median time in milliseconds, a job waited in the queue for processing, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_Background | The peak number of background priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_Low | The peak number of low priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_Medium | The peak number of medium priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_High | The peak number of high priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_Instant | The peak number of instant priority jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInQueue_Total | The peak number of jobs in queue, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PeakJobCountInAsyncQueue_Total | The peak number of jobs in the queue for asynchronous jobs, measured from the beginning of the last reset/initialization on (default period: 5 minutes) | Instance | |||
PendingStatefulJobCount | The number of stateful job conversions for which no endConvert has been called so far | Instance | From 7.10.3 on | ||
QueueLimitHighReachedCountInstance | The accumulated number of times the job queue high limit has been reached for the current DocumentConverter server instance | Instance | From 7.10.3 on | ||
QueueLimitLowReachedCountInstance | The accumulated number of times the job queue low limit has been reached for the current DocumentConverter server instance | Instance | From 7.10.3 on | ||
QueueLimitHighReachedCountLongTerm | The accumulated number of times the job queue high limit has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | From 7.10.3 on | ||
QueueLimitLowReachedCountLongTerm | The accumulated number of times the job queue low limit has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | From 7.10.3 on | ||
QueueTimeoutCountInstance | The accumulated number of times the job queue timeout has been reached for the current DocumentConverter server instance | Instance | From 7.10.3 on | ||
QueueTimeoutCountLongTerm | The accumulated number of times the job queue timeout has been reached for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | From 7.10.3 on | ||
RuntimeSecondsInstance | The runtime in seconds for the current DocumentConverter server instance | Instance | |||
RuntimeSecondsLongTerm | The accumulated runtime in seconds for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
ScheduledJobCountInQueue | The current number of all scheduled, synchronous jobs within the queue, waiting to be processed | Instance | From 7.10.3 on | ||
UserRequestCountInstance_Total | The number of all WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstance_Text | The number of text document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstance_Spreadsheet | The number of spreadsheet document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstance_Presentation | The number of presentation document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstance_Image | The number of image related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountLongTerm_Total | The accumulated number of all WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTerm_Text | The accumulated number of text document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTerm_Spreadsheet | The accumulated number of spreadsheet document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTerm_Presentation | The accumulated number of presentation document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTerm_Image | The accumulated number of image related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountInstanceFailed_Total | The number of all WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstanceFailed_Text | The number of text document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstanceFailed_Spreadsheet | The number of spreadsheet document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstanceFailed_Presentation | The number of presentation document related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountInstanceFailed_Image | The number of image related WebService requests, that failed with an error and that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestCountLongTermFailed_Total | The accumulated number of all WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTermFailed_Text | The accumulated number of text document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTermFailed_Spreadsheet | The accumulated number of spreadsheet document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTermFailed_Presentation | The accumulated number of presentation document related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestCountLongTermFailed_Image | The accumulated number of image related WebService requests, that failed with an error and that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestSuccessRatioInstance_Total | The ratio of successful WebService requests against failed WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestSuccessRatioInstance_Text | The ratio of successful text document related WebService requests against failed text document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestSuccessRatioInstance_Spreadsheet | The ratio of successful spreadsheet document related WebService requests against failed spreadsheet document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestSuccessRatioInstance_Presentation | The ratio of successful presentation document related WebService requests against failed presentation document related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestSuccessRatioInstance_Image | The ratio of successful image related WebService requests against failed image related WebService requests, that have the user flag set to true, for the current DocumentConverter server instance | Instance | |||
UserRequestSuccessRatioLongTerm_Total | The accumulated ratio of successful WebService requests against failed WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestSuccessRatioLongTerm_Text | The accumulated ratio of successful text document related WebService requests against failed text document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestSuccessRatioLongTerm_Spreadsheet | The accumulated ratio of successful spreadsheet document related WebService requests against failed spreadsheet document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestSuccessRatioLongTerm_Presentation | The accumulated ratio of successful presentation document related WebService requests against failed presentation document related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term | |||
UserRequestSuccessRatioLongTerm_Image | The accumulated ratio of successful image related WebService requests against failed image related WebService requests, that have the user flag set to true, for all previously started DocumentConverter server instances and the current DocumentConverter server instance | Long term |
Here is a list of sample data provided by
# /opt/open-xchange/sbin/showruntimestats -y -p 9998|sed -e 's/com.openexchange.documentconverter:name=DocumentConverterInformation,//g' | sort
CacheEntryCount = 10258
CacheFreeVolumeSize = 2709409792
CacheHitRatio = 0.9642857142857143
CacheOldestEntrySeconds = 1998029
CachePersistentSize = 564215808
ConversionCount = 1
ConversionCountPDFTool = 0
ConversionCountReaderEngine = 1
ConversionCountWithMultipleRuns = 0
ConversionErrorsDisposed = 0
ConversionErrorsGeneral = 0
ConversionErrorsNoContent = 0
ConversionErrorsOutOfMemory = 0
ConversionErrorsPassword = 1
ConversionErrorsPDFTool = 0
ConversionErrorsTimeout = 0
ErrorCacheHitRatio = 0.0
JobErrorsTimeout = 0
JobErrorsTotal = 1
JobsProcessed = 28
MedianConversionTimeMillis_Background = 0
MedianConversionTimeMillis_High = 0
MedianConversionTimeMillis_Instant = 0
MedianConversionTimeMillis_Low = 0
MedianConversionTimeMillis_Medium = 0
MedianConversionTimeMillis_Total = 0
MedianJobTimeMillis_Background = 0
MedianJobTimeMillis_High = 0
MedianJobTimeMillis_Instant = 0
MedianJobTimeMillis_Low = 0
MedianJobTimeMillis_Medium = 0
MedianJobTimeMillis_Total = 0
MedianQueueTimeMillis_Background = 0
MedianQueueTimeMillis_High = 0
MedianQueueTimeMillis_Instant = 0
MedianQueueTimeMillis_Low = 0
MedianQueueTimeMillis_Medium = 0
MedianQueueTimeMillis_Total = 0
PeakJobCountInAsyncQueue_Total = 0
PeakJobCountInQueue_Background = 0
PeakJobCountInQueue_High = 0
PeakJobCountInQueue_Instant = 0
PeakJobCountInQueue_Low = 0
PeakJobCountInQueue_Medium = 0
PeakJobCountInQueue_Total = 0
PendingStatefulJobCount = 0
QueueLimitHighReachedCountInstance = 33
QueueLimitHighReachedCountLongTerm = 66
QueueLimitLowReachedCountInstance = 33
QueueLimitLowReachedCountLongTerm = 66
QueueTimeoutCountInstance = 22
QueueTimeoutCountLongTerm = 44
RuntimeSecondsInstance = 6707
RuntimeSecondsLongTerm = 28607051
ScheduledJobCountInQueue = 0
UserRequestCountInstanceFailed_Image = 0
UserRequestCountInstanceFailed_Presentation = 0
UserRequestCountInstanceFailed_Spreadsheet = 0
UserRequestCountInstanceFailed_Text = 0
UserRequestCountInstanceFailed_Total = 0
UserRequestCountInstance_Image = 0
UserRequestCountInstance_Presentation = 0
UserRequestCountInstance_Spreadsheet = 0
UserRequestCountInstance_Text = 0
UserRequestCountInstance_Total = 0
UserRequestCountLongTermFailed_Image = 0
UserRequestCountLongTermFailed_Presentation = 10694
UserRequestCountLongTermFailed_Spreadsheet = 2598
UserRequestCountLongTermFailed_Text = 6369
UserRequestCountLongTermFailed_Total = 19661
UserRequestCountLongTerm_Image = 0
UserRequestCountLongTerm_Presentation = 121394
UserRequestCountLongTerm_Spreadsheet = 33453
UserRequestCountLongTerm_Text = 130801
UserRequestCountLongTerm_Total = 285648
UserRequestSuccessRatioInstance_Image = 1.0
UserRequestSuccessRatioInstance_Presentation = 1.0
UserRequestSuccessRatioInstance_Spreadsheet = 1.0
UserRequestSuccessRatioInstance_Text = 1.0
UserRequestSuccessRatioInstance_Total = 1.0
UserRequestSuccessRatioLongTerm_Image = 1.0
UserRequestSuccessRatioLongTerm_Presentation = 0.911906684020627
UserRequestSuccessRatioLongTerm_Spreadsheet = 0.9223388036947359
UserRequestSuccessRatioLongTerm_Text = 0.9513077117147423
UserRequestSuccessRatioLongTerm_Total = 0.9311705315633227
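The ratio values in the sample output can be cross-checked against the request counters. The following Python sketch assumes that a success ratio is the share of non-failed requests among all requests, and that a counter of zero requests yields 1.0. Both assumptions are inferred from the sample numbers above, not taken from a documented formula. The function name success_ratio is hypothetical.

```python
def success_ratio(total: int, failed: int) -> float:
    """Share of requests that did not fail; 1.0 when there were no requests.

    Assumption: inferred from the sample output, not from a documented formula.
    """
    if total == 0:
        return 1.0
    return (total - failed) / total


# (total, failed, reported ratio) copied from the LongTerm sample values.
samples = {
    "Total": (285648, 19661, 0.9311705315633227),
    "Text": (130801, 6369, 0.9513077117147423),
    "Spreadsheet": (33453, 2598, 0.9223388036947359),
    "Presentation": (121394, 10694, 0.911906684020627),
    "Image": (0, 0, 1.0),
}

for name, (total, failed, reported) in samples.items():
    computed = success_ratio(total, failed)
    # Compare with a small tolerance instead of exact floating point equality.
    matches = abs(computed - reported) < 1e-6
    print(f"{name}: computed={computed:.7f} reported={reported:.7f} match={matches}")
```

Under the same kind of reading, the sample CacheHitRatio of 0.9642857142857143 equals 27/28, which would fit 27 of the 28 processed jobs being served from the cache and one physical conversion (ConversionCount = 1). This is an interpretation of the sample, not a statement from the metric descriptions.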
OX Documentconverter and munin
The Documentconverter uses a different port to provide monitoring data via Jolokia. The corresponding oxJolokiaUrl has to be configured with an entry in the settings:
[ox_documentconverter*]
env.oxJolokiaUrl http://localhost:8008/monitoring/jolokia
OX Documents
Monitoring data for OX Documents can be obtained after installation of the package open-xchange-documents-monitoring.
Here is a list of sample data provided by
# /opt/open-xchange/sbin/showruntimestats -f
DocumentsSizeMedian_Spreadsheet = 0
DocumentsCreated_Presentation = 2
DocumentsCreated_Spreadsheet = 2
RestoreDocCurrent = 40
RestoreDocRestoredSuccess = 0
RestoreDocRestoredFailure = 0
DocumentsSizeMedian_Text = 0
DocumentsSizeMedian_Presentation = 0
RestoreDocCreated_Total = 40
RestoreDocRemoved_Total = 0
DocumentsSizeMedian_Total = 0
DocumentsCreated_Total = 4
DocumentsCreated_Text = 0
DocumentsClosed_Text_Timeout = 0
DocumentsSaved_Text_Total = 2
DocumentsSaved_Text_Close = 2
DocumentsSaved_Text_100ops = 0
DocumentsSaved_Text_15mins = 0
DocumentsOperations_Text_Incoming = 8
DocumentsOperations_Text_Distributed = 0
DocumentsOpened_Text_Total = 2
DocumentsOpened_Text_OOXML = 1
DocumentsOpened_Text_Binary = 0
DocumentsOpened_Text_ODF = 1
DocumentsClosed_Text_Total = 2
DocumentsOpened_Spreadsheet_Total = 22
DocumentsOpened_Spreadsheet_OOXML = 16
DocumentsOpened_Spreadsheet_Binary = 0
DocumentsOpened_Spreadsheet_ODF = 6
DocumentsClosed_Spreadsheet_Total = 19
DocumentsClosed_Spreadsheet_Timeout = 0
DocumentsSaved_Spreadsheet_Total = 10
DocumentsSaved_Spreadsheet_Close = 10
DocumentsSaved_Spreadsheet_100ops = 0
DocumentsSaved_Spreadsheet_15mins = 0
DocumentsOperations_Spreadsheet_Incoming = 46
DocumentsOperations_Spreadsheet_Distributed = 0
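For scripted checks, output of this shape can be turned into a dictionary. The sketch below is a minimal Python example that assumes every metric is printed as "Name = value" with a numeric value, as in the sample. The function name parse_runtime_stats is hypothetical and not part of any Open-Xchange tooling.

```python
def parse_runtime_stats(text: str) -> dict:
    """Parse 'Name = value' lines into a dict of floats.

    Lines without ' = ' (for example the echoed command line) and lines
    with non-numeric values are skipped in this sketch.
    """
    stats = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if not separator:
            continue
        try:
            stats[name.strip()] = float(value)
        except ValueError:
            continue
    return stats


# A few lines copied from the sample output above.
sample = """\
# /opt/open-xchange/sbin/showruntimestats -f
DocumentsCreated_Presentation = 2
DocumentsCreated_Spreadsheet = 2
DocumentsCreated_Total = 4
DocumentsCreated_Text = 0
"""

stats = parse_runtime_stats(sample)
per_type = sum(
    stats[f"DocumentsCreated_{kind}"]
    for kind in ("Text", "Spreadsheet", "Presentation")
)
# In the sample, the per-type counters add up to the total counter.
print(per_type == stats["DocumentsCreated_Total"])  # prints True
```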
DCS
JMX and access via Jolokia are disabled by default.
For the required settings in dcs.properties see https://documentation.open-xchange.com/components/documents/documents-collaboration/config/7.10.3/
jmx.enabled=true
jmx.username
jmx.password
The default port is 9994. So the JMX connection can be established via
service:jmx:rmi:///jndi/rmi://10.20.28.108:9994/jmxrmi
The DCS offers Spring Boot Actuator functionality for management and monitoring. By default endpoints are not accessible via Jolokia (except info and health). Additional endpoints have to be exposed and user credentials have to be configured.
management.endpoints.web.exposure.include=*
spring.security.user.name=ausername
spring.security.user.password=thepass
For details see https://docs.spring.io/spring-boot/docs/2.0.x/reference/html/production-ready-endpoints.html
Some examples for useful endpoints:
# curl "http://localhost:8004/dcs/management/health"
{"status":"UP"}
Note that the health endpoint returns additional data if the call is authenticated.
# curl "http://ausername:thepass@localhost:8004/dcs/management/health"
{"status":"UP","details":{"db":{"status":"UP","details":...
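The same health check can be scripted. The Python sketch below uses only the standard library and assumes that the endpoint accepts HTTP Basic authentication, which is what curl sends for credentials embedded in the URL. The helper names health_request, is_up and check_health are hypothetical; the URL and credentials are the placeholders from the examples above.

```python
import base64
import json
import urllib.request
from typing import Optional


def health_request(url: str, user: Optional[str] = None,
                   password: str = "") -> urllib.request.Request:
    """Build a GET request; add HTTP Basic credentials when a user is given."""
    request = urllib.request.Request(url)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def is_up(body: str) -> bool:
    """True if the health JSON reports the overall status UP."""
    return json.loads(body).get("status") == "UP"


def check_health(request: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Send the request and evaluate the answer.

    Needs a running DCS, so it is not called in this sketch. HTTP error
    responses raise urllib.error.HTTPError instead of returning a body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_up(response.read().decode("utf-8"))


# Evaluate the unauthenticated sample answer shown above.
print(is_up('{"status":"UP"}'))  # prints True
```

A call such as check_health(health_request("http://localhost:8004/dcs/management/health", "ausername", "thepass")) would then correspond to the authenticated curl example.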
Further Reading
Further details about accessing JMX data via Jolokia are described in the Middleware documentation.
A call derived from this for OX Documentconverter looks like
$ curl http://yourname:yourpassword@localhost:8008/monitoring/jolokia/read/com.openexchange.documentconverter:name=DocumentConverterInformation/JobsProcessed
{"timestamp":1471868314,"status":200,"request":{"mbean":"com.openexchange.documentconverter:name=DocumentConverterInformation","attribute":"JobsProcessed","type":"read"},"value":6616}
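The URL pattern and the JSON answer of this call can be handled in a script as follows. This is a minimal Python sketch based only on the request and response shown above. The helper names jolokia_read_url and jolokia_value are hypothetical, and MBean names containing slashes would need Jolokia's own escaping, which is not covered here.

```python
import json
from urllib.parse import quote

MBEAN = "com.openexchange.documentconverter:name=DocumentConverterInformation"


def jolokia_read_url(base_url: str, mbean: str, attribute: str) -> str:
    """Build a read URL of the form <base>/read/<mbean>/<attribute>."""
    # Keep ':', '=' and ',' literal, as in the curl example.
    return f"{base_url.rstrip('/')}/read/{quote(mbean, safe=':=,')}/{quote(attribute)}"


def jolokia_value(body: str):
    """Return the 'value' field, or raise if the status is not 200."""
    payload = json.loads(body)
    if payload.get("status") != 200:
        raise RuntimeError(f"Jolokia returned status {payload.get('status')}")
    return payload["value"]


# Response body copied from the curl example above.
sample = (
    '{"timestamp":1471868314,"status":200,"request":{"mbean":'
    '"com.openexchange.documentconverter:name=DocumentConverterInformation",'
    '"attribute":"JobsProcessed","type":"read"},"value":6616}'
)

print(jolokia_read_url("http://localhost:8008/monitoring/jolokia", MBEAN, "JobsProcessed"))
print(jolokia_value(sample))  # prints 6616
```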