LogRhythm Diagnostic Module User Guide
The LogRhythm Diagnostics module is provided as part of the LogRhythm Knowledge Base and includes content intended to monitor the health of the LogRhythm deployment and generate alarms when key health-impacting events occur. The module contains tails, reports, and investigations to monitor all diagnostic events, as well as alarms triggered by specific conditions.
The LogRhythm Diagnostics Module replaces content currently available in the QsEMP module and will be automatically synchronized on all deployments. Existing rules have been modified for accuracy, updated with new components where necessary, and had their settings changed to reduce alarm volume. With the exception of LogRhythm Component Critical Condition, which is suppressed by default for one hour, the default suppression for all alarms has been updated to two hours.
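To illustrate what these default suppression periods mean in practice, the following Python sketch suppresses repeat firings of the same alarm until its suppression period has elapsed. It assumes suppression is tracked per rule and per a hypothetical grouping key (for example, a host name). The class and function names are illustrative only and do not reflect how LogRhythm implements suppression internally.

```python
from datetime import datetime, timedelta

# Default suppression periods as stated in this guide:
# LogRhythm Component Critical Condition -> 1 hour, all other alarms -> 2 hours.
CRITICAL_CONDITION_RULE = "LogRhythm Component Critical Condition"
CRITICAL_SUPPRESSION = timedelta(hours=1)
DEFAULT_SUPPRESSION = timedelta(hours=2)


def suppression_period(rule_name: str) -> timedelta:
    """Return the default suppression period for an alarm rule."""
    if rule_name == CRITICAL_CONDITION_RULE:
        return CRITICAL_SUPPRESSION
    return DEFAULT_SUPPRESSION


class AlarmSuppressor:
    """Hypothetical helper: drops repeat alarms for the same (rule, key)
    pair until that rule's suppression period has elapsed."""

    def __init__(self) -> None:
        # Time each (rule, key) pair last fired an alarm.
        self._last_fired: dict[tuple[str, str], datetime] = {}

    def should_fire(self, rule_name: str, key: str, now: datetime) -> bool:
        last = self._last_fired.get((rule_name, key))
        if last is not None and now - last < suppression_period(rule_name):
            return False  # still inside the suppression window
        self._last_fired[(rule_name, key)] = now
        return True
```

With this sketch, a second LogRhythm Agent Heartbeat Missed alarm for the same host 90 minutes after the first is suppressed, while a second LogRhythm Component Critical Condition alarm at the same interval fires again because its window is only one hour.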
Module Contents
Alarm Rules
Rule Name | Description | ID |
---|---|---|
LogRhythm Mediator Database Capacity Error | Alarms on the occurrence of the Mediator Database reaching 90% capacity. The Mediator Server inserting log data into the affected Mediator database will cease accepting new log messages from connected agents and will force agents to disconnect. | 96 |
LogRhythm Mediator Database Capacity Warning | Alarms on the occurrence of the Mediator Database reaching 80% capacity. At 90% capacity the Mediator Server inserting data into the affected Mediator database will cease accepting new log messages. | 97 |
LogRhythm Agent Heartbeat Missed | Alarms on the occurrence of a LogRhythm Agent Heartbeat Missed event which could indicate a LogRhythm Agent going down. | 98 |
LogRhythm Component Critical Condition | Alarms on the occurrence of any critical LogRhythm component event which indicates the failure of a LogRhythm component. | 99 |
LogRhythm Component Successive Errors | Alarms on successive occurrences of critical and error LogRhythm component events which indicate the failure of a LogRhythm component. | 100 |
LogRhythm Component Excessive Warnings | Alarms on excessive occurrences of critical, error or warning LogRhythm component events which could indicate pending failures of the LogRhythm solution. | 101 |
LogRhythm Mediator Heartbeat Missed | Alarms on the occurrence of a LogRhythm Mediator Heartbeat Missed event which could indicate that a log manager has gone down. | 102 |
LogRhythm MPE Rule Disabled | Alarms on the occurrence of a LogRhythm MPE Rule Disabled event. | 103 |
LogRhythm Silent Log Source Error | Alarms on a LogRhythm Silent Log Source Error event which could indicate a log source that has gone silent. | 104 |
LogRhythm Database Maintenance Failure | Alarms on a LogRhythm Database Maintenance job failing. | 210 |
LogRhythm Failed To Submit Batch Job To DB | Alarms on Mediator error "Failed to submit batch job to the database" | 212 |
LogRhythm Excessive Unprocessed Logs Spooled to Disk | A high number of logs have been spooled to disk. | 230 |
LogRhythm Excessive Processed Logs Spooled to Disk | The Log Insert Manager has spooled a high number of logs to disk. | 231 |
LogRhythm Excessive Events Spooled to Disk | The Event Insert Manager has spooled a high number of logs to disk. | 232 |
Perfmon Counter Reached Threshold Limit | Alarm for performance counter alerting on disk exhaustion | 233 |
LogRhythm GLPR Error | Alarms on the occurrence of a LogRhythm GLPR processing or preparation error. | 408 |
LogRhythm AI Engine Heartbeat Missed | A heartbeat message from the LogRhythm AI Engine service was not received in the allotted time. | 676 |
LogRhythm AI Comm Manager Heartbeat Missed | A heartbeat message from the LogRhythm AI Engine Communication Manager service was not received in the allotted time. | 677 |
LogRhythm CMDB Database Warning | Alarms on the occurrence of the Case Management Database reaching 90% capacity. | 947 |
LogRhythm CMDB Stats Warning | Alarms when the LogRhythm Job Manager is unable to retrieve the Case Management statistics. | 948 |
LogRhythm CMDB Database Error | Alarms when the Case Management Database has utilized more than 90% of its capacity. | 949 |
LogRhythm Agent Cannot Update | Alarms on the LogRhythm Diagnostic Event ID 7012 - The System Monitor Agent Cannot Update Itself. | 1002 |
LogRhythm Agent Needs Reboot | Alarms on the LogRhythm Diagnostic Event ID 7014 - The System Monitor Agent Has Been Updated But Requires A Reboot. | 1003 |
LogRhythm Network Monitor Heartbeat Missed | Alarms on the occurrence of a Network Monitor Heartbeat Missed event which could indicate that a network monitor has gone down. | 1084 |
LogRhythm Data Indexer Stopped | One or more Data Indexer services have stopped. | 1093 |
LogRhythm Data Indexer Configuration Fail | An attempt to change the LogRhythm Data Indexer configuration has failed. | 1094 |
LogRhythm Data Indexer Suspend | The LogRhythm Data Indexer reliable messaging has gone into suspend. | 1095 |
LogRhythm Data Indexer EMDB Sync Fail | The synchronization service failed to replicate critical EMDB tables. | 1096 |
LogRhythm Data Indexer Disk Limit Exceeded | The LogRhythm Data Indexer has exceeded its drive space threshold. | 1097 |
LogRhythm Data Indexer Max Index Exceeded | The LogRhythm Data Indexer has exceeded its TTL. | 1098 |
LogRhythm Data Indexer List Not Found | A query attempted to access a list which is not available on the DX cluster. | 1099 |
LogRhythm Data Indexer Repo Not Found | A query attempted to access a log repository which is not available on the DX cluster. | 1100 |
LogRhythm Knowledge Base Update Error | Alarms if the LogRhythm Knowledge Base fails to automatically download or sync. | 1102 |
LogRhythm Mediator Recycling – Hung MPE Threads | The LogRhythm Mediator has recycled due to hung MPE threads. | 1139 |
LogRhythm Scheduled Report Failure | The LogRhythm Job Manager has encountered an error when attempting to prepare, run, or export a scheduled report package. | 1140 |
LogRhythm AD Sync Failure | The LogRhythm Platform Manager has failed when attempting to sync Active Directory. | 1141 |
Log Source Collection Error | An error has occurred while collecting logs from a log source. | 1424 |
LogRhythm Agent Max Memory Error | The LogRhythm Agent maximum memory limit has generated an error. | 1425 |
LogRhythm Agent Max Memory Warning | The LogRhythm Agent has reached the maximum memory setting. | 1426 |
Data Indexer Cluster Health Downgrade | The Data Indexer Elasticsearch cluster health has been downgraded. | 1427 |
Data Indexer Cluster Health Recovery | The Data Indexer Elasticsearch cluster health has recovered. | 1428 |
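Several of the rules above follow simple threshold logic. The Python sketch below illustrates two of them: the Mediator database capacity thresholds from rules 96 and 97 (80% warning, 90% error), and a generic "heartbeat not received in the allotted time" check. It is not LogRhythm's implementation. The function names are hypothetical, and the allotted heartbeat interval is a parameter you would supply because this guide does not state a value.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the alarm descriptions above.
MEDIATOR_DB_WARNING_PCT = 80.0  # LogRhythm Mediator Database Capacity Warning (97)
MEDIATOR_DB_ERROR_PCT = 90.0    # LogRhythm Mediator Database Capacity Error (96);
                                # at this level the Mediator stops accepting new logs


def mediator_db_alarm(used_bytes: int, capacity_bytes: int) -> Optional[str]:
    """Return the rule name matching a Mediator database fill level,
    or None if the level is below both thresholds."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    pct = 100.0 * used_bytes / capacity_bytes
    if pct >= MEDIATOR_DB_ERROR_PCT:
        return "LogRhythm Mediator Database Capacity Error"
    if pct >= MEDIATOR_DB_WARNING_PCT:
        return "LogRhythm Mediator Database Capacity Warning"
    return None


def heartbeat_missed(last_heartbeat: datetime, now: datetime,
                     allotted: timedelta) -> bool:
    """True if more than the (hypothetical) allotted time has passed
    since the last heartbeat was received."""
    return now - last_heartbeat > allotted
```

The error check runs before the warning check so that a database at or above 90% reports only the more severe rule.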
Reports
Report Name | Report Description | Report ID |
---|---|---|
LogRhythm Diagnostic Events | Provides a detailed account of critical and error conditions experienced by LogRhythm components. | 431 |
Investigations
Investigation Name | Investigation Description | Investigation ID |
---|---|---|
LogRhythm Diagnostic Events | This investigation returns all diagnostic events from any LogRhythm component (Agent, AI Engine, ARM, Mediator, etc.). | 12 |
Tails
Tail Name | Tail Description | Tail ID |
---|---|---|
LogRhythm Diagnostic Events | This tail returns all diagnostic events from any LogRhythm component (System Monitor Agent, AI Engine, ARM, Mediator, etc.). | 1 |
Troubleshooting Guidance
This section provides information about steps you can take to further analyze specific alarms or how to gather additional information to provide to LogRhythm Customer Support.
ID: Alarm | Potential Remediation Steps |
---|---|
96: LogRhythm Mediator Database Capacity Error | Directs immediate attention to the LMDB and indicates the system has now gone into suspend mode. This is caused by oversubscription of the LMDB. This alarm does not apply to the Data Processor. |
97: LogRhythm Mediator Database Capacity Warning | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Provides early warning to administrators that action may be necessary to maintain the LMDB and prevent the system from going into suspend mode. This is caused by oversubscription of the LMDB. This alarm does not apply to the Data Processor. |
98: LogRhythm Agent Heartbeat Missed | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. |
99: LogRhythm Component Critical Condition | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Identifies and detects critical problems at an early stage on any LogRhythm component. If a source issue is identified, it will most likely require analysis to verify its scope, validity, and priority. |
100: LogRhythm Component Successive Errors | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Identifies and detects potential problems at an early stage on any LogRhythm component. If a source issue is identified, it will most likely require analysis to verify its scope, validity, and priority. |
101: LogRhythm Component Excessive Warnings | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Identifies and detects potential problems at an early stage on any LogRhythm component. If a source issue is identified, it will most likely require analysis to verify its scope, validity, and priority. |
102: LogRhythm Mediator Heartbeat Missed | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. |
103: LogRhythm MPE Rule Disabled | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. The MPE service now detects when individual rules continue to generate processing warnings, and it may disable rules that repeatedly raise warnings if hung processes jeopardize the health of the system. This capability increases the reliability of each Log Manager by allowing the MPE to more accurately identify and gracefully handle parsing rules that put the overall system at risk. |
104: LogRhythm Silent Log Source Error | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Silent log source alarms can be extremely valuable and can be configured individually per log source. However, environmental factors, such as a log source that logs infrequently, can flood this alarm. Because this alarm is tuned per log source, tuning can carry a high administrative cost. Once tuned, however, these alarms can provide valuable insight into each log source’s normal behavior. |
210: LogRhythm Database Maintenance Failure | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Each SQL database maintenance job comprises individual steps, each of which performs a specific maintenance function. LogRhythm has several maintenance jobs designed to perform routine functions that age data out of the databases and rebuild indexes to keep searches efficient. If the maintenance jobs do not run, your system will be affected: suspend conditions can occur and the databases can fill to capacity. The database maintenance jobs are implemented as SQL Server Agent jobs. There are three jobs that are in place on any LogRhythm 6.x database server: One additional job that appears on Platform Managers only, LogRhythm Backup, is called by the LogRhythm Sunday Maintenance Job. |
212: LogRhythm Failed To Submit Batch Job To DB | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Processed logs, together with their respective insert instructions, are batched and submitted to the respective destination database. |
230: LogRhythm Excessive Unprocessed Logs Spooled to Disk | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Indicates that processing is unable to keep up with the rate at which logs are being received. This could be due to improper processing of logs, oversubscription, high resource utilization, or poor overall service health. This rule can be tuned to exclude lower-volume diagnostic events (e.g., Unprocessed Log Spooled Count Exceeds 1 Million) on high-volume deployments. Component settings can be tuned to help a system catch up temporarily. However, each of these settings should be returned to its original best-practice value unless there is a specific need to keep the new setting or unless otherwise directed. Manipulating these settings can cause a waterfall effect. For example, if you increase the resources dedicated to insertion, fewer resources remain for processing incoming log messages. |
231: LogRhythm Excessive Processed Logs Spooled to Disk | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Spooling logs to disk is an expected condition that happens under periods of peak load. Excessive spooling, however, can result in disk starvation and the Mediator going into a suspend state. This rule can be tuned to exclude lower-volume diagnostic events (e.g., Unprocessed Log Spooled Count Exceeds 1 Million) on high-volume deployments. |
232: LogRhythm Excessive Events Spooled to Disk | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. Spooling logs to disk is an expected condition that happens under periods of peak load. Excessive spooling, however, can result in disk starvation and the Mediator going into a suspend state. This rule can be tuned to exclude lower-volume diagnostic events (e.g., InsertMgr Event Spooled Count Exceeds 1 Million) on high-volume deployments. |
233: Perfmon Counter Reached Threshold Limit | Remediation varies based on rule configuration. This rule is typically not enabled by default; instead, specific performance counters are added to the alarm criteria to assist with troubleshooting or to detect specific trouble conditions. |
408: LogRhythm GLPR Error | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. The following error types may trigger this alarm: |
676: LogRhythm AI Engine Heartbeat Missed | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. |
677: LogRhythm AI Comm Manager Heartbeat Missed | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. The threshold at which a LogRhythm System Monitor will trigger this alarm can be configured in the System Monitor properties. |
947: LogRhythm CMDB Database Warning | Collect SQL error logs, note database and disk sizes, and then contact LogRhythm Customer Support. |
948: LogRhythm CMDB Stats Warning | Collect SQL error logs, note database and disk sizes, and then contact LogRhythm Customer Support. |
949: LogRhythm CMDB Database Error | Collect SQL error logs, note database and disk sizes, and then contact LogRhythm Customer Support. |
1002: LogRhythm Agent Cannot Update | |
1003: LogRhythm Agent Needs Reboot | Restart the LogRhythm System Monitor Agent. This alarm indicates that the noted System Monitor Agent service requires a restart. Restart the Agent during off-peak hours and within an approved change control window. |
1084: LogRhythm Network Monitor Heartbeat Missed | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. |
1093: LogRhythm Data Indexer Stopped | Attempt to start the Indexer services using the start script: |
1094: LogRhythm Data Indexer Configuration Fail | Contact LogRhythm Customer Support. |
1095: LogRhythm Data Indexer Suspend | Contact LogRhythm Customer Support. |
1096: LogRhythm Data Indexer EMDB Sync Fail | Contact LogRhythm Customer Support. |
1097: LogRhythm Data Indexer Disk Limit Exceeded | Contact LogRhythm Customer Support. |
1098: LogRhythm Data Indexer Max Index Exceeded | Contact LogRhythm Customer Support. |
1099: LogRhythm Data Indexer List Not Found | Contact LogRhythm Customer Support. |
1100: LogRhythm Data Indexer Repo Not Found | Contact LogRhythm Customer Support. |
1102: LogRhythm Knowledge Base Update Error | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. |
1139: LogRhythm Mediator Recycling – Hung MPE Threads | If the steps above do not provide a solution or if you require assistance, contact LogRhythm Customer Support. The MPE service now detects when individual rules continue to generate processing warnings, and it may disable rules that repeatedly raise warnings if hung processes jeopardize the health of the system. This capability increases the reliability of each Log Manager by allowing the MPE to more accurately identify and gracefully handle parsing rules that put the overall system at risk. |
1140: LogRhythm Scheduled Report Failure | Collect the lrjobmgr.log file, and then contact LogRhythm Customer Support. |
1141: LogRhythm AD Sync Failure | |
1424: Log Source Collection Error | Contact LogRhythm Customer Support or visit the Community for community-driven questions and answers. |
1425: LogRhythm Agent Max Memory Error | Contact LogRhythm Customer Support or visit the Community for community-driven questions and answers. |
1426: LogRhythm Agent Max Memory Warning | Contact LogRhythm Customer Support or visit the Community for community-driven questions and answers. |
1427: Data Indexer Cluster Health Downgrade | Run the "_cat/indices?v" request against the Data Indexer from a web browser, and provide the results to LogRhythm Customer Support. |
1428: Data Indexer Cluster Health Recovery | Notification that the Data Indexer cluster health is recovering. You may still need to engage LogRhythm Customer Support if the cluster health does not return to "Green." |
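The "_cat/indices?v" request referenced for alarm 1427 is the standard Elasticsearch cat indices API; the "?v" parameter adds a header row to its plain-text output. If you prefer to capture that output from a script rather than a browser, the following Python sketch fetches it and lists any indices whose health is not green. The base URL (localhost on port 9200) is an assumption about where the Data Indexer's Elasticsearch HTTP endpoint listens, and the helper names are hypothetical.

```python
import urllib.request

# ASSUMPTION: hypothetical address of the Data Indexer's Elasticsearch HTTP
# endpoint. Replace with the host and port used in your deployment.
DX_BASE_URL = "http://localhost:9200"


def fetch_cat_indices(base_url: str = DX_BASE_URL, timeout: float = 10.0) -> str:
    """Return the raw text of GET <base_url>/_cat/indices?v."""
    url = f"{base_url}/_cat/indices?v"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_cat_indices(text: str) -> list[dict[str, str]]:
    """Parse the whitespace-aligned table returned by _cat/indices?v.

    The ?v header row lets columns be looked up by name rather than by
    position. Limitation: closed indices can leave some columns blank,
    which this simple whitespace split does not handle.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def non_green_indices(rows: list[dict[str, str]]) -> list[str]:
    """Names of indices whose health column is not "green"."""
    return [row.get("index", "?") for row in rows if row.get("health") != "green"]
```

For example, non_green_indices(parse_cat_indices(fetch_cat_indices())) lists any yellow or red indices. Saving the raw string returned by fetch_cat_indices() gives you the same output you would otherwise copy from the browser for LogRhythm Customer Support.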
Appendix: Summary of Changes
This section summarizes the renamed, new, and removed alarm rules in the LogRhythm Diagnostics Module.
June 2016
Renamed Alarm Rules
Old Rule Name | New Rule Name | ID |
---|---|---|
QsEMP : Mediator Database Capacity Error | LogRhythm Mediator Database Capacity Error | 96 |
QsEMP : Mediator Database Capacity Warning | LogRhythm Mediator Database Capacity Warning | 97 |
QsEMP : LogRhythm Agent Heartbeat Missed | LogRhythm Agent Heartbeat Missed | 98 |
QsEMP : LogRhythm Component Critical Condition | LogRhythm Component Critical Condition | 99 |
QsEMP : LogRhythm Component Successive Errors | LogRhythm Component Successive Errors | 100 |
QsEMP : LogRhythm Component Excessive Warnings | LogRhythm Component Excessive Warnings | 101 |
QsEMP : LogRhythm Mediator Heartbeat Missed | LogRhythm Mediator Heartbeat Missed | 102 |
QsEMP : LogRhythm MPE Rule Disabled | LogRhythm MPE Rule Disabled | 103 |
QsEMP : LogRhythm Silent Log Source Error | LogRhythm Silent Log Source Error | 104 |
QsEMP : LogRhythm Database Maintenance Failure | LogRhythm Database Maintenance Failure | 210 |
QsEMP : LogRhythm Failed To Submit Batch Job To DB | LogRhythm Failed To Submit Batch Job To DB | 212 |
QsEMP : Excessive Unprocessed Logs Spooled to Disk | LogRhythm Excessive Unprocessed Logs Spooled to Disk | 230 |
QsEMP : Excessive Processed Logs Spooled to Disk | LogRhythm Excessive Processed Logs Spooled to Disk | 231 |
QsEMP : Excessive Events Spooled to Disk | LogRhythm Excessive Events Spooled to Disk | 232 |
QsEMP : Perfmon Counter Reached Threshold Limit | Perfmon Counter Reached Threshold Limit | 233 |
QsEMP : LogRhythm GLPR Error | LogRhythm GLPR Error | 408 |
QsEMP : LogRhythm AI Engine Heartbeat Missed | LogRhythm AI Engine Heartbeat Missed | 676 |
QsEMP : LogRhythm AI Comm Manager Heartbeat Missed | LogRhythm AI Comm Manager Heartbeat Missed | 677 |
QsEMP : LogRhythm CMDB Database Warning | LogRhythm CMDB Database Warning | 947 |
QsEMP : LogRhythm CMDB Stats Warning | LogRhythm CMDB Stats Warning | 948 |
QsEMP : LogRhythm CMDB Database Error | LogRhythm CMDB Database Error | 949 |
QsEMP : LogRhythm Agent Cannot Update | LogRhythm Agent Cannot Update | 1002 |
QsEMP : LogRhythm Agent Needs Reboot | LogRhythm Agent Needs Reboot | 1003 |
QsEMP : LogRhythm Network Monitor Heartbeat Missed | LogRhythm Network Monitor Heartbeat Missed | 1084 |
QsEMP : LogRhythm Data Indexer Stopped | LogRhythm Data Indexer Stopped | 1093 |
QsEMP : LogRhythm Data Indexer Configuration Fail | LogRhythm Data Indexer Configuration Fail | 1094 |
QsEMP : LogRhythm Data Indexer Suspend | LogRhythm Data Indexer Suspend | 1095 |
QsEMP : LogRhythm Data Indexer EMDB Sync Fail | LogRhythm Data Indexer EMDB Sync Fail | 1096 |
QsEMP : LogRhythm Data Indexer Disk Limit Exceeded | LogRhythm Data Indexer Disk Limit Exceeded | 1097 |
QsEMP : LogRhythm Data Indexer Max Index Exceeded | LogRhythm Data Indexer Max Index Exceeded | 1098 |
QsEMP : LogRhythm Data Indexer List Not Found | LogRhythm Data Indexer List Not Found | 1099 |
QsEMP : LogRhythm Data Indexer Repo Not Found | LogRhythm Data Indexer Repo Not Found | 1100 |
QsEMP : LogRhythm Data Indexer Cluster Health | LogRhythm Data Indexer Cluster Health | 1101 |
QsEMP : LogRhythm Knowledge Base Update Error | LogRhythm Knowledge Base Update Error | 1102 |
New Alarm Rules
The following new alarm rules are included in the LogRhythm Diagnostics Module:
Alarm Rule | ID |
---|---|
LogRhythm Mediator Recycling - Hung MPE Threads | 1139 |
LogRhythm Scheduled Report Failure | 1140 |
LogRhythm AD Sync Failure | 1141 |
October 2017
New Alarm Rules
The following new alarm rules are included in the LogRhythm Diagnostics Module.
Alarm Rule | ID |
---|---|
Log Source Collection Error | 1424 |
LogRhythm Agent Max Memory Error | 1425 |
LogRhythm Agent Max Memory Warning | 1426 |
Data Indexer Cluster Health Downgrade | 1427 |
Data Indexer Cluster Health Recovery | 1428 |
Removed Alarm Rules
The following alarm rules have been removed from the LogRhythm Diagnostics Module.
Alarm Rule | ID |
---|---|
LogRhythm Data Indexer Cluster Health | 1101 |