IT Operations Module User Guide
This guide is meant to be used as a day-to-day reference for the IT Operations content. All the content included in this module is listed here along with a detailed explanation, suggested response, and configuration and tuning notes.
Suppression Period: The Suppression Multiple, in conjunction with the Suppression Period, defines how much time must pass before the same AI Engine rule can be triggered again for the same set of criteria. In this guide, the Suppression Period is measured in minutes.
Environmental Dependence Factor (EDF): EDF is a high-level quantification of how much configuration and tuning effort is required for an AI Engine rule to perform as expected. This setting has no impact on processing.
False Positive Probability: The False Positive Probability is used in the Risk-Based Priority (RBP) calculation for AI Engine rules. It estimates how likely the rule is to generate a false positive. A low value indicates that the pattern the rule matches is almost always a true positive, while a high value indicates that the pattern is very likely to be a false positive.
Options range from 0 to 9 with:
- 0 indicating the pattern the rule matched is almost always a true positive
- 9 indicating the pattern the rule matched is very likely to be a false positive
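The suppression behavior described above can be illustrated with a minimal sketch. This is not how AI Engine is implemented. It assumes the effective suppression window is already expressed in minutes (as in the Event Suppression Period field of each rule), that "the same set of criteria" can be reduced to a single key such as a host name, and that the window is measured from the last time the rule actually triggered. The class and method names (`SuppressionTracker`, `should_fire`) are hypothetical.

```python
from datetime import datetime, timedelta


class SuppressionTracker:
    """Hypothetical helper: remembers the last trigger time per (rule ID, criteria)."""

    def __init__(self):
        self._last_fired = {}

    def should_fire(self, rule_id, criteria, now, suppression_minutes):
        """Return True if the rule may trigger, False if it is still suppressed."""
        key = (rule_id, criteria)
        last = self._last_fired.get(key)
        # Assumption: the window starts at the last trigger that was not suppressed.
        if last is not None and now - last < timedelta(minutes=suppression_minutes):
            return False
        self._last_fired[key] = now
        return True


tracker = SuppressionTracker()
start = datetime(2024, 1, 1, 12, 0)

# Rule 1378 (IT Ops: Crit System Shutdown) lists a 30 minute suppression period.
first = tracker.should_fire(1378, "host-a", start, 30)
repeat = tracker.should_fire(1378, "host-a", start + timedelta(minutes=10), 30)
other_host = tracker.should_fire(1378, "host-b", start + timedelta(minutes=10), 30)
after_window = tracker.should_fire(1378, "host-a", start + timedelta(minutes=31), 30)

print(first, repeat, other_host, after_window)  # True False True True
```

In this sketch a repeat match for the same host inside the window is suppressed, while a different host (a different set of criteria) is not affected. A suppression period of 0 never suppresses anything.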
AIE Rule Name | IT Ops: Crit System Shutdown |
AIE Rule ID | 1378 |
AIE Rule Brief Description | Monitors for system shutdowns that are not followed by startup activity. Must be tuned to select "always on" hosts and to allow an appropriate time frame for the system to start up after shutdown activity. |
Classification | Audit/Startup and Shutdown |
Event Suppression Period | 30 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 7 |
AIE Rule Additional Details | Specific systems or an entity containing critical systems should be defined within the Primary Criteria for each block to reduce alarms from this rule. Log Source Criteria for each block can also be defined to limit alarms to specific critical systems. |
AIE Rule Name | IT Ops: Crit Service Stopped |
AIE Rule ID | 1379 |
AIE Rule Brief Description | Rule observes for service stop events that are not followed by service start events |
Classification | Operations/Critical |
Event Suppression Period | 5 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 5 |
AIE Rule Additional Details | Rule blocks must be tuned to critical services and log sources |
AIE Rule Name | IT Ops: Crit Win Service Failed To Recover |
AIE Rule ID | 1380 |
AIE Rule Brief Description | Rule looking for Windows services which attempt to recover, but fail |
Classification | Operations/Warning |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 5 |
AIE Rule Additional Details | Should be tuned to critical log sources and services |
AIE Rule Name | IT Ops: Crit Backup Failure |
AIE Rule ID | 1381 |
AIE Rule Brief Description | Monitors for failed backup events |
Classification | Operations/Critical |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical log sources |
AIE Rule Name | IT Ops: Crit Application Config Change |
AIE Rule ID | 1441 |
AIE Rule Brief Description | Observes for changes to critical application configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts and/or applications/processes |
AIE Rule Name | IT Ops: Crit Database Config Change |
AIE Rule ID | 1442 |
AIE Rule Brief Description | Monitors for changes to critical database configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts and/or database objects |
AIE Rule Name | IT Ops: Crit Dir. Services Config Change |
AIE Rule ID | 1443 |
AIE Rule Brief Description | Monitors for changes to critical directory services configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts and/or domains |
AIE Rule Name | IT Ops: Crit Net Access Config Change |
AIE Rule ID | 1444 |
AIE Rule Brief Description | Monitors for changes to critical network access configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts and/or network segments |
AIE Rule Name | IT Ops: Crit Security Config Change |
AIE Rule ID | 1445 |
AIE Rule Brief Description | Monitors for changes to critical security configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts. Windows Filtering Platform events are filtered out due to high frequency. |
AIE Rule Name | IT Ops: Crit System Config Change |
AIE Rule ID | 1446 |
AIE Rule Brief Description | Monitors for changes to critical system configurations |
Classification | Audit/Configuration |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Should be tuned to critical hosts |
AIE Rule Name | IT Ops: Win Application Error Tracking |
AIE Rule ID | 1447 |
AIE Rule Brief Description | Rule tracks Windows application errors that exceed a normal level. |
Classification | Operations/Error |
Event Suppression Period | 1440 |
Alarm on Event Occurrence | No |
Environmental Dependency Factor | Low |
False Positive Probability | 5 |
AIE Rule Additional Details | This rule is intended to be used in tandem with another rule that tracks Windows Update events, in order to alarm on excessive application errors following Windows updates. |
AIE Rule Name | IT Ops: Possible Bad Win Update : App Error |
AIE Rule ID | 1448 |
AIE Rule Brief Description | Rule watches for the Windows Application Error Tracking trend rule firing after Windows Updates are installed. Rule fires an alarm if a higher incidence of application errors has occurred. |
Classification | Operations/Warning |
Event Suppression Period | 4320 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 5 |
AIE Rule Additional Details | N/A |
AIE Rule Name | IT Ops: Possible Bad Win Update : Sys Crash |
AIE Rule ID | 1451 |
AIE Rule Brief Description | Rule watches for a Windows crash dump log following Windows Updates being installed. |
Classification | Operations/Warning |
Event Suppression Period | 4320 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 5 |
AIE Rule Additional Details | N/A |
AIE Rule Name | IT Ops: Slow Web Server Response Times |
AIE Rule ID | 1458 |
AIE Rule Brief Description | Rule observes for slow web server response times |
Classification | Operations/Warning |
Event Suppression Period | 0 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 5 |
AIE Rule Additional Details | Rule block can be restricted to web server log sources only. Can be used with custom sources that utilize HTTP 2xx common events. |
AIE Rule Name | IT Ops: PerfMon: Proc Time Thrshld Exceeded |
AIE Rule ID | 1470 |
AIE Rule Brief Description | Rule observes for 20 or more threshold exceeded alarms within 6 minutes from Windows PerfMon for % Processor Time counter |
Classification | Operations/Warning |
Event Suppression Period | 360 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 1 |
AIE Rule Additional Details | Rule was designed with a 15 second polling interval in PerfMon. Suppression interval |
AIE Rule Name | IT Ops: PerfMon: Low Free Disk Space |
AIE Rule ID | 1471 |
AIE Rule Brief Description | Rule observes for low disk space alerts from Windows PerfMon counters. |
Classification | Operations/Warning |
Event Suppression Period | 1470 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 1 |
AIE Rule Additional Details | Rule was designed with a 15 second polling interval for disk space. 120 messages (30 minutes) of alerts will trigger the alarm. Alarm will not repeat for approximately 24 hours after first alarm. |
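The timing figures quoted for the two PerfMon rules above follow from the 15 second polling interval, and the arithmetic can be checked with a short sketch. It assumes that PerfMon emits one alert message per polling sample while the threshold is breached. The function names are hypothetical and are only meant to help when retuning the rules for a different polling interval.

```python
def alert_window_minutes(message_count, polling_interval_seconds):
    """Minutes of continuous alerting needed to produce message_count messages."""
    return message_count * polling_interval_seconds / 60


def samples_in_window(window_minutes, polling_interval_seconds):
    """Number of polling samples that fit into a window of the given length."""
    return (window_minutes * 60) // polling_interval_seconds


# Rule 1471: 120 messages at a 15 second polling interval.
low_disk_window = alert_window_minutes(120, 15)

# Rule 1470: a 6 minute window at a 15 second polling interval.
proc_time_samples = samples_in_window(6, 15)

print(low_disk_window)    # 30.0
print(proc_time_samples)  # 24
```

Under this assumption, the 120 messages for rule 1471 correspond to 30 minutes of alerts, and the 20 alarms required by rule 1470 are 20 out of a possible 24 samples in 6 minutes. If the polling interval is changed, the message counts in the rule blocks would need to be scaled accordingly to keep the same time windows.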
AIE Rule Name | IT Ops: Nagios: Sys Offline Attribution |
AIE Rule ID | 1472 |
AIE Rule Brief Description | Observes for several critical, warning, or error events followed by Nagios detecting a host hard down status |
Classification | Operations/Warning |
Event Suppression Period | 360 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 5 |
AIE Rule Additional Details | This rule assumes default notification intervals are used. If the default 60 minute notification interval is not used, it is recommended to adjust the suppression multiple for this AIE rule to avoid too few or too many alarms (depending on whether the notification interval is raised or lowered). |
AIE Rule Name | IT Ops: Nagios: Sys Off Following Win Update |
AIE Rule ID | 1473 |
AIE Rule Brief Description | Observes for successful Windows Update install followed by Nagios event indicating a system is down |
Classification | Operations/Warning |
Event Suppression Period | 360 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 5 |
AIE Rule Additional Details | This rule assumes default notification intervals are used. If the default 60 minute notification interval is not used, it is recommended to adjust the suppression multiple for this AIE rule to avoid too few or too many alarms (depending on whether the notification interval is raised or lowered). |
AIE Rule Name | IT Ops: PerfMon: Dsk % Idle Time Blw Thrshld |
AIE Rule ID | 1474 |
AIE Rule Brief Description | Monitors for low disk idle time from Performance Monitor |
Classification | Operations/Warning |
Event Suppression Period | 360 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 1 |
AIE Rule Additional Details | N/A |
AIE Rule Name | IT Ops: Nagios: Service State Offline |
AIE Rule ID | 1476 |
AIE Rule Brief Description | Observes for a hard service down or critical notification from Nagios |
Classification | Operations/Warning |
Event Suppression Period | 1 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 1 |
AIE Rule Additional Details | Rule block should be tuned to critical processes and log sources. If left open, the rule will fire for any service monitored by Nagios. |
AIE Rule Name | IT Ops: Nagios: Sys Offline Following Change |
AIE Rule ID | 1485 |
AIE Rule Brief Description | Observes for configuration change followed by Nagios detecting a host hard down status |
Classification | Operations/Warning |
Event Suppression Period | 360 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Medium |
False Positive Probability | 5 |
AIE Rule Additional Details | This rule assumes default notification intervals are used. If the default 60 minute notification interval is not used, it is recommended to adjust the suppression multiple for this AIE rule to avoid too few or too many alarms (depending on whether the notification interval is raised or lowered). |
AIE Rule Name | IT Ops: VMWare: RAM Disk Full |
AIE Rule ID | 1486 |
AIE Rule Brief Description | Observes for specific logging activity indicative of a full RAM Disk |
Classification | Operations/Warning |
Event Suppression Period | 100 |
Alarm on Event Occurrence | Yes |
Environmental Dependency Factor | Low |
False Positive Probability | 0 |
AIE Rule Additional Details | N/A |
AIE Rule Name | IT Ops: LogRhythm Lifecycle Controller |
AIE Rule ID | 1597 |
AIE Rule Brief Description | Rule creates an event from LogRhythm Lifecycle Controller logs. |
Classification | Operations/Warning |
Event Suppression Period | 1 |
Alarm on Event Occurrence | No |
Environmental Dependency Factor | None |
False Positive Probability | 0 |
AIE Rule Additional Details | Rule to event on LogRhythm Lifecycle Controller logs. |
Reports
Report Name | IT Ops: AIE Alarm Summary |
Class | Operations |
Description | Report summarizes all IT Operations Module AI Engine alarm activity |
Template Type | Executive Report |
Data Source | Platform Manager |
Report Name | IT Ops: Windows Host Bugcheck Reboot Summary |
Class | Operations |
Description | Report summarizes all Windows bugcheck reboots by host |
Template Type | Log Summary Report |
Data Source | LogMart |