Message Processing Engine Rule Builder

You use the Message Processing Engine (MPE) Rule Builder to view, create, and edit new MPE base rules and sub-rules. New rules are needed to collect and process logs from any new Log Source Type.

Training for Rule Building

You can submit a Device Support Request to LogRhythm to have new rules developed by LogRhythm Engineers. Alternatively, you can build MPE Rules yourself; however, it is highly recommended that you attend the rule building training offered by LogRhythm before you attempt to build custom rules. Customers who attend LogRhythm Custom Rules Training have the option of taking the additional two-day Advanced Rule Development class. The class includes hands-on, instructor-led training and class materials that instruct you on how to create rules for your organization. This topic presents a brief overview of the MPE Rule Builder, but is not intended to be a substitute for the Advanced Rule Development training.

To find out more about available training schedules, contact your LogRhythm sales representative.

MPE Policies

An MPE Policy is a collection of MPE Rules designed for a specific Log Source Type such as Cisco PIX or Windows 2003/2005 Security Event Log. Only Logs that are generated from Log Sources that have an assigned MPE Policy are processed by the MPE.

Log Sources can only be:

Assigned an MPE Policy associated with the same Log Source Type.
Assigned an individual MPE Policy.

The LogRhythm Knowledge Base comes with pre-packaged default MPE Policies for all supported devices. It is the MPE Policy, not the MPE Rule, that determines the data management and event management settings. The MPE Rule sets the default settings for the policy when the Rule is first added.

Custom Policies can be tailored to the type of system to which they are assigned. For example, event forwarding settings for security event logs on a file server could be different from those on a domain controller.

You can download MPE rules by going to the LogRhythm Community, clicking on the Shareables link on menu at the top of the page, and then click on the Log Sources option. The filters allow you to choose from supported and unsupported plugins, as well as ones created by LogRhythm or by other users.

Base Rules and Sub-Rules

The rules used for identifying and processing log messages include:

Base Rule. Contains a tagged regular expression (the regex) used to identify the pattern of a log and isolate interesting pieces of metadata. Using a tagging system, these metadata strings can be directed to special fields used by LogRhythm to better interpret a log or specifically identify it. In general, base rules identify log messages by matching the fields to a specific log format or pattern.
Sub-rule. Differentiates log messages that match the same base rule using values in the log. Sub-rule tags can include a regex that only applies to the string in a specific field to identify information such as a log / event identification number, a message string, or even a user or group name.

LogRhythm MPE rules use .NET/C# syntax for writing regular expressions.

Examples

Here is an example of a log message that might be received via syslog depicting a Microsoft SQL Server authentication.

6/18/2007 2:36 PM TYPE=Information USER=NT AUTHORITY\SYSTEM COMP=DBSERVER SORC=MSSQLSERVER CATG=Logon EVID=17055 MESG=18453 : Login succeeded for user ''NT AUTHORITY\SYSTEM''. Connection: Trusted.

The base rule matching this log is named SQL Server Authentication, and has the following regex:

COMP=(?<dname>[^\s]+)?\s+SORC=mssqlserver\sCATG=.+?EVID=17055\s+MESG=(?<tag1>

18453|18454|18455|18456).+?user\s\'([^\\]+?\\)?(?<tag2.login>[^\']+)\'.*+?\\)

?(?<tag2.login>[^\']+)\'.*

This base rule matches the pattern of the log and pull the metadata textDBSERVER into the <dname> tag, 18453 into <tag1>, and NT AUTHORITY\SYSTEM into <tag2.login> (which puts the metadata into both the <tag2> and <login> fields).

The sub-rule matching the metadata in those tags is then MS-AppLog Msg 18453: Successful Authentication, which matches when 18453 is in <tag1>, and anything (*) is in the <tag2> field. This sub-rule specifically identifies the occurrence as a successful authentication to a SQL server.

Rule Processing Logic

Rules are evaluated in the following order:

Custom base rules are always run before system base rules
Custom sub-rules are always run before system sub-rules
Sub-rules where VMID should equal a specific value are always processed before sub-rules where VMID can be any value including system sub-rules

Processing logic for a log:

Evaluate log against base rule, if matched.
Evaluate log against all sub-rules:
If sub-rule is matched, associate log with sub-rule.
If no sub-rules or no sub-rule match, associate log with base rule unless base rule is a pattern.

Pattern rules:

Use in instances where a log should only match a sub-rule.
To specify a base rule as a pattern, prefix the rule name with “Pattern”.

Pattern rules are processed differently in the Test Center. To test such rules, rename them without using the prefix "Pattern." Add the "Pattern" prefix back when you have finished testing.

Base Rule Sorting

MPE Base rules are auto-sorted and have the following options

Customers can specify static or auto-sorting for each custom base rule.
Customers can enforce relative ordering for each auto-sorted custom base rule.
Customers can override the relative ordering for each auto-sorted system base rule.
Customers can specify that an auto-sorted custom rule should sort above all system rules.

System sub-rules and custom sub-rules are statically sorted.

The rule processing order for Base Rules is:

Custom Base rules by Sort Order
Custom Sub-rules where the value parsed for VMID should be a specific value (i.e., VMID=1001, VMID=1002…) by sort order.
All other Custom Sub-rules by sort order.

System Base rules by Sort Order:

Custom Sub-rules where the value parsed for VMID should be a specific value (i.e., VMID=1001, VMID=1002…) by sort order.
System Sub-rules where the value parsed for VMID should be a specific value (i.e., VMID=1001, VMID=1002…) by sort order.
All other Custom Sub-rules by sort order.
All other System Sub-rules by sort order

Sub-Rule Sorting

When using wildcards and Regexes, it is important to understand how their sorting is treated by the MPE. The sort order determines the priority in which a sub-rule is matched. For instance, if you have a sub-rule where all the mapping tags are wildcards, you want to make sure this sub-rule is the last sorted item, because sorting it higher would cause it to always match, and rules below would never be tested for matching. Sorting only affects sub-rules with wildcards and Regexes. Sorting is irrelevant for sub-rules where each mapping tag value is specified since the sub-rule only matches the exact values specified.

The MPE processes rules in the following order:

Custom sub-rules where VMID is equal to a specific value.
System sub-rules where VMID is equal to a specific value.
Custom sub-rules where VMID is NOT equal to a specific value.
System sub-rules where VMID is NOT equal to a specific value.

MPE Rule Statuses

There are three statuses used in the MPE Rule Builder, which can determine how created MPE rules are applied to log messages.

Status

Description

Development

While creating new rules, use the Development status to ensure they are not used to identify log messages while being built.

Test/Production

This allows the rule to be applied to log messages by the MPE in the LogRhythm SIEM.

Previously, the Test status would not trigger alarms and would automatically disable rules after 1000 log messages were processed. However, the Test status is currently functionally identical to the Production status. MPE rules that are new and undergoing testing can be given the Test status, and once the rules have passed testing and are ready for everyday use, they can be given the Production status.

The logs that match the MPE Rule can be displayed on the Dashboard.
Custom MPE rules can be edited, even in Production mode; however, System MPE Rules (created by LogRhythm) cannot be edited.

Review the scmedsvr.log file for messages about performance issues.

Parse Fields and Tags

The following tables provide lists of all the metadata fields LogRhythm can parse, as well as their associated parsing tags, and default regex. The fields are grouped by how they appear in the Web Console. If you do not see a field in the Web Console in the same tab as this document, you may have tagged the field as a favorite, in which case the field will appear in the Favorites tab instead of the main group tab as shown in this document. If necessary, the default regex can be overridden, as described in Override the Default Regex.

All Mapping and Parsing tags are lower case.

Field	Description	Tags	Default Regex
Application Tab
Application	Application derived by IANA protocol and port number or directly assigned in MPE processing settings.	N/A	N/A
Object	The resource (i.e., file) referenced or impacted by activity reported in the log.	<object>	\w+
Object Name	The descriptive name of the object. Do not use unless Object is also used.	<objectname>	\w+
Object Type	A category type for the object (e.g., file, image, pdf, etc.).	<objecttype>	\w+
Hash	The hash value reported in the log. Choose MD5 > Sha1 > Sha256.	<hash>	\w+
Policy	The specific policy referenced (i.e., Firewall, Proxy) in a log message.	<policy>	\w+
Result	The outcome of a command operation or action. For example, the result of quarantine might be success.	<result>	\w+
URL	The URL referenced or impacted by activity reported in the log. You may need to override the default regex for URLs that are not HTTP/HTTPS.	<url>	https?://.+
User Agent	The User Agent string from web server logs.	<useragent>	\w+
Response Code	The explicit and well-defined response code for an action or command captured in a log. Response Code differs from Result in that response code should be well- structured and easily identifiable as a code.	<responsecode>	\w+
Subject	The subject of an email or the general category of the log.	<subject>	\w+
Version	The software or hardware device version described in either the process or object.	<version>	\w+
Command	The specific command executed that has been recorded in the log message.	<command>	\w+
Reason	The justification for an action or result when not an explicit policy.	<reason>	\w+
Action	Field for "what was done" as described in the log. Action is usually a secondary function of a command or process.	<action>	\w+
Status	The vendor's perspective on the state of a system, process, or entity. Status should NOT be used as the result of an action.	<status>	\w+
Session Type	The type of session described in the log (e.g., console, CLI, web). Unique from IANA Protocol.	<sessiontype>	\w+
Process Name	System or application process described by the log message.	<process>	\w+
Process ID	Numeric ID value for a process.	<processid>	\d+
Parent Process ID	The parent process ID of a system or application process that is of interest.	<parentprocessid>	\w+
Parent Process Name	The parent process name of a system or application process.	<parentprocessname>	\w+
Parent Process Path	The full path of a parent process of a system or application process.	<parentprocesspath>	\w+
Quantity	A numeric count of something. For example, there are 4 lights (quantity is 4).	<quantity>	[0123456789\.]+
Amount	The qualitative description of quantity (percentage or relative numbers) For example, half the lights are on (amount is .5 or 50). Amount is also used for currency.	<amount>	[0123456789\.]+
Size	Numeric description of capacity (e.g., disk size) without a specific unit of measurement. Size is generally used as a limit rather than a current measurement. Use Amount for non- specific measurements.	<size>	[0123456789\.]+
Rate	Defines a number of something per unit of time without a specific unit of measurement. Always expressed as a fraction.	<rate>	[0123456789\.]+
Duration	The elapsed time reported in a log message, derived from multiple fields. Timestart and Timeend need custom parsing patterns.	If log has start/end use: (?<timestart>pattern) (?<timeend>pattern) If log has elapsed time use: <days> <hours> <minutes> <seconds> <milliseconds> <microseconds> <nanoseconds>	[0123456789\.]+ Note: Time Start and Time End tags must be overloaded to function properly.
Session	Unique user or system session identifier.	<session>	\w+
Known Application	Application derived from IANA protocol and port number. If a known application cannot be derived, it is displayed as unknown.	N/A	N/A
Kbytes/Packets Tab
Host (Impacted) KBytes Rcvd Host (Impacted) KBytes Sent Host (Impacted) Kbytes Total	The number of bytes sent or received in the context of the Impacted Host. Rcvd – Bytes received by impacted host Sent – Bytes sent by impacted host Total – Total bytes in session as seen by impacted host	Use the appropriate tags based upon the units and direction represented by the log data: <bitsin>, <bitsout> <bytesin>, <bytesout> <kilobitsin>, <kilobitsout><kilobytesin>, <kilobytesout><megabitsin>, <megabitsout> <megabytein>, <megabyteout><gigabitsin>, <gigabitsout><gigabytein>, <gigabyteout> <terabitsin>, <terabitsout><terabytesin>, <terabytesout><petabitsin>, <petabitsout> <petabytesin>, <petabytesout>,<bits>, <bytes>, <kilobits>,<kilobytes>, <megabits>, <megabytes>, <gigabits>,<gigabytes>, <terabits>,<terabytes>, <petabits>,<petabytes>	[0123456789\.]+
Host (Impacted) Packets Rcvd Host (Impacted) Packets Sent Host (Impacted) Packets Total	The number of packets sent or received in the context of the Impacted Host. Rcvd – Packets received by impacted host Sent – Packets sent by impacted host Total – Total packets in session as seen by impacted host	<packetsin>, <packetsout>, <packets>	[0123456789\.]+
Classification Tab
Classification	Value is determined based on the MPE Rule's assigned Common Event.	N/A	N/A
Common Event	Value is determined based on the MPE Rule's Assigned Common Event.	N/A	N/A
Priority	Value is determined based on the Risk-Based-Priority (RBP) calculation.	N/A	N/A
Direction	Indicates the directional flow of data between the Origin Host and the Impacted Host — Inbound, Outbound, Internal, External, or Unknown.	N/A	N/A
Severity	The vendor's view of the severity of the log.	<severity>	\w+
Vendor Message ID	Specific vendor for the log used to describe a type of event.	<vmid>	\w+
Vendor Info	Description of a specific vendor log or event identifier for the log. Human readable elaboration that directly correlates to the VMID.	<vendorinfo>	\w+
MPE Rule Name	Name of rule that matched, assigned on rule creation.	N/A	N/A
Threat Name	The name of a threat described in the log message (e.g., malware, exploit name, signature name). Do not overload with Policy.	<threatname>	\w+
Threat ID	ID number or unique identifier of a threat. Note that CVE is stored separately.	<threatid>	\w+
CVE	CVE ID (i.e., CVE-1999-0003) from vulnerability scan data.	<cve>	\w+
Host Tab
Host (Origin)	Origin host derived from Origin IP Address and/or Origin Hostname.	N/A	N/A
Host (Impacted)	Impacted host derived from Impacted IP Address and/or Impacted Hostname.	N/A	N/A
MAC Address (Origin)	The MAC address from which activity originated (i.e., attacker, client).	<smac>	(\w{2}(:\|-)?){6}
MAC Address (Impacted)	The MAC address that was affected by the activity (i.e., target, server).	<dmac>	(\w{2}(:\|-)?){6}
Interface (Origin)	The network port/interface from which the activity originated (i.e., attacker, client).	<sinterface>	\w+
Interface (Impacted)	The network port/interface that was affected by the activity (i.e., target, server).	<dinterface>	\w+
IP Address (Origin)	The IP address from which activity originated (i.e., attacker, client).	<sip> (parses IPv4 and IPv6)	((?<sipv4>(?<sipv4>1??(1 ??\d{1,2}\|2[0-4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])))\|(?<sipv6>(?<sipv6>1 ??((?:(?:[0-9A-Fa- f]{1,4}:){7}[0-9A-Fa- f]{1,4}\|(?=(?:[0-9A-Fa- f]{1,4}:){0,7}[0-9A-Fa- f]{1,4}\z)\|(([0-9A-Fa- f]{1,4}:){1,7}\|:)((:[0-9A-Fa- f]{1,4}){1,7}\|:))))))
IP Address (Impacted)	The IP address that was affected by the activity (i.e., target, server).	<dip> (parses IPv4 and IPv6)	((?<dipv4>(?<dipv4>1??( 1??\d{1,2}\|2[0-4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])\.(1??\d{1,2}\|2[0- 4]\d\|25[0- 5])))\|(?<dipv6>(?<dipv6>1 ??((?:(?:[0-9A-Fa- f]{1,4}:){7}[0-9A-Fa- f]{1,4}\|(?=(?:[0-9A-Fa- f]{1,4}:){0,7}[0-9A-Fa- f]{1,4}\z)\|(([0-9A-Fa- f]{1,4}:){1,7}\|:)((:[0-9A-Fa- f]{1,4}){1,7}\|:))))))
NAT IP Address (Origin)	The Network Address Translated (NAT) IP address from which activity originated (i.e., attacker, client).	<snatip>	Same as IP Origin (<sip>)
NAT IP Address (Impacted)	The Network Address Translated (NAT) IP address that was affected by the activity (i.e., target, server).	<dnatip>	Same as IP Impacted (<dip>)
Hostname (Origin)	The hostname from which activity originated (i.e., attacker, client).	<sname> (or DNS resolved from IP)	([^\s\.]+\.?)+
Hostname (Impacted)	The hostname that was affected by the activity (i.e., target, server).	<dname> (or DNS resolved from IP)	([^\s\.]+\.?)+
Known Host (Origin)	A value determined by mapping parsed origin host identifiers, such as IP address or hostname, to a LogRhythm host record.	N/A	N/A
Known Host (Impacted)	A value determined by mapping parsed impacted host identifiers, such as IP address or hostname, to a LogRhythm host record.	N/A	N/A
Serial Number	The hardware or software serial number in a log message. This value should be a permanent unique identifier.	<serialnumber>	\w+
Identity Tab
User (Origin)	The originating user or system account of the activity reported in the log.	<login>	\w+
User (Impacted)	The user or system account impacted by activity reported in the log.	<account>	\w+
Sender	The sender of an email or the "caller number" for a VOIP log. This value must relate to a specific user or unique address in the case of a phone call or email.	<sender>	[^\s]+@[^\s]+
Recipient	The recipient of an email or the dialed number for a VOIP log.	<recipient>	[^\s]+@[^\s]+
Group	The user group or role impacted by activity reported in the log. Do not use for entity group (zone or domain).	<group>	\w+
Location Tab
Entity (Origin)	A value determined based on the origin host’s assigned entity.	N/A	N/A
Entity (Impacted)	A value determined based on the impacted host’s assigned entity.	N/A	N/A
Zone (Origin)	A value determined based on the zone of the origin host — Internal, External, DMZ, or Unknown.	N/A	N/A
Zone (Impacted)	A value determined based on the zone of the impacted host — Internal, External, DMZ, or Unknown.	N/A	N/A
Location (Origin)	A value determined by resolving the parsed origin IP address against a Geo-IP database.	N/A	N/A
Location (Impacted)	A value determined by resolving the parsed impacted IP address against a Geo-IP database.	N/A	N/A
Country (Origin)	The country in which the determined origin location exists.	N/A	N/A
Country (Impacted)	The country in which the determined impacted location exists.	N/A	N/A
Log Tab
Log Date	Timestamp when the log was generated or received, corrected to UTC.	N/A	N/A
Log Count	The number of identical log messages received.	N/A	N/A
Log Source Entity	The entity to which the log source belongs.	N/A	N/A
Log Source Type	The device or application type from which a log was received.	N/A	N/A
Log Source Host	The origin host from which the log was received.	N/A	N/A
Log Source	The assigned name of a log source.	N/A	N/A
Log Sequence Number	The sequence in which a log was collected, generated by the Agent.	N/A	N/A
Log Message	The raw log message.	N/A	N/A
First Log Date	Timestamp when the first identical log message was received.	N/A	N/A
Last Log Date	Timestamp when the last identical log message was received.	N/A	N/A
Network Tab
Network (Origin)	A value determined by mapping the origin IP address to a LogRhythm network record.	N/A	N/A
Network (Impacted)	A value determined by mapping the impacted IP address to a LogRhythm network record.	N/A	N/A
Domain (Impacted)	The Windows or DNS domain name referenced or impacted by activity reported in the log.	<domain> or <domainimpacted>	\w+
Domain (Origin)	The Windows or DNS domain where the logged activity originated.	<domainorigin>	\w+
Protocol	The IANA protocol name or number.	<protnum>,<protname>	1??\d{1,2}\|2[0-4]\d\|25[0-5] \w+
TCP/UDP Port (Origin)	The port from which activity originated (i.e., client, attacker port).	<sport>	\d+
TCP/UDP Port (Impacted)	The port to which activity was targeted (i.e., server, target port).	<dport>	\d+
NAT TCP/UDP Port (Origin)	The Network Address Translated (NAT) port from which activity originated (i.e., client, attacker port).	<snatport>	\d+
NAT TCP/UDP Port (Impacted)	The Network Address Translated (NAT) port to which activity was targeted (i.e., server, target port).	<dnatport>	\d+

Map Tags

Five additional tags are available for identifying data in the log specifically for sub-rules. These tags do not parse text into metadata fields, so they do not appear in Investigations, Reports, and so on. These tags are intended only to identify portions of the log message that should be used in the development of sub-rules.

Tag	Field Type	Default Regex
<tag1>	Text	.*
<tag2>	Text	.*
<tag3>	Text	.*
<tag4>	Text	.*
<tag5>	Text	.*

Override the Default Regex

The default regex is applied by using only the named group tag. For example, <account> will apply the regex pattern \w+ in the rule, as shown in the table above.

If the default regex for a parsing tag will not properly parse the correct data out of the log message or is not the optimal regex from a performance perspective, the default should be overridden. To override the default regex, the following syntax should be used:

(?<[tagname]>[regex])

For example, suppose your regex needs to match file names with a specific extension such as the sample log message below:

User john.doe opened AnnualReport.pdf

If the base rule was written as:

User <login> opened <object>

The value parsed for login would be john and the value for object would be AnnualReport. This is due to the fact that a period is not a word character and the default regex of “\w+” would only match up to the period. Instead, the default regular expressions should be overridden, and the base rule should be:

User (?<login>\w+\.?\w*) opened (?<object>\w+\.pdf)

Now, the base rule will parse anything for login starting with a word character that optionally contains a period followed by additional word characters.

Do not override the default regex for fields which parse an IP address, such as <sip>, <dip>, <sipv6>, and so on.