Data Indexer
The Data Indexer (Indexer) provides persistence and search capabilities, as well as high-performance, distributed, and highly scalable indexing of machine and forensic data. Indexers can be clustered in a replicated configuration to enable high availability, improved search performance, and support for more simultaneous users. Indexers store both the original data and a structured copy of it to enable search-based analytics. The Indexer is supported on Windows and Linux, as follows:
- Windows. You can install the Indexer on an XM Appliance, an upgraded Data Processor Appliance, your own server, or a virtual machine. This configuration is called a DPX, and the Indexer is "pinned" to the Data Processor.
- Linux. You can install either a single physical hot node or 3 to 10 physical hot nodes, plus 1 to 10 optional warm nodes, on Linux Indexer Appliances, your own servers, or virtual machines. This configuration is called a DX or DX cluster, and the Indexer is installed alone.
For more information about installing or upgrading the Indexer, see the LogRhythm Software Installation Guide on the LogRhythm Community.
Indexer Services
The Indexer is a highly scalable, open-source, full-text search and analytics engine based on Elasticsearch. The full functionality of the Indexer is provided by the following microservices:
Service | Description |
---|---|
Bulldozer | Registers the Elasticsearch cluster name and nodes in the EMDB. Writes cluster statistics to the EMDB for use in the Deployment Monitor. |
Carpenter | Synchronizes LogRhythm KB and deployment data to Data Indexer indexes. |
Columbo | Executes query requests from LogRhythm components. |
Elasticsearch Service | Log persistence and indexing data store. |
GoMaintain | Maintains Data Indexer indices for disk space and time to live (TTL). |
Transporter | Facilitates interfacing to the Data Indexer through HTTP/REST. |
WatchTower | Receives analytic data from CloudAI. If CloudAI is not in use in your deployment, this service remains idle, even though it is enabled. |
Data Indexer File Locations
Item | Windows | Linux |
---|---|---|
Data Indexer binaries | C:\Program Files\LogRhythm\Data Indexer | /usr/local/logrhythm |
Data Indexer log files | C:\Program Files\LogRhythm\Data Indexer\logs and C:\Program Files\LogRhythm\Data Indexer\Elasticsearch\logs | /var/log/elasticsearch and /var/log/persistent |
Data Indexer logs repository (default path) | ${DXDATAPATH}\elasticsearch\data, where ${DXDATAPATH} defaults to D:\LRIndexer | /usr/local/logrhythm/db/elasticsearch/data |
Data Indexer service start/stop scripts | C:\Program Files\LogRhythm\Data Indexer\tools\start-allservices.bat and C:\Program Files\LogRhythm\Data Indexer\tools\stop-allservices.bat | /usr/local/logrhythm/tools/start-all-serviceslinux.sh and /usr/local/logrhythm/tools/stop-all-serviceslinux.sh |
Information About Automatic Maintenance
Automatic maintenance is governed by several Data Indexer settings in the Configuration Manager.
GoMaintain IndexManage Disk HWM (%diskutil)
The disk utilization limit is the percentage of disk utilization that triggers maintenance. The default is 80, which means that maintenance starts when the Elasticsearch data disk is 80% full. Do not set this value higher than 80; a higher value can reduce the ability of Elasticsearch to store replica shards for failover.
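The HWM comparison can be illustrated with a short POSIX shell sketch that reads the current utilization of the disk holding the Elasticsearch data path with `df -P` and checks it against a limit. The function name `disk_at_hwm` is hypothetical, and the sketch only mirrors the documented rule (maintenance starts when the disk reaches the HWM percentage); it is not how GoMaintain itself measures utilization.

```shell
# Hypothetical helper (not part of LogRhythm): succeeds (exit 0) when the
# filesystem holding PATH is at or above HWM percent full.
disk_at_hwm() {
  path=$1
  hwm=${2:-80}   # documented default for GoMaintain IndexManage Disk HWM
  # `df -P` prints a header plus one POSIX-format line; field 5 is the
  # "Capacity" column, for example "42%".
  used=$(df -P "$path" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  [ "$used" -ge "$hwm" ]
}

# Example using the default Linux data path from the file locations table:
#   if disk_at_hwm /usr/local/logrhythm/db/elasticsearch/data 80; then
#     echo "Data disk is at or above the HWM"
#   fi
```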
GoMaintain IndexManage Elasticsearch Heap (%esheap)
The heap utilization limit is the maximum Elasticsearch heap usage, as a percentage, above which GoMaintain performs index TTL management. The default is 85, which means that management begins when heap usage exceeds 85%.
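To check heap usage against this limit by hand, the Elasticsearch `_cat/nodes` API can report each node's heap usage in its `heap.percent` column. The following sketch filters that output. The helper name `nodes_over_heap_limit` is hypothetical, the Elasticsearch address `localhost:9200` is an assumption, and a single sample is only a rough proxy for the sustained heap pressure that GoMaintain reacts to.

```shell
# Hypothetical helper: reads "<node-name> <heap-percent>" lines on stdin and
# prints the names of nodes whose heap usage exceeds LIMIT percent.
nodes_over_heap_limit() {
  limit=${1:-85}   # documented default for GoMaintain IndexManage Elasticsearch Heap
  awk -v limit="$limit" '$2 + 0 > limit + 0 { print $1 }'
}

# Example against a local node (assumes Elasticsearch on localhost:9200):
#   curl -s 'http://localhost:9200/_cat/nodes?h=name,heap.percent' | nodes_over_heap_limit 85
```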
GoMaintain TTL Logs (#indices)
The DX monitors Elasticsearch memory and DX storage capacity, and GoMaintain tracks heap pressure on the nodes. If the pressure consistently crosses the threshold, GoMaintain reduces the number of days of indices kept open by closing indices. Closing an index removes the resources needed to manage that data and relieves heap pressure on Elasticsearch. GoMaintain continues to close days of indices until heap usage is under the warning threshold, and it deletes days of indices based on the disk utilization setting (80% by default).
The default value is -1, which monitors system resources and manages the time-to-live (TTL) automatically. You can configure a lower TTL by changing this number. If the configured TTL can no longer be achieved, the DX sends a diagnostic warning and starts closing indices.
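The effect of a fixed TTL can be sketched as selecting every daily index beyond the newest N. The helper below is hypothetical and assumes that daily indices share a common prefix and end in a YYYY-MM-DD date; it treats -1 as "managed automatically" by listing nothing. It only lists candidates: it does not close anything and does not reproduce the heap-driven and disk-driven decisions that GoMaintain makes.

```shell
# Hypothetical helper: reads daily index names (assumed to share one prefix
# and end in YYYY-MM-DD) on stdin and prints those beyond a TTL of KEEP
# indices, newest first. KEEP = -1 means "managed automatically", so nothing
# is printed in that case.
indices_beyond_ttl() {
  keep=$1
  if [ "$keep" -lt 0 ]; then
    return 0
  fi
  # With a common prefix, ISO dates sort lexicographically: sort newest
  # first, then skip the KEEP newest lines.
  LC_ALL=C sort -r | tail -n +"$((keep + 1))"
}
```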
Indices that GoMaintain has closed are not actively searchable in 8.0.0, but they are retained for reference. To see which indices are closed, you can run a curl command such as the following:
curl -s -XGET 'http://localhost:9200/_cat/indices?h=status,index' | awk '$1 == "close" {print $2}'
You can also open http://localhost:9200/_cat/indices?v in a browser to list both open and closed indices.
You can reopen an index with the following command, provided there is enough heap memory and disk space to support it. Otherwise, the index closes again immediately.
curl -XPOST 'localhost:9200/<index>/_open?pretty'
After you open the index in this way, you can investigate the data in either the Web Console or Client Console.
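Because a reopened index can close again immediately when resources are short, it can help to check its status shortly after opening it. This sketch wraps the `_open` call shown above and then queries `_cat/indices/<index>?h=status`, which prints `open` or `close`. The helper name `reopen_and_check`, the `ES_URL` variable, and the 30-second default wait are assumptions, not LogRhythm settings.

```shell
# Hypothetical helper: reopens INDEX, waits WAIT seconds, and reports whether
# it is still open. ES_URL is an assumed variable (default localhost:9200).
reopen_and_check() {
  index=$1
  wait_secs=${2:-30}   # arbitrary wait, an assumption
  es=${ES_URL:-http://localhost:9200}
  curl -s -XPOST "$es/$index/_open" > /dev/null || return 1
  sleep "$wait_secs"
  # _cat/indices/<index>?h=status prints "open" or "close".
  status=$(curl -s "$es/_cat/indices/$index?h=status" | tr -d '[:space:]')
  if [ "$status" = "open" ]; then
    echo "$index is open"
  else
    echo "$index was closed again; check heap and disk headroom" >&2
    return 1
  fi
}
```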
GoMaintain TTL and Disk Settings for Restored Indices
You can enable or disable the maintenance settings in the Configuration Manager for indices created by SecondLook, which lets you configure the GoMaintain TTL and disk settings for restored indices. The Configuration Manager includes the following settings:
Setting | Field Type | Description | Default |
---|---|---|---|
GoMaintain Logsar - Maintenance | Toggle: Enabled/Disabled | Enable or disable automatic maintenance of archive indices created by SecondLook. | Disabled |
GoMaintain TTL Logsar - (#indices) | Text Box Range: -1 to 100000000 | Maximum number of logsar indices to store. The default (-1) automatically manages the number of indices based on available resources. | -1 |
GoMaintain Max. Archive Index Disk Size | Text Box Range: -1 to 100000000 | Maximum disk size, in GB, for archive indices, above which GoMaintain performs index TTL management. | 100 |
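To compare restored indices against GoMaintain Max. Archive Index Disk Size by hand, their sizes can be summed from `_cat/indices` with `bytes=gb`. This sketch assumes the restored indices share the `logsar` prefix, uses a hypothetical helper name, and only handles a positive limit. Note that `store.size` includes replica copies, which may differ from how GoMaintain counts disk size.

```shell
# Hypothetical helper: reads "<index> <size-in-GB>" lines on stdin, prints the
# total size, and succeeds (exit 0) when the total exceeds MAX_GB.
archive_size_over_limit() {
  max_gb=${1:-100}   # documented default for Max. Archive Index Disk Size
  awk -v max="$max_gb" '
    { total += $2 }
    END { print total + 0; exit !(total > max) }'
}

# Example (assumes Elasticsearch on localhost:9200 and a "logsar" prefix):
#   curl -s 'http://localhost:9200/_cat/indices/logsar*?h=index,store.size&bytes=gb' |
#     archive_size_over_limit 100
```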
GoMaintain Force Merge
Force Merge settings are not preserved during an upgrade. They must be re-enabled in the Configuration Manager after performing an upgrade.
Parameter | Description | Default |
---|---|---|
GoMaintain ForceMerge | The Force Merge configuration combines index segments to improve search performance. In larger deployments, search performance could degrade over time due to a large number of segments. Force merge can alleviate this issue by optimizing older indices and reducing heap usage. | Disabled |
GoMaintain ForceMerge Hour (UTC hour of day) | The hour of the day, in UTC, when the merge operation should begin. If Only Merge Periodically is set to false, GoMaintain merges segments continuously, and this setting is not used. | 1 |
GoMaintain ForceMerge Days to Exclude (#days) | The number of most recent days of indices to exclude from merging; older indices are merged. | 10 |
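The Days to Exclude setting can be pictured as a cutoff date: indices newer than the cutoff are left alone, and older ones are candidates for merging. The sketch below assumes that reading of the setting, along with a hypothetical naming scheme in which index names end in YYYY-MM-DD, and it treats an index dated exactly on the cutoff as excluded. The commented example uses GNU `date` and the standard Elasticsearch `_forcemerge` API with `max_num_segments=1`; that call is a manual illustration, not necessarily what GoMaintain sends.

```shell
# Hypothetical helper: reads index names on stdin and prints those whose
# trailing YYYY-MM-DD date (an assumed naming scheme) is earlier than CUTOFF.
# Names without a trailing date, such as system indices, are skipped.
merge_candidates() {
  cutoff=$1   # YYYY-MM-DD; indices dated on or after this day are excluded
  awk -v cutoff="$cutoff" '
    $1 ~ /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      day = substr($1, length($1) - 9)   # last 10 characters: YYYY-MM-DD
      if (day < cutoff) print $1         # ISO dates compare correctly as strings
    }'
}

# Example, assuming GNU date, Elasticsearch on localhost:9200, and the
# default of 10 days to exclude:
#   cutoff=$(date -u -d '10 days ago' +%F)
#   curl -s 'http://localhost:9200/_cat/indices?h=index' | merge_candidates "$cutoff"
# A listed index could then be merged manually:
#   curl -XPOST 'http://localhost:9200/<index>/_forcemerge?max_num_segments=1'
```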
Force Merge configuration and results are logged to C:\Program Files\LogRhythm\Data Indexer\logs\GoMaintain.log on Windows machines and to /var/log/persistent/gomaintain.log on Linux.
If the Data Indexer is a multi-node cluster, this log exists only on the node that holds the GoMaintain lock. To find out which node has the lock, use the following command: sudo /usr/local/logrhythm/tools/