Upgrade the Data Indexer in an HA Deployment
Configure a Proxy Connection for Indexer Upgrades
If your Linux Data Indexer sits behind a proxy server, you need to add the proxy address and optional username and password to the yum configuration file on the Indexer from which you are running the upgrade.
To configure proxy options in yum.conf:
- Log on to your Indexer appliance or server as logrhythm.
- To open the file for editing, type:
sudo vi /etc/yum.conf
- To enter INSERT mode, type i.
- Add the following lines to the file:
proxy=<proxyURL:port>
proxy_username=<username>
proxy_password=<password>
Example:
proxy=http://my.proxyaddress.com:9999/
proxy_username=myloginID
proxy_password=mypassword
- Press Esc.
- To exit and save yum.conf, type:
:wq
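If it helps to see placement, the proxy options typically live in the [main] section of /etc/yum.conf. The snippet below is illustrative only; the URL and credentials are placeholders, and any existing [main] options stay as they are:
[main]
# ...existing options remain unchanged; the proxy lines are simply appended.
proxy=http://my.proxyaddress.com:9999/
proxy_username=myloginID
proxy_password=mypassword
To confirm the values were saved, you can print them back out:
grep -E '^proxy' /etc/yum.conf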
Configure Upgrades Without Internet Access (Dark Sites)
If your Linux Data Indexer does not have access to the Internet (for example, in a restricted environment or at a dark site), you may need to modify CentOS-Base.repo so that repositories are skipped if they are unavailable.
CentOS-Base.repo contains the base, updates, extras, and centosplus repositories. By default, updates to centosplus are disabled (i.e., enabled is set to 0). For base, updates, and extras, you will need to add a line that will skip updates if the repo is unavailable.
To configure repository options in CentOS-Base.repo:
- Log in to your Indexer appliance or server as logrhythm.
- To open the file for editing, type:
sudo vi /etc/yum.repos.d/CentOS-Base.repo
- To enter INSERT mode, type i.
- In each of the three repository sections (base, updates, and extras), add the following line:
skip_if_unavailable=true
- Press Esc.
- To exit and save CentOS-Base.repo, type:
:wq
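For reference, a [base] section with the added line might look similar to the following. The exact name, mirrorlist/baseurl, and gpgkey values vary by CentOS release, so treat this only as an illustration of where skip_if_unavailable=true goes; repeat the same line in the [updates] and [extras] sections:
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
skip_if_unavailable=true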
Upgrade a Single-node Cluster
Before starting the Data Indexer installation or upgrade, ensure that firewalld is running on all cluster nodes. To do this, log on to each node and run: sudo systemctl start firewalld
- Log on to your Indexer appliance or server as logrhythm.
- Change to the /home/logrhythm/Soft directory where you copied the updated installation or upgrade script.
- If you need to create a hosts file, use vi to create a file named hosts in /home/logrhythm/Soft. If you are creating a new file, ensure that you specify the current Data Indexer hostname. The hosts file must follow a defined pattern of {IPv4 address}, {hostname}, {boxtype} (optional) on each line. You must separate the address and hostname with a space. The file might look like the following:
10.1.23.91 LRLinux1 hot
If you do not specify a boxtype, the installer assumes the node is a hot node. This means the warm node configuration may be lost if you do not update the hosts file before running the upgrade. Do not use fully qualified domain names for Indexer hosts. For example, use only LRLinux1 instead of LRLinux1.myorg.com.
The following command sequence illustrates how to create and modify a file with vi:
- To create the hosts file and open for editing, type vi hosts.
- To enter INSERT mode, type i.
- Enter the IPv4 address, the hostname to use for the Indexer, and the box type, separated by spaces.
- Press Esc.
- To exit and save your hosts file, type:
:wq
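As an optional sanity check (not part of the official procedure), the awk sketch below flags any line that is missing an IPv4 address or a hostname, or that uses a dotted (fully qualified) hostname; no output means the file looks correct. It assumes the hosts file was created in /home/logrhythm/Soft:
# Flag lines without an IPv4 first field or with a hostname containing a dot.
awk 'NF == 0 {next} NF < 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ || $2 ~ /\./ {print "check line " NR ": " $0}' /home/logrhythm/Soft/hosts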
- To install the DX and make the machine accessible without a password, download the DataIndexerLinux.zip file from the Documentation & Downloads section of the LogRhythm Community, extract the PreInstall.sh file to /home/logrhythm, and execute the script.
Do not run this script as sudo, or the DX Installer will fail.
sh ./PreInstall.sh
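If PreInstall.sh completed successfully, an SSH connection to the Indexer's own short hostname as the logrhythm user should no longer prompt for a password. The check below is optional and assumes the script set up key-based SSH, as implied by its stated purpose:
# Should print the hostname without asking for a password.
ssh logrhythm@"$(hostname -s)" hostname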
- Run the installer with the hosts file argument:
sudo sh LRDataIndexer-<version>.centos.x86_64.run --hosts <absolute path to .hosts file> --plan /home/logrhythm/Soft/plan.yml
Press Tab after starting to type out the installer name, and the filename autocompletes for you.
If prompted for the SSH password, enter the password for the logrhythm user.
The script installs or upgrades the Data Indexer. This process may take up to 10 minutes. When the installation or upgrade is complete, a confirmation message appears.
Check the status of services by typing sudo systemctl at the prompt, and then look for failed services.
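If you prefer a filtered view, systemctl can list only the units that are in a failed state; an empty list means no services failed:
sudo systemctl --failed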
If the installation or upgrade fails with the error "failed to connect to the firewalld daemon," ensure that firewalld is running on all cluster nodes and start this procedure again. To do this, log in to each node and run the following command:
sudo systemctl start firewalld
Once the cluster restarts, there will be a short period of downtime as the DX update finalizes.
Upgrade a Multi-node Cluster
Before starting the Data Indexer installation or upgrade, ensure that firewalld is running on all cluster nodes. To do this, log in to each node and run the following command: sudo systemctl start firewalld
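If key-based SSH between the nodes is already in place for the logrhythm user (as set up during the original installation), a short loop like the sketch below starts firewalld on every node listed in the hosts file. This is an optional convenience, not part of the official procedure, and each node may still prompt for the sudo password:
# Sketch only: start firewalld on each node listed in /home/logrhythm/Soft/hosts.
for ip in $(awk 'NF {print $1}' /home/logrhythm/Soft/hosts); do
  echo "--- $ip ---"
  ssh -t "logrhythm@$ip" "sudo systemctl start firewalld && sudo systemctl is-active firewalld"
done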
- Log on to your Indexer appliance or server as logrhythm.
- Change to the /home/logrhythm/Soft directory where you copied the script.
- You should have a file named hosts in the /home/logrhythm/Soft directory that was used during the original installation. The hosts file must follow a defined pattern of {IPv4 address}, {hostname}, {boxtype} (optional) on each line. You must separate the address and hostname with a space.
The contents of the file might look like the following:
10.1.23.65 LRLinux1 hot
10.1.23.67 LRLinux2 warm
10.1.23.91 LRLinux3 warm
The boxtype parameter is optional in the hosts file. If you do not specify a boxtype, the installer assumes the node is a hot node. This means the warm node configuration may be lost if you do not update the hosts file before running the upgrade. If you need to create a hosts file, use vi to create a file named hosts in /home/logrhythm/Soft. Do not use fully qualified domain names for Indexer hosts. For example, use only LRLinux1 instead of LRLinux1.myorg.com.
The following command sequence illustrates how to create and modify a file with vi:
- To create the hosts file and open it for editing, type vi hosts.
- To enter INSERT mode, type i.
- Enter the IPv4 address, the hostname to use for the Indexer, and the box type, separated by spaces.
- Press Esc.
- To exit and save your hosts file, type:
:wq
- To install the DX and make the machine accessible without a password, download the DataIndexerLinux.zip file from the Documentation & Downloads section of the LogRhythm Community, extract the PreInstall.sh file to /home/logrhythm, and execute the script.
Do not run this script as sudo, or the DX Installer will fail.
sh ./PreInstall.sh
If there are any changes in the plan file, you must copy the new plan file to /home/logrhythm/Soft.
- Run the installer using the original or updated hosts file:
sudo sh LRDataIndexer-<version>.centos.x86_64.run --hosts <absolute path to .hosts file> --plan /home/logrhythm/plan.yml
Press Tab after starting to type out the installer name, and the filename autocompletes for you.
If prompted for the SSH password, enter the password for the logrhythm user.
The script installs or upgrades the Data Indexer on each of the DX machines. This process may take up to 30 minutes. When the installation or upgrade is complete, a confirmation message appears.
Check the status of services by typing sudo systemctl at the prompt, and then look for failed services.
If the installation or upgrade fails with the error "failed to connect to the firewalld daemon," ensure that firewalld is running on all cluster nodes and start the installation again. To do this, log in to each node and run the following command:
sudo systemctl start firewalld
Once the cluster restarts, there will be a short period of downtime as the DX update finalizes.
Validate the Linux Indexer Upgrade
To validate a successful upgrade of the Linux Indexer, check the following logs in /var/log/persistent:
- ansible.log echoes console output from the upgrade, and should end with details about the number of components that upgraded successfully, as well as any issues (unreachable or failed)
- logrhythm-node-install.sh.log lists all components that were installed or updated, along with current versions
- logrhythm-cluster-install.sh.log should end with a message stating that the Indexer was successfully installed
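To spot-check these summaries quickly, you can print the last lines of each log (paths as listed above):
sudo tail -n 20 /var/log/persistent/ansible.log /var/log/persistent/logrhythm-node-install.sh.log /var/log/persistent/logrhythm-cluster-install.sh.log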
Additionally, you can issue the following command to verify the installed versions of various LogRhythm services, tools, and libraries, as well as third-party tools:
sudo yum list installed | grep -i logrhythm
- Verify that the following LogRhythm services are at the same version as the main installer version:
- Bulldozer
- Carpenter
- Columbo
- GoMaintain
- Transporter
- Watchtower
- Verify that the following tools/libraries have been updated to the version matching the installer name:
- Cluster Health
- Conductor
- Persistent
- Silence
- Unique ID
- Upgrade Checker
- Verify the following versions of these services and third party tools:
- elasticsearch 6.8.3
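As an additional check, you can query the local Elasticsearch endpoint directly (assuming the default port 9200 used elsewhere in this guide); the version number in the response should match the expected 6.8.3:
curl -s 'http://localhost:9200/?pretty' | grep '"number"'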
Configure the Data Indexer
Configuring the Data Indexer for Windows and Linux has moved from the individual clusters to the Configuration Manager on the Platform Manager. You can configure all Data Indexers using the LogRhythm Configuration Manager installed on the Platform Manager.
- Cluster Name configuration is currently done through environment settings. Before configuring the Data Indexer on Windows, verify that the DX_ES_CLUSTER_NAME environment variable is set on both DR servers.
- LogRhythm Service Registry, LogRhythm API Gateway, and LogRhythm Windows Authentication API Service must be running before opening the LogRhythm Configuration Manager.
If you are configuring multiple Data Indexers, all of them can be configured from the Primary PM because the configuration is centralized between servers.
In an MSSP environment, DX Cluster names are visible to all users of a Web Console, regardless of Entity segregation. For privacy reasons, avoid using cluster names that could be used to identify clients. Data and data privacy are still maintained; only the cluster name is visible.
Do not attempt to modify consul configurations manually. If you have any issues, contact LogRhythm Support.
To configure the Data Indexer:
Open the Configuration Manager from programs on the Platform Manager.
From the menu on the left, select the Data Indexers tab.
Each installed Data Indexer has its own section that looks like this:
Data Indexer - Cluster Name: <ClusterName> Cluster Id: <ClusterID>
The Cluster Name and Cluster ID come from the Environment variables, DX_ES_CLUSTER_NAME and DXCLUSTERID on each server. The Cluster Name can be modified in the Configuration Manager. If you change the Cluster Name, the name should be less than 50 characters long to ensure it displays properly in drop-down menus. The DXCLUSTERID is automatically set by the software and should not be modified.
Verify or update the following Data Indexer settings:
Do not modify any settings from their defaults unless you fully understand their impact. Modifying a setting incorrectly can negatively impact Data Indexer function and performance.
Setting | Default | Description |
---|---|---|
Database User ID | LogRhythmNGLM | Username the DX services will use to connect to the EMDB database. When in FIPS mode, Windows authentication is required (local or domain). When using a domain account, the Database Username must be in domain\username format. |
Database Password | <LogRhythm Default> | Password used by the DX services to connect to the EMDB database. It is highly recommended, and LogRhythm best practice, to change all MS SQL account passwords when setting up a deployment. After you change the LogRhythmNGLM password in Microsoft SQL Server Management Studio, you must set the Database Password to the same value. Change the password in Microsoft SQL Server Management Studio first, then change it on the Data Indexer page. |
GoMaintain ForceMerge | Disabled | Enables or disables maintenance force merging. This can be left at the default value. |
Integrated Security | Disabled | This should be enabled when FIPS is enabled on the operating system. |
Click Show or Hide in Advanced View to toggle the view for Advanced Settings.
Advanced View Settings:
Setting | Default | Description |
---|---|---|
Transporter Max Log Size (bytes) | 1000000 | Maximum log size in bytes that can be indexed. This can be left at the default value. |
Transporter Web Server Port | 16000 | Port that the Transporter service listens on. This can be left at the default value. |
Transporter Route Handler Timer (sec) | 10 | Indexing log batch timeout setting. This can be left at the default value. |
Elasticsearch Data Path | Windows: D:\LRIndexer\data; Linux: /usr/local/logrhythm/db/data | Path where Data Indexer data will be stored. The path is created if it does not already exist. Modifying this path after the Data Indexer is installed does not move existing indices; they must be moved manually if the path is changed. |
GoMaintain TTL Logs (#indices) | -1 | Number of indices kept by the DX. This should be left at the default value. |
GoMaintain IndexManage Elasticsearch Sample Interval (sec) | 10 | Number of seconds between resource usage samples. This can be left at the default value. |
GoMaintain Elasticsearch Samples (#Samples) | 60 | Total number of samples taken before GoMaintain decides to take action when resource HWMs are reached. |
GoMaintain IndexManager Disk HWM (%diskutil) | 80 | Maximum disk usage percentage for the drive where the data path is configured. This can be left at the default value. |
GoMaintain IndexManage Elasticsearch Heap HWM (%esheap) | 85 | Maximum heap usage percentage before GoMaintain closes an index to release resources. This can be left at the default value. |
Carpenter SQL Paging Size (#records) | 10000 | Number of records to pull from the EMDB at one time when syncing EMDB indices. This can be left at the default value. |
Carpenter EMDB Sync Interval (#minutes) | 5 | Interval at which the Carpenter service syncs EMDB indices. This can be left at the default value. |
Enable Warm Replicas | Disabled | Turns on replicas for warm indices. This setting only affects Linux Data Indexer clusters that contain warm nodes. This can be left at the default value. |
Click Submit.
Automatic Maintenance
Automatic maintenance is performed by the GoMaintain service and governed by several of the settings above. On startup, GoMaintain continuously takes samples of Elasticsearch statistics, including disk and heap utilization, over the configured time frame.
GoMaintain will automatically perform maintenance when High Water Mark settings are reached. Samples are taken over a period of time and analyzed before GoMaintain will take action on an index. This will depend on the Sample Interval and #Sample settings. By default, this is 60 samples, 1 every 10 seconds for a total of 10 minutes. If it is determined during that sample period that a High Water Mark setting was reached for an extended period of time, indices will be closed, deleted, or moved to warm nodes depending on the data indexer configuration. After an action is taken and completed, the sample period will begin again.
The DX monitors Elasticsearch memory and DX storage capacity. GoMaintain tracks heap pressure on the nodes. If the pressure constantly crosses the threshold, GoMaintain decreases the number of days of indices by closing the index. Closing the index removes the resource needs of managing that data and relieves the heap pressure on Elasticsearch. GoMaintain continues to close days until the memory is under the warning threshold, and continues to delete days based on the default disk utilization setting of 80%.
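To see the same figures GoMaintain is sampling, you can query Elasticsearch's cat APIs on the local node (assuming the default port 9200); heap.percent corresponds to the heap HWM check, and the allocation output shows per-node disk usage:
# Per-node JVM heap utilization:
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent'
# Per-node disk usage for the Elasticsearch data path:
curl -s 'http://localhost:9200/_cat/allocation?v'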
Logging of configuration and results for force merge can be found in C:\Program Files\LogRhythm\DataIndexer\logs\GoMaintain.log.
GoMaintain TTL Logs (#Indices)
The default configuration value is -1. With this value, GoMaintain monitors the system's resources and automatically manages the time-to-live (TTL). You can configure a lower TTL by changing this number. If this number is no longer achievable, the Data Indexer sends a diagnostic warning and starts closing indices. Indices that have been closed by GoMaintain are not actively searchable after 7.9.x, but they are maintained for reference purposes.
To show closed indices, run a curl command such as:
curl -s -XGET 'http://localhost:9200/_cat/indices?h=status,index' | awk '$1 == "close" {print $2}'
To show both open and closed indices, open a browser to http://localhost:9200/_cat/indices?v.
Indices can be reopened with the following query, as long as you have enough heap memory and disk space to support this index. If you do not, it immediately closes again.
curl -XPOST 'localhost:9200/<index>/_open?pretty'
After you open the index in this way, you can investigate the data in either the Web Console or Client Console.
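To confirm that the index actually reopened (and stayed open), you can re-run the status query scoped to that index; it should now report open:
curl -s 'http://localhost:9200/_cat/indices/<index>?h=status,index'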
Disk Utilization Limit
- IndexManager Disk HWM (%diskUtil): Indicates the percentage of disk utilization that triggers maintenance. The default is 80, which means that maintenance starts when the Elasticsearch data disk is 80% full.
If Warm nodes are present, the disk utilization for combined Hot and Warm nodes will be tracked separately.
Do not set the value for %diskUtil higher than 80. Doing so can impact the ability of Elasticsearch to store replica shards for failover.
If Warm nodes are present, the oldest index will be moved to the Warm node(s) if the Disk HWM is reached.
Maintenance is applied to the active repository, as well as archive repositories created by Second Look. When the Disk Usage Limit is reached, active logs are trimmed when “max indices” is reached. At this point, GoMaintain deletes completed restored repositories starting with the oldest date.
The default settings prioritize restored repositories above the active log repository. Restored archived logs are maintained while sacrificing active logs. If you want to keep your active logs and delete archives for space, set your min indices equal to your max indices. This forces the maintenance process to delete restored repositories first.
Heap Utilization Limit
IndexManager Heap HWM (%esheap): Indicates the percentage of Elasticsearch (Java) heap utilization that triggers maintenance. The default is 85, which means that maintenance starts when the Elasticsearch heap utilization reaches 85%.
Do not set the value for %esheap higher than 85. Doing so can impact Elasticsearch searching and indexing and can degrade overall Elasticsearch performance.
If the Heap HWM is reached, GoMaintain will automatically close the oldest index in the cluster to release memory resources used by the cluster. If warm nodes are present in the cluster, the index will automatically be moved to the warm nodes before the index is closed.
Closed Indices on Hot nodes cannot be searched and will remain in a closed state on the data indexer until the Utilization Limit is reached.
Force Merge Configuration
Do not modify any of the configuration options under Force Merge Config without the assistance of LogRhythm Support or Professional Services.
The force merge configuration combines index segments to improve search performance. In larger deployments, search performance can degrade over time due to a large number of segments. Force merge can alleviate this issue by optimizing older indices and reducing heap usage.
Enabling Force Merge displays the following additional ForceMerge settings:
Parameter | Description |
---|---|
GoMaintain ForceMerge Hour (UTC Hour of day) | The hour of the day, in UTC, when the merge operation should begin. If Only Merge Periodically is set to false, GoMaintain merges segments continuously, and this setting is not used. |
GoMaintain ForceMerge Days to Exclude | Force merging takes place only on indices older than the most recent X indices, counting backwards in time. |
Only Merge Periodically | If set to true, GoMaintain only merges segments once per day, at the hour specified by GoMaintain ForceMerge Hour. If set to false, GoMaintain merges segments on a continuous basis. |
Information About Automatic Maintenance
Automatic maintenance is governed by several settings in GoMaintain Config:
Disk Utilization Limit
Disk Util Limit: Indicates the percentage of disk utilization that triggers maintenance. The default is 80, which means that maintenance starts when the Elasticsearch data disk is 80% full.
Do not set the value for Disk Util Limit higher than 80. Doing so can impact the ability of Elasticsearch to store replica shards for failover.
Maintenance is applied to the active repository, as well as archive repositories created by Second Look. When the Disk Usage Limit is reached, active logs are trimmed when “max indices” is reached. At this point, Go Maintain deletes completed restored repositories starting with the oldest date.
The default settings prioritize restored repositories above the active log repository. Restored archived logs are maintained at the sacrifice of active logs. If you want to keep your active logs and delete archives for space, set your min indices equal to your max indices. This forces the maintenance process to delete restored repositories first.
Force Merge Config
Do not modify any of the configuration options under Force Merge Config without the assistance of LogRhythm Support or Professional Services.
The force merge configuration combines index segments to improve search performance. In larger deployments, search performance could degrade over time due to a large number of segments. Force merge can alleviate this issue by optimizing older indices and reducing heap usage.
Parameter | Description | Default |
---|---|---|
Hour Of Day For Periodic Merge | The hour of the day, in UTC, when the merge operation should begin. If Only Merge Periodically is set to false, Go Maintain merges segments continuously, and this setting is not used. | 1 |
Merging Enabled | If set to true, merging is enabled. If set to false, merging is disabled. | false |
Only Merge Periodically | If set to true, Go Maintain only merges segments once per day, at the hour specified by Hour Of Day For Periodic Merge. If set to false, Go Maintain merges segments on a continuous basis. | false |
Logging of configuration and results for force merge can be found in C:\Program Files\LogRhythm\DataIndexer\logs\GoMaintain.log.
Index Configs
The DX monitors Elasticsearch memory and DX storage capacity. GoMaintain tracks heap pressure on the nodes. If the pressure constantly crosses the threshold, GoMaintain decreases the number of days of indices by closing the index. Closing the index removes the resource needs of managing that data and relieves the heap pressure on Elasticsearch. GoMaintain continues to close days until the memory is under the warning threshold and continues to delete days based on the disk utilization setting of 80% by default.
The default configuration value is -1. With this value, GoMaintain monitors the system's resources and automatically manages the time-to-live (TTL). You can configure a lower TTL by changing this number. If this number is no longer achievable, the DX sends a diagnostic warning and starts closing indices.
Indices that have been closed by GoMaintain are not actively searchable in 7.12 but are maintained for reference purposes. To see which indices are closed, you can run a curl command such as the following:
curl -s -XGET 'http://localhost:9200/_cat/indices?h=status,index' | awk '$1 == "close" {print $2}'
You can also open a browser to http://localhost:9200/_cat/indices?v to show both open and closed indices.
Indices can be reopened with the following query as long as you have enough heap memory and disk space to support this index. If you do not, it immediately closes again.
curl -XPOST 'localhost:9200/<index>/_open?pretty'
After you open the index in this way, you can investigate the data in either the Web Console or Client Console.