Administer a LogRhythm Disaster Recovery Deployment

This guide describes how to use the LogRhythm Disaster Recovery (DR) solution, which protects businesses from potential site failures due to catastrophic events (power outage, fire, flood, etc.). After the DR solution is configured on Primary and Secondary sites in the network, you can use the DR solution to perform the following tasks:

Check the replication status
Change the replication mode (synchronous or asynchronous)
Perform a failover to the Secondary site and a failback to the Primary site

Prerequisites

This guide assumes that LogRhythm Professional Services already installed and configured the DR solution on a Primary and Secondary site, as described in Install a LogRhythm Disaster Recovery Deployment.

Audience

This guide is for Enterprise customers who administer the DR solution and are responsible for system failovers and failbacks in case of a disaster or a planned outage.

Definition of Disaster Recovery Terms and Concepts

Term	Definition
Disaster recovery sites	In the DR solution, you configure two types of sites: Primary. The main site that collects and processes log data. Secondary. A mirrored site that is kept running and available to take over if the Primary site fails.
System status	A Primary or Secondary site can be either: Active. The site that is actively collecting and processing log data. The Primary site is usually active unless you have failed over to the Secondary site. Standby. A site that is available to collect and process log data but is not currently active.
Site types	Disaster recovery sites can be characterized as follows: Cold. An available site that may not be powered up or configured. Warm. A site whose systems are configured but that must receive the Primary site’s backup files before it can become active. Hot. A mirrored site that can become active within minutes. The Secondary site in the LogRhythm DR solution is considered a “hot site.”
Recovery Objectives	Disaster recovery solutions are typically characterized by two recovery objectives: Recovery Point Objective (RPO). The maximum tolerable period during which data from the Primary site can be lost due to a system disruption. Recovery Time Objective (RTO). The maximum duration that a site can be unavailable before it must be restored to avoid unacceptable business consequences. Each organization must assess the impact on its own applications and systems to determine an acceptable RPO and RTO.
Transmission methods	In the DR solution, you can select one of the following transmission methods for the replicated data: Synchronous. Data is committed to the Secondary site before the Primary site acknowledges the transfer, so the Primary instance can run only as fast as the link between the two sites allows. Asynchronous (default, recommended). The Primary site acknowledges the transfer before the data is sent to the Secondary site. This method provides the best possible performance at peak times, and replication catches up during slower periods if it falls behind.
Failover solutions	A “failover” is the process of switching operations from the Primary site to another site. A failover occurs in one of two situations: Unplanned failover. Occurs because of a natural disaster or other unforeseen event that causes the Primary site to fail. Planned failover. Occurs because administrators scheduled the switch, for example for a planned outage.
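To make the RPO trade-off between the two transmission methods concrete, the following Python sketch models a Primary/Secondary pair. It is a conceptual illustration under stated assumptions, not how LogRhythm or SQL Server actually replicates data, and the class name `ReplicatedPair` and its methods are hypothetical. In synchronous mode every write reaches the Secondary before it is acknowledged, so losing the Primary loses nothing. In asynchronous mode, anything still queued when the Primary fails is lost, and that backlog is the data-loss window your RPO must tolerate.

```python
"""Conceptual model of synchronous vs. asynchronous replication.

Assumption: illustrative only; this is not how LogRhythm or SQL Server
implements replication. ReplicatedPair is a hypothetical name.
"""
from collections import deque


class ReplicatedPair:
    """A Primary/Secondary pair that acknowledges writes in one of two modes."""

    def __init__(self, mode: str) -> None:
        if mode not in ("synchronous", "asynchronous"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.primary: list[str] = []
        self.secondary: list[str] = []
        self.send_queue: deque[str] = deque()  # acknowledged, not yet on Secondary

    def write(self, record: str) -> None:
        """Commit a record on the Primary and replicate it per the mode."""
        self.primary.append(record)
        if self.mode == "synchronous":
            # Committed on the Secondary before the write is acknowledged,
            # so each write waits on the inter-site link.
            self.secondary.append(record)
        else:
            # Acknowledged immediately; shipped to the Secondary later.
            self.send_queue.append(record)

    def drain(self, max_records: int) -> None:
        """Ship up to max_records queued writes (e.g. during a quiet period)."""
        for _ in range(min(max_records, len(self.send_queue))):
            self.secondary.append(self.send_queue.popleft())

    def fail_primary(self) -> list[str]:
        """Simulate losing the Primary; return writes the Secondary never got."""
        lost = list(self.send_queue)
        self.send_queue.clear()
        return lost


if __name__ == "__main__":
    sync_pair = ReplicatedPair("synchronous")
    async_pair = ReplicatedPair("asynchronous")
    for i in range(5):
        sync_pair.write(f"log-{i}")
        async_pair.write(f"log-{i}")
    async_pair.drain(3)  # the link caught up on only three records
    print(sync_pair.fail_primary())   # []
    print(async_pair.fail_primary())  # ['log-3', 'log-4']
```

In the asynchronous run, the link caught up on only three of the five writes, so the last two fall inside the RPO window. Synchronous mode avoids that loss at the cost of making every write wait on the inter-site link.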

Operation Requirements

After the DR solution is configured, make sure that:

The Primary and Secondary Platform Managers are bound to the same Active Directory domain.
The Primary and Secondary sites can access a Microsoft DNS server within the Active Directory domain.
The Primary and Secondary sites are resolvable via DNS, with both a forward (A) record and a reverse (PTR) record.
The DNS server is configured to allow secure or insecure dynamic updates from clients.
The DR Setup utility is run by an Active Directory user with local administrative rights.
To create a Failover Cluster, an additional IP address (the Failover IP) is required for the nodes participating in the cluster. This IP is used for cluster creation and Failover Clustering node communication, and it is the address through which LogRhythm services are provided. Failover IP addresses should be unused IP addresses on the network. In a multi-subnet DR scenario, DR Setup needs two distinct, unused IP addresses, one in each subnet. In a single-subnet DR scenario, only one unused IP address is needed, and the Primary and Secondary sites share it. Present the Failover IP on the network adapter that has access to Active Directory so that the accompanying cluster DNS record can be updated. This is a virtualized IP address that the underlying Windows Server Failover Cluster uses to facilitate cluster communications.
The Primary and Secondary sites run the same LogRhythm version. If you upgrade the components on the Primary site, you must also upgrade them on the Secondary site.
If the Enable Password Policy option is enabled on the LogRhythm SIEM user account or on the SA and LRMirror_Login accounts, passwords do not synchronize between nodes. While Enable Password Policy is enforced, you must manually change the passwords on the Secondary node whenever they are changed on the Primary node. To disable the Enable Password Policy option, modify the user account login on the People tab in the Client Console.
The SQL Server service on the Primary site runs under the same account as the SQL Server service on the Secondary site. This must be an Active Directory account with local administrative privileges.
The SQL Server Agent service on the Primary site runs under the same account as the SQL Server Agent service on the Secondary site. This should be a named, privileged account other than the sa account, and it must be a domain account.
A common DNS record is configured so that it can point to either the IP address of the Primary Platform Manager or the IP address of the Secondary Platform Manager.
The DNS zone spans the Primary and Secondary sites.
The firewall ports listed in the table below are open between the Primary and Secondary systems. If network firewalls or Group Policy settings block communication on these ports, the DR installation fails. During installation, the DR Setup tool configures these ports to allow only system-to-system communication.
Application	Protocol	Ports
Cluster Service	TCP	3343
Cluster Service	UDP	3343
RPC	TCP	135
Cluster Administrator	UDP	137
Ephemeral Ports	UDP	1024-65535
SQL Replication	TCP	5022 (default)
MS SQL	TCP	1433
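The DNS requirement in the list above (a forward A record and a reverse PTR record for each site) can be spot-checked with Python's standard `socket` module. This is a minimal sketch, assuming it runs on a domain-joined machine that uses the Active Directory DNS server; the function name `check_forward_reverse` and the command-line usage are hypothetical. The resolver functions are parameters so the matching logic can be exercised without live DNS.

```python
"""Check that a host has matching forward (A) and reverse (PTR) DNS records.

Assumptions: run on a machine that uses the domain's DNS server; the
function name and the command-line usage are hypothetical.
"""
import socket
import sys
from typing import Callable


def check_forward_reverse(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> tuple[bool, str]:
    """Resolve hostname -> IP (A record), then IP -> name (PTR record).

    Returns (ok, detail). ok is True only if both lookups succeed and the
    PTR name or one of its aliases matches hostname (FQDN or short name).
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror subclasses OSError
        return False, f"no A record for {hostname}: {exc}"
    try:
        ptr_name, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.herror subclasses OSError
        return False, f"no PTR record for {ip}: {exc}"

    wanted = hostname.lower().rstrip(".")
    names = {n.lower().rstrip(".") for n in [ptr_name, *aliases]}
    short_names = {n.split(".")[0] for n in names}
    if wanted in names or wanted in short_names:
        return True, f"{hostname} -> {ip} -> {ptr_name}"
    return False, f"PTR for {ip} is {ptr_name}, expected {hostname}"


if __name__ == "__main__":
    # Usage: python check_dns.py <primary-host> <secondary-host>
    for host in sys.argv[1:]:
        ok, detail = check_forward_reverse(host)
        print("OK  " if ok else "FAIL", detail)
```

Run it with the Primary and Secondary Platform Manager host names as arguments. A `FAIL` line points at the missing or mismatched record.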
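The Failover IP rules in the list above (one shared address in a single-subnet deployment, two distinct addresses, one per subnet, in a multi-subnet deployment) can be checked before running DR Setup with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: `check_failover_ips` is a hypothetical helper, the addresses are placeholders, and it only checks address arithmetic, so it cannot prove an address is actually unused on the network.

```python
"""Sanity-check Failover IP choices against the DR subnet layout.

Assumptions: this checks address arithmetic only. It cannot prove an address
is unused on the network; confirm that separately. check_failover_ips is a
hypothetical helper and all addresses are placeholders.
"""
import ipaddress
from typing import Iterable


def check_failover_ips(
    primary_subnet: str,
    secondary_subnet: str,
    failover_ips: list[str],
    node_ips: Iterable[str] = (),
) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    p_net = ipaddress.ip_network(primary_subnet)
    s_net = ipaddress.ip_network(secondary_subnet)
    fips = [ipaddress.ip_address(ip) for ip in failover_ips]
    in_use = {ipaddress.ip_address(ip) for ip in node_ips}
    problems = []

    for ip in fips:
        if ip in in_use:
            problems.append(f"{ip} is already a node address")

    if p_net == s_net:
        # Single subnet: one Failover IP, shared by Primary and Secondary.
        if len(fips) != 1:
            problems.append(f"single subnet: expected 1 Failover IP, got {len(fips)}")
        elif fips[0] not in p_net:
            problems.append(f"{fips[0]} is outside {p_net}")
    else:
        # Multi-subnet: two distinct Failover IPs, one per subnet.
        distinct = set(fips)
        if len(distinct) != 2:
            problems.append(f"multi-subnet: expected 2 distinct Failover IPs, got {len(distinct)}")
        else:
            if not any(ip in p_net for ip in distinct):
                problems.append(f"no Failover IP in Primary subnet {p_net}")
            if not any(ip in s_net for ip in distinct):
                problems.append(f"no Failover IP in Secondary subnet {s_net}")
    return problems


if __name__ == "__main__":
    print(check_failover_ips("10.1.0.0/24", "10.2.0.0/24", ["10.1.0.50", "10.2.0.50"]))  # []
    print(check_failover_ips("10.1.0.0/24", "10.1.0.0/24", ["10.1.0.50", "10.1.0.51"]))  # ['single subnet: expected 1 Failover IP, got 2']
```

Passing the node addresses through `node_ips` catches the common mistake of reusing a server's own address as the Failover IP.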
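The TCP rows of the firewall table can be probed from one site against the other with a plain TCP connect test. This is a minimal sketch, assuming you run it from a Primary node against a Secondary node (or the reverse); `probe_tcp` and `DR_TCP_PORTS` are hypothetical names. A connect test only succeeds if a service is listening, so a "closed" result can mean the service is not running yet rather than that a firewall is blocking it. The UDP rows (3343, 137, and the ephemeral range) cannot be verified this way and are skipped.

```python
"""Probe the TCP ports from the DR firewall table.

Assumptions: run from one site against the other site's node. UDP ports
are not checked. probe_tcp and DR_TCP_PORTS are hypothetical names.
"""
import socket
import sys

# TCP rows from the firewall requirements table (5022 is the default SQL Replication port).
DR_TCP_PORTS = {
    "Cluster Service": 3343,
    "RPC": 135,
    "SQL Replication": 5022,
    "MS SQL": 1433,
}


def probe_tcp(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Return {service name: True if a TCP connection succeeded}."""
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, unreachable, or unresolvable
            results[name] = False
    return results


if __name__ == "__main__":
    # Usage: python probe_ports.py <remote-node> [<remote-node> ...]
    for target in sys.argv[1:]:
        for service, ok in probe_tcp(target, DR_TCP_PORTS).items():
            state = "open" if ok else "blocked or not listening"
            print(f"{target} {service} ({DR_TCP_PORTS[service]}/tcp): {state}")
```

Adjust the `SQL Replication` entry if your deployment uses a non-default port instead of 5022.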

In some cases, when an administrator creates a non-domain user on the active server, the following error is displayed: "User does not have permission to perform this action." However, the user is created the next time login replication occurs. This error is known to occur in an XM setup in which the DR Mediator is active.
