Disaster Recovery Administration Checklist

Use this checklist to record your progress as you administer a LogRhythm Disaster Recovery (DR) deployment.

Regular Monitoring Tasks

Replication Status Monitoring

  • [ ] Check replication status using LogRhythm DR Control:

    • [ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator

    • [ ] Verify databases show "Synchronized" or "Synchronizing" status

    • [ ] Review metrics (SendQueue, SendRate, RedoQueue, RedoRate, EstimatedRecoveryTime, SyncPerformance)

    • [ ] Press 'Q' to exit the panel

  • [ ] Alternatively, use the AlwaysOn Availability Group Dashboard:

    • [ ] Start SQL Server Management Studio and log in as an administrator

    • [ ] Expand the AlwaysOn High Availability folder, then the Availability Groups folder

    • [ ] Right-click the availability group and select "Show Dashboard"
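
As an illustration of how the replication metrics relate to recovery objectives, the following Python sketch estimates how long each queue would take to drain. It assumes the DR Control columns mirror SQL Server's sys.dm_hadr_database_replica_states view (log_send_queue_size and redo_queue_size in KB, log_send_rate and redo_rate in KB/sec); that mapping, the ReplicaMetrics record, and the assess function are hypothetical and not part of DR Control. Redo queue divided by redo rate approximates the time needed to bring the Secondary current (an RTO input), and send queue divided by send rate approximates how much unsent log is at risk (an RPO input).

```python
from dataclasses import dataclass


# Hypothetical record for one row of the DR Control panel. Units assume the
# conventions of sys.dm_hadr_database_replica_states: queues in KB, rates in KB/sec.
@dataclass
class ReplicaMetrics:
    database: str
    state: str             # e.g. "Synchronized", "Synchronizing"
    send_queue_kb: float   # log not yet sent to the Secondary
    send_rate_kbps: float
    redo_queue_kb: float   # log received but not yet applied on the Secondary
    redo_rate_kbps: float


HEALTHY_STATES = {"Synchronized", "Synchronizing"}


def drain_seconds(queue_kb: float, rate_kbps: float) -> float:
    """Seconds to empty a queue at the current rate; inf if the queue is stalled."""
    if queue_kb <= 0:
        return 0.0
    if rate_kbps <= 0:
        return float("inf")
    return queue_kb / rate_kbps


def assess(m: ReplicaMetrics) -> dict:
    """Summarize one database's replication health."""
    return {
        "database": m.database,
        "healthy_state": m.state in HEALTHY_STATES,
        # Time for the Secondary to apply log it already has (feeds RTO).
        "est_recovery_s": drain_seconds(m.redo_queue_kb, m.redo_rate_kbps),
        # Time to ship log the Secondary does not have yet (feeds RPO).
        "est_data_loss_s": drain_seconds(m.send_queue_kb, m.send_rate_kbps),
    }


print(assess(ReplicaMetrics("ExampleDB", "Synchronizing", 500, 100, 2000, 400)))
```

A non-empty queue with a zero rate returns infinity, which is a cue to investigate the link rather than a literal estimate.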

Replication Mode Management

  • [ ] Review current replication mode (Asynchronous or Synchronous)

  • [ ] Determine whether a mode change is needed, based on:

    • [ ] Current network performance

    • [ ] Recovery Point Objective (RPO) requirements

    • [ ] Performance requirements

    • [ ] Distance between Primary and Secondary sites
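
The criteria above can be combined into a rough decision aid. The Python sketch below is a hypothetical heuristic, not LogRhythm guidance: the SiteLink fields and the rule of comparing round-trip latency against a tolerated commit delay are assumptions. It relies on one well-established property of SQL Server synchronous-commit mode: the Primary does not acknowledge a commit until the Secondary has hardened the log, so every commit pays at least one network round trip, and that cost grows with distance and poor network performance.

```python
from dataclasses import dataclass


@dataclass
class SiteLink:
    round_trip_ms: float        # measured Primary-to-Secondary latency (reflects distance and network)
    rpo_seconds: float          # tolerated data loss; 0 means none
    max_commit_delay_ms: float  # extra commit latency the workload can absorb


def recommend_mode(link: SiteLink) -> str:
    """Return "Synchronous", "Asynchronous", or "Review" (illustrative heuristic only)."""
    # Synchronous commit adds at least one round trip to every commit.
    sync_affordable = link.round_trip_ms <= link.max_commit_delay_ms
    if sync_affordable:
        return "Synchronous"    # strongest protection at an acceptable cost
    if link.rpo_seconds > 0:
        return "Asynchronous"   # some loss is tolerated; avoid the commit penalty
    return "Review"             # zero RPO but the link is too slow: revisit requirements


print(recommend_mode(SiteLink(round_trip_ms=40, rpo_seconds=60, max_commit_delay_ms=5)))
```

The "Review" outcome marks the case where the requirements conflict with the network, which no mode setting alone can resolve.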

Planned Failover Procedure

Pre-Failover Steps

  • [ ] Schedule a maintenance window for the failover

  • [ ] Notify all relevant stakeholders of the planned failover

  • [ ] Verify all databases are synchronized between Primary and Secondary sites

  • [ ] Verify Secondary site components are ready to become active
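
The synchronization check above is a strict gate: for a planned failover, "Synchronizing" is not sufficient, because log may still be in flight to the Secondary. A minimal Python sketch of that gate, assuming the database states have already been read from DR Control into a dictionary (the function names are hypothetical):

```python
from typing import List, Mapping


def failover_blockers(db_states: Mapping[str, str]) -> List[str]:
    """Return the databases that are not yet "Synchronized", sorted by name."""
    return sorted(db for db, state in db_states.items() if state != "Synchronized")


def ready_for_planned_failover(db_states: Mapping[str, str]) -> bool:
    """True only if at least one database is reported and all are Synchronized."""
    # An empty result usually means the status read failed, so treat it as not ready.
    return bool(db_states) and not failover_blockers(db_states)


states = {"DB_A": "Synchronized", "DB_B": "Synchronizing"}
print(failover_blockers(states))           # ['DB_B']
print(ready_for_planned_failover(states))  # False
```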

Execute Planned Failover

  • [ ] Log in to the Primary (active) Platform Manager

  • [ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator

  • [ ] Type 'D' to display DR Control Options

  • [ ] Type 'F' to initiate the failover process

  • [ ] Confirm with 'Y' when prompted

  • [ ] Wait for automatic tasks to complete:

    • [ ] Platform Manager services stop on the Primary site

    • [ ] Database synchronization is verified

    • [ ] The Secondary Platform Manager is designated as the Active site

Post-Failover Steps

  • [ ] Verify that the DNS record has been updated (automatically or manually) to point to the Secondary Platform Manager

  • [ ] Wait for the DNS record's time-to-live (TTL) to expire so that cached entries are refreshed

  • [ ] Confirm that Platform Manager services have started on the Secondary site:

    • [ ] Alarming and Response Manager (ARM) service

    • [ ] Job Manager service

  • [ ] If necessary, start the Data Processor, Data Indexer, and AI Engine services

  • [ ] Verify that remote systems have reconnected to the Secondary Platform Manager

  • [ ] Test system functionality on the Secondary site

  • [ ] Document failover completion
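
The DNS and TTL steps in this section amount to polling until the shared name resolves to the new address. This Python sketch assumes a single expected IPv4 address and uses the standard library's socket.gethostbyname; the function name, the one-TTL-plus-one-poll deadline, and the injectable resolve, sleep, and clock parameters (which make the loop testable) are illustrative choices. It resolves through the local machine's resolver, so a stale local cache can delay success.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    name: str,
    expected_ip: str,
    ttl_seconds: float,
    resolve: Callable[[str], str] = socket.gethostbyname,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `name` resolves to `expected_ip`; give up after one TTL plus one poll."""
    deadline = clock() + ttl_seconds + poll_seconds
    while True:
        try:
            if resolve(name) == expected_ip:
                return True
        except OSError:
            pass  # socket.gaierror is an OSError; treat a failed lookup as "not yet"
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Example with a hypothetical shared name and Secondary address:
# wait_for_dns("lr-pm.example.local", "10.0.1.20", ttl_seconds=300)
```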

Unplanned Failover Procedure (DR Only)

Execute Unplanned Failover

  • [ ] Log in to the Secondary (standby) Platform Manager

  • [ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator

  • [ ] Acknowledge the potential data loss warning by typing 'Y'

  • [ ] Wait for automatic tasks to complete:

    • [ ] The Secondary Platform Manager switches to the Active state

    • [ ] Platform Manager services start on the Secondary site

    • [ ] Replicated databases load

  • [ ] When the failover is complete, press Enter to exit

Post-Failover Steps

  • [ ] Update the DNS record to point to the Secondary Platform Manager

  • [ ] Wait for the DNS record's TTL to expire

  • [ ] Reconnect remote systems to the Secondary Platform Manager

  • [ ] If necessary, redirect Agents to a new Data Processor

  • [ ] Test system functionality on the Secondary site

  • [ ] Document failover completion and any data loss
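
To make "document any data loss" concrete, the sketch below builds a JSON log entry for a failover. It assumes you know when the outage was detected and the last commit time that reached the Secondary (SQL Server exposes a last_commit_time column in sys.dm_hadr_database_replica_states; whether DR Control displays it is not stated in this checklist). The field names are hypothetical. Because the Primary failed at or before the moment of detection, the gap between those two timestamps is an upper bound on the window of potentially lost commits, not a measured loss.

```python
import json
from datetime import datetime, timedelta, timezone


def failover_record(
    kind: str,
    outage_detected: datetime,
    completed: datetime,
    last_replicated_commit: datetime,
    notes: str = "",
) -> str:
    """Return a JSON log entry describing one failover (field names are illustrative)."""
    # The Primary failed at or before detection, so this bounds the loss window.
    loss_window = max(0.0, (outage_detected - last_replicated_commit).total_seconds())
    return json.dumps(
        {
            "kind": kind,
            "outage_detected": outage_detected.isoformat(),
            "completed": completed.isoformat(),
            # Time from outage detection to a completed failover.
            "duration_seconds": (completed - outage_detected).total_seconds(),
            "max_data_loss_window_seconds": loss_window,
            "notes": notes,
        },
        sort_keys=True,
    )


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(failover_record(
        "unplanned (DR only)",
        outage_detected=detected,
        completed=detected + timedelta(minutes=12),
        last_replicated_commit=detected - timedelta(seconds=45),
    ))
```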

Unplanned Failover Procedure (HA + DR)

Execute Unplanned Failover

  • [ ] Log in to the Secondary (standby) Platform Manager

  • [ ] Run DR Control as administrator

  • [ ] Type 'D' to display DR Control Options

  • [ ] Type 'F' to initiate failover

  • [ ] Acknowledge the potential data loss warning by typing 'Y'

  • [ ] Wait for automatic tasks to complete

  • [ ] When the failover is complete, exit DR Control

Post-Failover Steps

  • [ ] Update the DNS record to point to the Secondary Platform Manager

  • [ ] Wait for the DNS record's TTL to expire

  • [ ] Reconnect remote systems to the Secondary Platform Manager

  • [ ] Redirect Agents if their Data Processor is unavailable

  • [ ] Test system functionality on the Secondary site

  • [ ] Document failover completion and any data loss

Failback Procedure (Resume Operations on Primary)

Pre-Failback Verification

  • [ ] Verify that the Primary Platform Manager is operational

  • [ ] Run DR Control on the Primary Platform Manager as administrator

  • [ ] Verify that the State column displays "Suspended" (ready for data replication to resume)

Execute Failback

  • [ ] Type 'D' to display DR Control Options

  • [ ] Type 'R' to resume data replication

  • [ ] Wait until all databases show the "Synchronized" state

  • [ ] Open DR Control on the Primary Platform Manager

  • [ ] Type 'D' to display DR Control Options

  • [ ] Type 'F' to fail over to the Primary site

  • [ ] Confirm with 'Y' when prompted

  • [ ] Wait for automatic tasks to complete
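
The failback steps above check one state ("Suspended") before resuming replication and then wait for another ("Synchronized") before failing over. A minimal Python sketch of both checks, assuming a hypothetical fetch_states callable that returns the current per-database states (for example, transcribed from DR Control); the timeout and poll interval are placeholders to adjust for your environment:

```python
import time
from typing import Callable, Mapping


def all_in_state(states: Mapping[str, str], wanted: str) -> bool:
    """True when at least one database is reported and every one is in `wanted`."""
    return bool(states) and all(state == wanted for state in states.values())


def wait_until_synchronized(
    fetch_states: Callable[[], Mapping[str, str]],
    timeout_s: float,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every database is Synchronized; False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while not all_in_state(fetch_states(), "Synchronized"):
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True


# Pre-failback check: every database should report "Suspended" before resuming.
print(all_in_state({"DB_A": "Suspended", "DB_B": "Suspended"}, "Suspended"))  # True
```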

Post-Failback Steps

  • [ ] Update the DNS record to point to the Primary Platform Manager

  • [ ] Wait for the DNS record's TTL to expire

  • [ ] Verify that all systems reconnect to the Primary Platform Manager

  • [ ] Test system functionality on the Primary site

  • [ ] Document failback completion

IP Address Changes (Re-IP Procedure)

Preparation

  • [ ] Document current IP configuration:

    • [ ] Management IPs

    • [ ] Failover IPs

    • [ ] Replication IPs

    • [ ] Cluster (DNS) name

    • [ ] Replication ports

  • [ ] Plan the new IP configuration

  • [ ] Schedule a maintenance window for the changes
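
Recording the current and planned configuration in the same structure makes it easy to see exactly what the Re-IP will change. The Python sketch below is illustrative: the DrNetworkConfig fields follow the items listed in this section, and the example addresses, DNS name, and port are made up.

```python
from dataclasses import asdict, dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class DrNetworkConfig:
    management_ips: Tuple[str, ...]
    failover_ips: Tuple[str, ...]
    replication_ips: Tuple[str, ...]
    cluster_dns_name: str
    replication_ports: Tuple[int, ...]


def config_changes(current: DrNetworkConfig, planned: DrNetworkConfig) -> Dict[str, tuple]:
    """Map each changed field to its (current, planned) values."""
    old, new = asdict(current), asdict(planned)
    return {field: (old[field], new[field]) for field in old if old[field] != new[field]}


# Made-up example values.
current = DrNetworkConfig(
    management_ips=("10.0.0.10", "10.0.1.10"),
    failover_ips=("10.0.0.20",),
    replication_ips=("10.0.2.10", "10.0.3.10"),
    cluster_dns_name="lr-pm.example.local",
    replication_ports=(5022,),
)
planned = replace(current, failover_ips=("10.0.0.30",))
print(config_changes(current, planned))
```

An empty result means the plan matches what is documented, which is a useful sanity check before scheduling the window.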

Execute Re-IP

  • [ ] From the DR Install folder, run DR Re-IP Uninstall.exe as Administrator

  • [ ] Click the "Re-IP" tab

  • [ ] Enter new IP addresses for the deployment

  • [ ] Validate the IP addresses and DNS name

  • [ ] Click "Re-IP" to run the script

  • [ ] Review the script output to verify that the script succeeded
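
The IP and DNS name validation can also be approximated offline before opening the tool. This Python sketch uses the standard ipaddress module and a basic RFC 1123-style hostname check. It is an assumption, not the tool's actual validation logic; in particular, it assumes every address in the plan must be distinct and that loopback, unspecified, and multicast addresses are never valid choices.

```python
import ipaddress
import re
from typing import Iterable, List

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with "-".
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def validate_reip(ips: Iterable[str], dns_name: str, ports: Iterable[int]) -> List[str]:
    """Return a list of problems; an empty list means the plan passed these checks."""
    problems: List[str] = []
    seen = set()
    for ip in ips:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"not an IP address: {ip!r}")
            continue
        if addr in seen:
            problems.append(f"duplicate IP address: {ip}")  # assumes all must differ
        seen.add(addr)
        if addr.is_loopback or addr.is_unspecified or addr.is_multicast:
            problems.append(f"unusable IP address: {ip}")
    labels = dns_name.rstrip(".").split(".")
    if len(dns_name) > 253 or not all(_LABEL.fullmatch(label) for label in labels):
        problems.append(f"invalid DNS name: {dns_name!r}")
    for port in ports:
        if not 1 <= port <= 65535:
            problems.append(f"port out of range: {port}")
    return problems


print(validate_reip(["10.0.0.10", "10.0.0.10"], "lr-pm.example.local", [5022]))
```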

Post Re-IP Verification

  • [ ] Check replication status using DR Control

  • [ ] Verify that the DNS name resolves to the new IP addresses

  • [ ] Test failover functionality with the new configuration

  • [ ] Document IP changes

DR Uninstallation Procedure

Preparation

  • [ ] Document current configuration

  • [ ] Back up any critical data

  • [ ] Schedule a maintenance window

  • [ ] Notify all relevant stakeholders

Execute Uninstallation

  • [ ] From the DR Install folder on the Primary server, run DR Re-IP Uninstall.exe as Administrator

  • [ ] Click the "Uninstall" tab

  • [ ] Review the description of the uninstall process

  • [ ] Click "Uninstall" and follow the confirmation prompts

  • [ ] Enter sysadmin-level SQL credentials when prompted

  • [ ] Review the script output and address any errors

  • [ ] Repeat these steps on the Secondary server

  • [ ] On the Secondary server, identify the deployment type correctly when prompted

Post-Uninstallation Verification

  • [ ] Verify that no databases are in the Synchronizing, Not Synchronizing, Restoring, or Suspect state

  • [ ] Confirm that the LogRhythm folder has been removed from Windows Task Scheduler

  • [ ] Verify that the CONSUL_CLIENT environment variable does not exist on the XM/PM

  • [ ] Confirm that all LogRhythm PM services are running

  • [ ] Verify that the SQL Server Agent job "LogRhythm DR Job Management" has been removed and that all remaining SQL Server Agent jobs are enabled

  • [ ] Update components to use the management IP address instead of the shared DNS name or failover IPs

  • [ ] Re-run the LogRhythm Infrastructure Installer (LRII) and remove the host record for the Secondary server

  • [ ] In Deployment Properties, change "Does your deployment include Disaster Recovery (DR)?" to No

  • [ ] Run "Get-Cluster" from an elevated PowerShell prompt to verify that the Cluster service is not running
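
Two of the post-uninstallation checks can be scripted once the inputs are collected. The Python sketch below flags databases left in one of the listed states and checks whether CONSUL_CLIENT is visible in an environment mapping. Gathering the database states (for example, from SQL Server Management Studio) is assumed to happen separately, and the function names are hypothetical. A process only sees machine-level environment changes made before it started, so run the check from a new session.

```python
import os
from typing import Dict, Mapping

# States that should not remain once DR has been removed.
LEFTOVER_STATES = {"SYNCHRONIZING", "NOT SYNCHRONIZING", "RESTORING", "SUSPECT"}


def leftover_databases(db_states: Mapping[str, str]) -> Dict[str, str]:
    """Return {database: state} for every database still in a leftover state."""
    return {db: s for db, s in db_states.items() if s.strip().upper() in LEFTOVER_STATES}


def consul_client_absent(env: Mapping[str, str] = os.environ) -> bool:
    """True if CONSUL_CLIENT is not defined in the given environment mapping."""
    return "CONSUL_CLIENT" not in env


print(leftover_databases({"DB_A": "Online", "DB_B": "Restoring"}))  # {'DB_B': 'Restoring'}
print(consul_client_absent())
```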