Disaster Recovery Administration Checklist
Use this checklist to track your progress while administering a LogRhythm Disaster Recovery deployment.
Regular Monitoring Tasks
Replication Status Monitoring
[ ] Check replication status using LogRhythm DR Control:
  [ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
  [ ] Verify databases show "Synchronized" or "Synchronizing" status
  [ ] Review metrics (SendQueue, SendRate, RedoQueue, RedoRate, EstimatedRecoveryTime, SyncPerformance)
  [ ] Exit panel with 'Q'
[ ] Alternatively, use AlwaysOn Availability Group Dashboard:
  [ ] Start SQL Server Management Studio and log in as administrator
  [ ] Expand AlwaysOn High Availability folder and Availability Groups folder
  [ ] Right-click Availability Group and select "Show Dashboard"
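The status and metric review above can be sketched as a small triage helper. This is a minimal illustration, not part of DR Control: the `ReplicaMetrics` type, the `needs_attention` function, the 300-second threshold, and the database names are all hypothetical. It assumes the redo queue is reported in KB and the redo rate in KB/s (the units SQL Server uses in `sys.dm_hadr_database_replica_states`); check the units your DR Control version displays before relying on the arithmetic.

```python
from dataclasses import dataclass

# States the checklist treats as healthy during routine monitoring.
HEALTHY_STATES = {"Synchronized", "Synchronizing"}


@dataclass
class ReplicaMetrics:
    """One database row, transcribed from DR Control or the dashboard."""
    database: str
    state: str
    redo_queue_kb: float    # log received but not yet redone (assumed KB)
    redo_rate_kb_s: float   # redo throughput (assumed KB/s)


def estimated_recovery_seconds(m: ReplicaMetrics) -> float:
    """Approximate catch-up time as redo queue divided by redo rate."""
    if m.redo_queue_kb == 0:
        return 0.0
    if m.redo_rate_kb_s <= 0:
        return float("inf")  # a backlog exists but nothing is being redone
    return m.redo_queue_kb / m.redo_rate_kb_s


def needs_attention(rows, max_recovery_s=300.0):
    """Return names of databases that are unhealthy or too far behind."""
    return [
        m.database
        for m in rows
        if m.state not in HEALTHY_STATES
        or estimated_recovery_seconds(m) > max_recovery_s
    ]


# Placeholder database names and values, for illustration only.
rows = [
    ReplicaMetrics("ExampleDb1", "Synchronized", 0, 0),
    ReplicaMetrics("ExampleDb2", "Synchronizing", 120000, 200),  # 600 s behind
    ReplicaMetrics("ExampleDb3", "Not Synchronizing", 0, 0),
]
print(needs_attention(rows))  # ['ExampleDb2', 'ExampleDb3']
```

The threshold is a placeholder; a sensible value depends on your recovery objectives.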
Replication Mode Management
[ ] Review current replication mode (Asynchronous or Synchronous)
[ ] Determine if mode changes are needed based on:
  [ ] Current network performance
  [ ] Recovery Point Objective (RPO) requirements
  [ ] Performance requirements
  [ ] Distance between Primary and Secondary sites
Planned Failover Procedure
Pre-Failover Steps
[ ] Schedule maintenance window for failover
[ ] Notify all relevant stakeholders of planned failover
[ ] Verify all databases are synchronized between Primary and Secondary sites
[ ] Verify Secondary site components are ready to become active
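The synchronization check in the pre-failover steps is stricter than routine monitoring: "Synchronizing" is acceptable day to day, but a planned failover should wait until every database reads "Synchronized". A minimal sketch of that gate follows; the function name and the input mapping (database name to the state shown in DR Control) are hypothetical.

```python
def planned_failover_blockers(db_states):
    """Return {database: state} for every database not fully synchronized."""
    return {db: state for db, state in db_states.items()
            if state != "Synchronized"}


# Placeholder names; states are copied by hand from DR Control.
states = {"ExampleDb1": "Synchronized", "ExampleDb2": "Synchronizing"}
blockers = planned_failover_blockers(states)
if blockers:
    print("Do not fail over yet:", blockers)
else:
    print("All databases synchronized")
```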
Execute Planned Failover
[ ] Access Primary (active) Platform Manager
[ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
[ ] Type 'D' to display DR Control Options
[ ] Type 'F' to initiate failover process
[ ] Confirm with 'Y' when prompted
[ ] Wait for automatic tasks to complete:
  [ ] Platform Manager services stopping on Primary site
  [ ] Database synchronization verification
  [ ] Secondary Platform Manager designation as Active site
Post-Failover Steps
[ ] Verify DNS record updates (automatic or manual) to point to Secondary Platform Manager
[ ] Wait for TTL limit to be reached
[ ] Confirm Platform Manager services have started on Secondary site:
  [ ] Alarming and Response Manager (ARM) service
  [ ] Job Manager service
[ ] Start services for Data Processors, Data Indexers, and AI Engines if necessary
[ ] Verify remote systems reconnection to Secondary Platform Manager
[ ] Test system functionality on Secondary site
[ ] Document failover completion
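The DNS and TTL steps above amount to waiting until the shared name resolves to the new active Platform Manager. One way to sketch that wait is a polling loop; the function name, default timings, and the host name and address in the usage comment are hypothetical. The check reflects only the resolver of the machine running it, so remote systems with their own caches may lag behind.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, timeout_s=600, interval_s=15,
                 resolve=socket.gethostbyname, sleep=time.sleep,
                 clock=time.monotonic):
    """Poll until hostname resolves to expected_ip; return True on success."""
    deadline = clock() + timeout_s
    while True:
        try:
            if resolve(hostname) == expected_ip:
                return True
        except OSError:
            pass  # name not resolvable right now; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example call (placeholder name and address):
# wait_for_dns("lr-pm.example.com", "192.0.2.20", timeout_s=900)
```

The `resolve`, `sleep`, and `clock` parameters exist only so the loop can be exercised without real DNS or real delays.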
Unplanned Failover Procedure (DR Only)
Execute Unplanned Failover
[ ] Go to Secondary (standby) Platform Manager
[ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
[ ] Acknowledge potential data loss warning by typing 'Y'
[ ] Wait for automatic tasks to complete:
  [ ] Secondary Platform Manager switching to Active state
  [ ] Platform Manager services starting on Secondary site
  [ ] Replicated databases loading
[ ] Press Enter to exit when failover is complete
Post-Failover Steps
[ ] Update DNS record to point to Secondary Platform Manager
[ ] Wait for TTL limit to be reached
[ ] Reconnect remote systems to Secondary Platform Manager
[ ] Redirect Agents to new Data Processor if necessary
[ ] Test system functionality on Secondary site
[ ] Document failover completion and any data loss
Unplanned Failover Procedure (HA + DR)
Execute Unplanned Failover
[ ] Go to Secondary (standby) Platform Manager
[ ] Run DR Control as administrator
[ ] Type 'D' to display DR Control Options
[ ] Type 'F' to initiate failover
[ ] Acknowledge potential data loss warning by typing 'Y'
[ ] Wait for automatic tasks to complete
[ ] Exit when failover is complete
Post-Failover Steps
[ ] Update DNS record to point to Secondary Platform Manager
[ ] Wait for TTL limit to be reached
[ ] Reconnect remote systems to Secondary Platform Manager
[ ] Redirect Agents if Data Processor is unavailable
[ ] Test system functionality on Secondary site
[ ] Document failover completion and any data loss
Failback Procedure (Resume Operations on Primary)
Pre-Failback Verification
[ ] Verify Primary Platform Manager is operational
[ ] Run DR Control on Primary Platform Manager as administrator
[ ] Verify State column displays "Suspended" (ready for data replication)
Execute Failback
[ ] Type 'D' to display DR Control Options
[ ] Type 'R' to resume data replication
[ ] Wait for all databases to show "Synchronized" state
[ ] Open DR Control on Primary Platform Manager
[ ] Type 'D' to display DR Control Options
[ ] Type 'F' to fail over to Primary site
[ ] Confirm with 'Y' when prompted
[ ] Wait for automatic tasks to complete
Post-Failback Steps
[ ] Update DNS record to point to Primary Platform Manager
[ ] Wait for TTL limit to be reached
[ ] Verify all systems reconnect to Primary Platform Manager
[ ] Test system functionality on Primary site
[ ] Document failback completion
IP Address Changes (Re-IP Procedure)
Preparation
[ ] Document current IP configuration:
  [ ] Management IPs
  [ ] Failover IPs
  [ ] Replication IPs
  [ ] Cluster (DNS) name
  [ ] Replication ports
[ ] Plan new IP configuration
[ ] Schedule maintenance window for changes
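The items documented in the preparation steps can be kept as a structured record, which makes it easy to archive the old configuration and sanity-check the planned one before running the Re-IP tool. The sketch below is only an illustration: the `DrIpConfig` type and its field names are hypothetical, and the addresses, DNS name, and port are placeholders rather than LogRhythm defaults.

```python
import ipaddress
import json
from dataclasses import asdict, dataclass


@dataclass
class DrIpConfig:
    """The values the checklist asks you to document before a Re-IP."""
    cluster_dns_name: str
    management_ips: list
    failover_ips: list
    replication_ips: list
    replication_ports: list

    def problems(self):
        """Return a list of human-readable issues; empty means none found."""
        issues = []
        for ip in self.management_ips + self.failover_ips + self.replication_ips:
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                issues.append(f"invalid IP address: {ip}")
        for port in self.replication_ports:
            if not 1 <= port <= 65535:
                issues.append(f"invalid port: {port}")
        if not self.cluster_dns_name:
            issues.append("cluster DNS name is empty")
        return issues

    def to_json(self):
        """Serialize the record so it can be stored with change documentation."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values from documentation address ranges.
planned = DrIpConfig(
    cluster_dns_name="lr-pm.example.com",
    management_ips=["192.0.2.10", "192.0.2.11"],
    failover_ips=["192.0.2.20"],
    replication_ips=["198.51.100.10", "198.51.100.11"],
    replication_ports=[5022],
)
print(planned.problems())  # []
```

This checks syntax only. It does not confirm that the addresses are free, routable, or accepted by the Re-IP tool's own validation.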
Execute Re-IP
[ ] From DR Install folder, run DR Re-IP Uninstall.exe as Administrator
[ ] Click the "Re-IP" tab
[ ] Enter new IP addresses for the deployment
[ ] Validate IPs and DNS name
[ ] Click "Re-IP" to run the script
[ ] Review script output to verify success
Post Re-IP Verification
[ ] Test replication status using DR Control
[ ] Verify DNS resolution with new IP addresses
[ ] Test failover functionality with new configuration
[ ] Document IP changes
DR Uninstallation Procedure
Preparation
[ ] Document current configuration
[ ] Back up any critical data
[ ] Schedule maintenance window
[ ] Notify all relevant stakeholders
Execute Uninstallation
[ ] From DR Install folder on primary server, run DR Re-IP Uninstall.exe as Administrator
[ ] Click the "Uninstall" tab
[ ] Review description of uninstall process
[ ] Click "Uninstall" and follow confirmation prompts
[ ] Enter sysadmin-level SQL credentials when prompted
[ ] Review script output and address any errors
[ ] Repeat steps on secondary server
[ ] For secondary server, correctly identify deployment type when prompted
Post-Uninstallation Verification
[ ] Verify no databases are in Synchronizing, Not Synchronizing, Restoring, or Suspect state
[ ] Confirm LogRhythm folder in Windows Task Scheduler has been removed
[ ] Verify the CONSUL_CLIENT environment variable does not exist on the XM or PM
[ ] Confirm all LogRhythm PM services are running
[ ] Verify SQL job "LogRhythm DR Job Management" has been removed and all remaining SQL Server Agent jobs are enabled
[ ] Update components to use management IP instead of shared DNS or failover IPs
[ ] Re-run the LogRhythm Infrastructure Installer (LRII) and remove the host record for the Secondary server
[ ] In Deployment Properties, change "Does your deployment include Disaster Recovery (DR)?" to No
[ ] Run Get-Cluster from an elevated PowerShell prompt to verify the cluster service is not running
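Two of the verification items above, the leftover database states and the CONSUL_CLIENT variable, lend themselves to a scripted check. The following is a rough sketch with a hypothetical function name; it assumes you supply the database states yourself (for example, copied from SQL Server Management Studio) and that the script runs in a session started after the uninstall, so the environment reflects the current system variables.

```python
import os

# States the checklist says must not remain after uninstalling DR.
LEFTOVER_STATES = {"Synchronizing", "Not Synchronizing", "Restoring", "Suspect"}


def uninstall_leftovers(db_states, environ=os.environ):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []
    for db, state in sorted(db_states.items()):
        if state in LEFTOVER_STATES:
            findings.append(f"database {db} is still in state {state}")
    if "CONSUL_CLIENT" in environ:
        findings.append("CONSUL_CLIENT environment variable is still set")
    return findings


# Placeholder database names and states.
for finding in uninstall_leftovers({"ExampleDb1": "Online",
                                    "ExampleDb2": "Restoring"}):
    print(finding)
```

The remaining items (Task Scheduler folder, services, SQL Server Agent jobs, cluster service) still need to be confirmed by hand.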