Use this checklist to record your progress while administering a LogRhythm Disaster Recovery deployment.
Regular Monitoring Tasks
Replication Status Monitoring
-
[ ] Check replication status using LogRhythm DR Control:
-
[ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
-
[ ] Verify databases show "Synchronized" or "Synchronizing" status
-
[ ] Review metrics (SendQueue, SendRate, RedoQueue, RedoRate, EstimatedRecoveryTime, SyncPerformance)
-
[ ] Type 'Q' to exit the panel
-
[ ] Alternatively, use AlwaysOn Availability Group Dashboard:
-
[ ] Start SQL Server Management Studio and log in as administrator
-
[ ] Expand AlwaysOn High Availability folder and Availability Groups folder
-
[ ] Right-click Availability Group and select "Show Dashboard"
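The queue and rate metrics listed above can also be read directly from SQL Server when the replicated databases belong to an AlwaysOn Availability Group. The following is a minimal sketch, not part of DR Control: it assumes rows have already been fetched from the `sys.dm_hadr_database_replica_states` view with a SQL client of your choice. The helper name `summarize_replica`, the dictionary row layout, and the database name in the usage example are hypothetical, and the "redo queue divided by redo rate" estimate is an assumption that may not match how DR Control derives EstimatedRecoveryTime.

```python
# Hypothetical helper; not part of LogRhythm DR Control.
# Query to run against the SQL Server instance (sizes in KB, rates in KB/s).
REPLICA_STATE_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       log_send_queue_size, log_send_rate,
       redo_queue_size, redo_rate
FROM sys.dm_hadr_database_replica_states;
"""

# States the checklist treats as acceptable during normal monitoring.
HEALTHY_STATES = {"SYNCHRONIZED", "SYNCHRONIZING"}


def summarize_replica(row):
    """Summarize one result row (a dict keyed by the column names above)."""
    state = row["synchronization_state_desc"]
    redo_queue = row.get("redo_queue_size") or 0
    redo_rate = row.get("redo_rate") or 0

    if redo_queue == 0:
        estimated = 0.0                      # nothing left to redo
    elif redo_rate > 0:
        estimated = redo_queue / redo_rate   # assumed estimate, in seconds
    else:
        estimated = None                     # queue exists but no rate reported

    return {
        "database": row["database_name"],
        "healthy": state in HEALTHY_STATES,
        "estimated_redo_seconds": estimated,
    }


if __name__ == "__main__":
    example = {
        "database_name": "ExampleDB",        # placeholder name
        "synchronization_state_desc": "SYNCHRONIZING",
        "redo_queue_size": 2048,
        "redo_rate": 1024,
    }
    print(summarize_replica(example))
```

A database that reports a non-empty redo queue with no redo rate returns `None` for the estimate, which is worth investigating rather than treating as zero.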
Replication Mode Management
-
[ ] Review current replication mode (Asynchronous or Synchronous)
-
[ ] Determine if mode changes are needed based on:
-
[ ] Current network performance
-
[ ] Recovery Point Objective (RPO) requirements
-
[ ] Performance requirements
-
[ ] Distance between Primary and Secondary sites
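The factors above can be turned into a rough decision aid. The sketch below is only an illustration of the trade-off: synchronous replication waits for the Secondary site before a transaction commits, so it protects against data loss at the cost of added latency, while asynchronous replication does not wait and can lose recent transactions. The function name, its parameters, and the idea of a single latency threshold are all assumptions; the threshold must come from your own performance testing, not from this checklist.

```python
# Hypothetical decision aid; the threshold is supplied by the caller.
def suggest_replication_mode(requires_zero_data_loss, round_trip_ms, max_sync_round_trip_ms):
    """Return 'Synchronous', 'Asynchronous', or 'Review'.

    requires_zero_data_loss: True when the RPO allows no lost transactions.
    round_trip_ms: measured network round trip between Primary and Secondary.
    max_sync_round_trip_ms: highest round trip your workload tolerates
        while waiting on the Secondary (an assumption you must determine).
    """
    if not requires_zero_data_loss:
        # Some data loss is acceptable, so avoid the commit latency.
        return "Asynchronous"
    if round_trip_ms <= max_sync_round_trip_ms:
        return "Synchronous"
    # Zero data loss is required but the link looks too slow: the RPO and
    # performance requirements conflict and need a human decision.
    return "Review"


if __name__ == "__main__":
    print(suggest_replication_mode(True, round_trip_ms=4, max_sync_round_trip_ms=10))
```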
Planned Failover Procedure
Pre-Failover Steps
-
[ ] Schedule maintenance window for failover
-
[ ] Notify all relevant stakeholders of planned failover
-
[ ] Verify all databases are synchronized between Primary and Secondary sites
-
[ ] Verify Secondary site components are ready to become active
Execute Planned Failover
-
[ ] Access Primary (active) Platform Manager
-
[ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
-
[ ] Type 'D' to display DR Control Options
-
[ ] Type 'F' to initiate failover process
-
[ ] Confirm with 'Y' when prompted
-
[ ] Wait for automatic tasks to complete:
-
[ ] Platform Manager services stopping on Primary site
-
[ ] Database synchronization verification
-
[ ] Secondary Platform Manager designation as Active site
Post-Failover Steps
-
[ ] Verify DNS record updates (automatic or manual) to point to Secondary Platform Manager
-
[ ] Wait for TTL limit to be reached
-
[ ] Confirm Platform Manager services have started on Secondary site:
-
[ ] Alarming and Response Manager (ARM) service
-
[ ] Job Manager service
-
[ ] Start services for Data Processors, Data Indexers, and AI Engines if necessary
-
[ ] Verify remote systems reconnection to Secondary Platform Manager
-
[ ] Test system functionality on Secondary site
-
[ ] Document failover completion
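The DNS and TTL steps above can be checked from any host that needs to reach the Platform Manager. The following sketch polls name resolution until the shared name resolves to the expected address or a timeout passes. The function name, the hostname, and the addresses are placeholders, and the default timings are arbitrary assumptions; choose a timeout that is at least as long as the record's TTL. Local resolver caches may keep returning the old address until the TTL expires.

```python
import socket
import time


def wait_for_dns(hostname, expected_ip, timeout_s=600, interval_s=15,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Return True once hostname resolves to expected_ip, False on timeout.

    resolve and sleep are injectable so the loop can be exercised
    without real DNS lookups or real waiting.
    """
    waited = 0
    while True:
        try:
            current = resolve(hostname)
        except OSError:
            current = None          # lookup failed; treat as "not yet"
        if current == expected_ip:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Placeholder name and address; substitute your shared DNS name and
    # the Secondary Platform Manager address.
    ok = wait_for_dns("pm.example.test", "192.0.2.20", timeout_s=30, interval_s=10)
    print("DNS points to expected address" if ok else "DNS not updated yet")
```

The same check applies after failback by passing the Primary Platform Manager address as `expected_ip`.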
Unplanned Failover Procedure (DR Only)
Execute Unplanned Failover
-
[ ] Go to Secondary (standby) Platform Manager
-
[ ] Run DR Control (Start > All Programs > LogRhythm > Disaster Recovery > DR Control) as administrator
-
[ ] Acknowledge potential data loss warning by typing 'Y'
-
[ ] Wait for automatic tasks to complete:
-
[ ] Secondary Platform Manager switching to Active state
-
[ ] Platform Manager services starting on Secondary site
-
[ ] Replicated databases loading
-
[ ] Press Enter to exit when failover is complete
Post-Failover Steps
-
[ ] Update DNS record to point to Secondary Platform Manager
-
[ ] Wait for TTL limit to be reached
-
[ ] Reconnect remote systems to Secondary Platform Manager
-
[ ] Redirect Agents to new Data Processor if necessary
-
[ ] Test system functionality on Secondary site
-
[ ] Document failover completion and any data loss
Unplanned Failover Procedure (HA + DR)
Execute Unplanned Failover
-
[ ] Go to Secondary (standby) Platform Manager
-
[ ] Run DR Control as administrator
-
[ ] Type 'D' to display DR Control Options
-
[ ] Type 'F' to initiate failover
-
[ ] Acknowledge potential data loss warning by typing 'Y'
-
[ ] Wait for automatic tasks to complete
-
[ ] Exit when failover is complete
Post-Failover Steps
-
[ ] Update DNS record to point to Secondary Platform Manager
-
[ ] Wait for TTL limit to be reached
-
[ ] Reconnect remote systems to Secondary Platform Manager
-
[ ] Redirect Agents if Data Processor is unavailable
-
[ ] Test system functionality on Secondary site
-
[ ] Document failover completion and any data loss
Failback Procedure (Resume Operations on Primary)
Pre-Failback Verification
-
[ ] Verify Primary Platform Manager is operational
-
[ ] Run DR Control on Primary Platform Manager as administrator
-
[ ] Verify State column displays "Suspended" (ready for data replication)
Execute Failback
-
[ ] Type 'D' to display DR Control Options
-
[ ] Type 'R' to resume data replication
-
[ ] Wait for all databases to show "Synchronized" state
-
[ ] Open DR Control on Primary Platform Manager
-
[ ] Type 'D' to display DR Control Options
-
[ ] Type 'F' to fail over to Primary site
-
[ ] Confirm with 'Y' when prompted
-
[ ] Wait for automatic tasks to complete
Post-Failback Steps
-
[ ] Update DNS record to point to Primary Platform Manager
-
[ ] Wait for TTL limit to be reached
-
[ ] Verify all systems reconnect to Primary Platform Manager
-
[ ] Test system functionality on Primary site
-
[ ] Document failback completion
IP Address Changes (Re-IP Procedure)
Preparation
-
[ ] Document current IP configuration:
-
[ ] Management IPs
-
[ ] Failover IPs
-
[ ] Replication IPs
-
[ ] Cluster (DNS) name
-
[ ] Replication ports
-
[ ] Plan new IP configuration
-
[ ] Schedule maintenance window for changes
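Before the maintenance window, the planned addresses can be sanity-checked offline. The sketch below is a hypothetical pre-check, separate from the validation performed by the Re-IP tool itself: it confirms that each planned value parses as an IP address and flags addresses assigned to more than one role. The role names and addresses are placeholders, and the rule that every role needs a distinct address is an assumption that may not hold for every deployment.

```python
import ipaddress


def validate_ip_plan(plan):
    """Check a mapping of role name -> IP string; return a list of problems."""
    problems = []
    seen = {}  # parsed address -> first role that claimed it
    for role, value in plan.items():
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            problems.append(f"{role}: '{value}' is not a valid IP address")
            continue
        if addr in seen:
            problems.append(f"{role}: {addr} is already assigned to {seen[addr]}")
        else:
            seen[addr] = role
    return problems


if __name__ == "__main__":
    # Placeholder plan using documentation-range addresses.
    plan = {
        "primary_management": "192.0.2.10",
        "secondary_management": "192.0.2.11",
        "primary_replication": "192.0.2.12",
        "secondary_replication": "192.0.2.12",   # duplicate, flagged below
    }
    for problem in validate_ip_plan(plan):
        print(problem)
```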
Execute Re-IP
-
[ ] From DR Install folder, run DR Re-IP Uninstall.exe as Administrator
-
[ ] Click the "Re-IP" tab
-
[ ] Enter new IP addresses for the deployment
-
[ ] Validate IPs and DNS name
-
[ ] Click "Re-IP" to run the script
-
[ ] Review script output to verify success
Post Re-IP Verification
-
[ ] Test replication status using DR Control
-
[ ] Verify DNS resolution with new IP addresses
-
[ ] Test failover functionality with new configuration
-
[ ] Document IP changes
DR Uninstallation Procedure
Preparation
-
[ ] Document current configuration
-
[ ] Back up any critical data
-
[ ] Schedule maintenance window
-
[ ] Notify all relevant stakeholders
Execute Uninstallation
-
[ ] From DR Install folder on primary server, run DR Re-IP Uninstall.exe as Administrator
-
[ ] Click the "Uninstall" tab
-
[ ] Review description of uninstall process
-
[ ] Click "Uninstall" and follow confirmation prompts
-
[ ] Enter sysadmin-level SQL credentials when prompted
-
[ ] Review script output and address any errors
-
[ ] Repeat steps on secondary server
-
[ ] For secondary server, correctly identify deployment type when prompted
Post-Uninstallation Verification
-
[ ] Verify no databases are in Synchronizing, Not Synchronizing, Restoring, or Suspect state
-
[ ] Confirm LogRhythm folder in Windows Task Scheduler has been removed
-
[ ] Verify CONSUL_CLIENT environment variable does not exist on XM/PM
-
[ ] Confirm all LogRhythm PM services are running
-
[ ] Verify SQL job "LogRhythm DR Job Management" is gone and all remaining SQL Server agent jobs are enabled
-
[ ] Update components to use management IP instead of shared DNS or failover IPs
-
[ ] Re-run LRII and remove host record for Secondary server
-
[ ] In Deployment Properties, change "Does your deployment include Disaster Recovery (DR)?" to No
-
[ ] Run "Get-Cluster" from elevated PowerShell to verify cluster service is not running
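Two of the verification items above lend themselves to a scripted check: leftover database states and the `CONSUL_CLIENT` environment variable. The following is a minimal sketch under stated assumptions. The function name and the `database_states` mapping (database name to state text, gathered however you prefer) are hypothetical, and the database names in the example are placeholders. `os.environ` reflects only the environment of the current process, so run the check in a session started after the uninstall.

```python
import os

# States the checklist says must not remain after uninstallation.
LEFTOVER_STATES = {"synchronizing", "not synchronizing", "restoring", "suspect"}


def post_uninstall_issues(database_states, environ=None):
    """Return a list of findings; an empty list means both checks passed.

    database_states: dict of database name -> state text.
    environ: mapping to inspect; defaults to this process's environment.
    """
    env = os.environ if environ is None else environ
    issues = []
    for name, state in sorted(database_states.items()):
        if state.strip().lower() in LEFTOVER_STATES:
            issues.append(f"database {name} is still in state '{state}'")
    if "CONSUL_CLIENT" in env:
        issues.append("CONSUL_CLIENT environment variable is still set")
    return issues


if __name__ == "__main__":
    # Placeholder database names and states.
    findings = post_uninstall_issues({"ExampleDB": "Online", "OtherDB": "Restoring"})
    for finding in findings:
        print(finding)
    if not findings:
        print("No leftover DR state found by these two checks")
```

The remaining items, such as the Task Scheduler folder, the SQL Server Agent jobs, and the cluster service, still need to be verified as described in the checklist.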