Configuring Disaster Recovery For Persistent Volume Claims
Use Alauda Build of VolSync to achieve cross-cluster disaster recovery for application PVCs.
Overview
Alauda Build of VolSync is an operator that performs asynchronous replication of persistent volumes within or across clusters. The replication provided by VolSync is independent of the storage system. This allows replication to and from storage types that don't normally support remote replication. Additionally, it can replicate across different types (and vendors) of storage.
Terminology
Prerequisites
- Download the Alauda Build of VolSync installation package corresponding to your platform architecture.
- Upload the Alauda Build of VolSync installation package using the Upload Packages mechanism to both Primary and Secondary clusters.
- Alauda Container Platform Snapshot Management has been deployed on both the Primary and Secondary clusters.
- The storage used by the PVC must be provisioned by a CSI driver that supports volume snapshots.
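The snapshot prerequisite can be checked before installing anything. A VolumeSnapshotClass applies to one CSI driver, so it can snapshot a StorageClass's PVCs when its driver field equals the StorageClass provisioner. The sketch below assumes both objects have already been fetched as JSON (for example with kubectl get storageclass <name> -o json and the items of kubectl get volumesnapshotclass -o json); the class and driver names in the usage are hypothetical.

```python
import json


def matching_snapshot_classes(storage_class: dict, snapshot_classes: list) -> list:
    """Return the names of VolumeSnapshotClasses usable for PVCs of this StorageClass.

    A VolumeSnapshotClass matches when its "driver" equals the StorageClass
    "provisioner" (both name the same CSI driver).
    """
    provisioner = storage_class.get("provisioner", "")
    return [
        vsc["metadata"]["name"]
        for vsc in snapshot_classes
        if vsc.get("driver") == provisioner
    ]


if __name__ == "__main__":
    # Hypothetical objects; in practice, parse the kubectl JSON output.
    sc = {"metadata": {"name": "example-sc"}, "provisioner": "csi.example.com"}
    vscs = [
        {"metadata": {"name": "example-snapclass"}, "driver": "csi.example.com"},
        {"metadata": {"name": "other-snapclass"}, "driver": "csi.other.io"},
    ]
    # An empty list means the snapshot prerequisite is not met for this StorageClass.
    print(json.dumps(matching_snapshot_classes(sc, vscs)))  # prints ["example-snapclass"]
```

Run the check on both clusters, since each cluster may use a different StorageClass.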
Deploy Alauda Build of VolSync
- Log in and go to the Administrator page.
- Click Marketplace > OperatorHub to open the OperatorHub page.
- Find Alauda Build of VolSync and click Install to open the Install Alauda Build of VolSync page.
Configuration Parameters:
Configuring a Scheduled Synchronization
After configuring Scheduled Synchronization for a PVC, VolSync will automatically synchronize the data from the ReplicationSource to the ReplicationDestination at the specified interval.
This section outlines the configuration steps for synchronizing data from the Primary cluster to the Secondary cluster. For synchronization from the Secondary cluster to the Primary cluster, adapt the steps below by swapping the cluster roles.
Create an rsync-tls Data Mover Secret
Create the Secret on both Primary and Secondary clusters; skip this step if the Secret already exists.
Parameters:
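As a sketch, assuming the upstream VolSync rsync-tls convention of a Secret key named psk.txt holding "<identity>:<hex key>", the Secret can be generated as follows. The identity "volsync" and the Secret and namespace names are hypothetical. Because the source and destination must share the same key, generate the manifest once and apply the same file to both clusters rather than generating it per cluster.

```python
import json
import secrets


def rsync_tls_secret(name: str, namespace: str) -> dict:
    """Build a Secret with a pre-shared key for the VolSync rsync-tls data mover.

    Assumption: the key lives under "psk.txt" as "<identity>:<hex key>";
    the identity "volsync" is an arbitrary label chosen for this sketch.
    """
    psk = "volsync:" + secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData lets the API server handle the base64 encoding.
        "stringData": {"psk.txt": psk},
    }


if __name__ == "__main__":
    # Generate ONCE, then apply the same output to both clusters.
    print(json.dumps(rsync_tls_secret("dr-rsync-tls", "demo"), indent=2))
```

For example, redirect the output to rsync-tls-secret.json and run kubectl --context <primary> apply -f rsync-tls-secret.json, then the same command with the Secondary cluster's context.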
Create ReplicationDestination Resource
Create the ReplicationDestination on the Secondary cluster.
Parameters:
About service type
If ClusterIP is specified, the Service receives an IP address allocated from the cluster network address pool. By default, these addresses are not reachable from outside the cluster, which makes ClusterIP a poor choice for cross-cluster replication. However, networking add-ons such as Submariner can bridge cluster networks; in that case, ClusterIP becomes a good option.
If LoadBalancer is specified, an externally accessible IP address is allocated. This requires load balancer support in the cluster, such as that provided by cloud providers or by MetalLB on physical clusters. While this is the easiest way to obtain a reachable address in cloud environments, load balancers tend to incur additional costs and are often limited in number.
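A minimal sketch of the ReplicationDestination, assuming Alauda Build of VolSync exposes the upstream VolSync API (volsync.backube/v1alpha1) and that the pre-created application PVC on the Secondary cluster is used as the destinationPVC. With copyMethod "Snapshot", VolSync snapshots that PVC after each completed synchronization. The resource, Secret, and PVC names are hypothetical, and the service type check only allows the two types described above.

```python
import json

VOLSYNC_API = "volsync.backube/v1alpha1"


def replication_destination(name, namespace, key_secret, destination_pvc,
                            service_type="LoadBalancer",
                            volume_snapshot_class=None):
    """Build a ReplicationDestination that receives rsync-tls replication."""
    # This sketch only accepts the two service types discussed above.
    if service_type not in ("ClusterIP", "LoadBalancer"):
        raise ValueError(f"unsupported service type: {service_type}")
    rsync_tls = {
        "copyMethod": "Snapshot",            # snapshot the PVC after each sync
        "destinationPVC": destination_pvc,   # pre-created application PVC
        "keySecret": key_secret,             # same Secret as on the Primary cluster
        "serviceType": service_type,
    }
    if volume_snapshot_class:
        rsync_tls["volumeSnapshotClassName"] = volume_snapshot_class
    return {
        "apiVersion": VOLSYNC_API,
        "kind": "ReplicationDestination",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"rsyncTLS": rsync_tls},
    }


if __name__ == "__main__":
    rd = replication_destination("app-dst", "demo", "dr-rsync-tls", "app-data")
    print(json.dumps(rd, indent=2))
```

After the resource is applied, upstream VolSync reports the address that the source must connect to in .status.rsyncTLS.address of the ReplicationDestination.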
Create ReplicationSource Resource
Create the ReplicationSource on the Primary cluster.
Parameters:
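A matching sketch for the scheduled ReplicationSource, again assuming the upstream VolSync API. The address is the value reported in .status.rsyncTLS.address of the ReplicationDestination on the Secondary cluster; 192.0.2.10, the ten-minute cron schedule, and the resource names are placeholders chosen for illustration.

```python
import json


def replication_source(name, namespace, source_pvc, key_secret, address,
                       schedule="*/10 * * * *", volume_snapshot_class=None):
    """Build a ReplicationSource that pushes source_pvc to a remote rsync-tls destination on a cron schedule."""
    rsync_tls = {
        "address": address,        # .status.rsyncTLS.address of the ReplicationDestination
        "keySecret": key_secret,   # same pre-shared key Secret as on the destination cluster
        "copyMethod": "Snapshot",  # replicate from a point-in-time snapshot of the source PVC
    }
    if volume_snapshot_class:
        rsync_tls["volumeSnapshotClassName"] = volume_snapshot_class
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourcePVC": source_pvc,
            "trigger": {"schedule": schedule},  # standard 5-field cron expression
            "rsyncTLS": rsync_tls,
        },
    }


if __name__ == "__main__":
    rs = replication_source("app-src", "demo", "app-data", "dr-rsync-tls", "192.0.2.10")
    print(json.dumps(rs, indent=2))
```

Using copyMethod "Snapshot" on the source lets the application keep writing while each synchronization reads a consistent snapshot.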
Check Synchronization Status
Check the synchronization status of the ReplicationSource on the Primary cluster:
The last synchronization completed at .status.lastSyncTime and took .status.lastSyncDuration.
The next scheduled synchronization is at .status.nextSyncTime.
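The status fields above can also be checked programmatically. This sketch assumes the ReplicationSource has been fetched as JSON (for example with kubectl get replicationsource <name> -n <namespace> -o json) and that timestamps use the second-precision RFC 3339 form Kubernetes normally emits. The "overdue" test is a hypothetical heuristic with an arbitrary grace period: a nextSyncTime well in the past may point to a stuck synchronization, or simply to a long-running one.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a timestamp such as "2024-05-01T10:00:00Z" (assumes no fractional seconds)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sync_summary(rs: dict) -> str:
    """Summarize the scheduling fields from a ReplicationSource's status."""
    status = rs.get("status", {})
    if "lastSyncTime" not in status:
        return "no synchronization has completed yet"
    return (f"last sync at {status['lastSyncTime']} took "
            f"{status.get('lastSyncDuration', 'unknown')}; "
            f"next sync at {status.get('nextSyncTime', 'unknown')}")


def sync_is_overdue(rs: dict, now: datetime, grace_seconds: int = 300) -> bool:
    """True if nextSyncTime passed more than grace_seconds ago (heuristic only)."""
    next_sync = rs.get("status", {}).get("nextSyncTime")
    if next_sync is None:
        return False
    return (now - parse_k8s_time(next_sync)).total_seconds() > grace_seconds
```

Note that lastSyncDuration is a duration string (for example "12.5s"), so the summary passes it through unchanged instead of treating it as a number.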
Configuring a One-Time Synchronization
One-Time Synchronization is initiated manually. This is controlled by setting a unique string for the manual field under the trigger specification in a ReplicationSource resource. The synchronization job runs once immediately upon applying the configuration.
Create One-Time ReplicationSource Resource
The only difference from Scheduled Synchronization is that .spec.trigger uses the manual field, set to a unique string, instead of a schedule.
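A sketch of the one-time variant under the same upstream-API assumption: trigger.manual replaces trigger.schedule. When no ID is given, this helper generates a random one; the "sync-" prefix is an arbitrary convention of this sketch, and any string that has not been used before works.

```python
import uuid


def one_time_replication_source(name, namespace, source_pvc, key_secret,
                                address, manual_id=None):
    """Build a ReplicationSource that synchronizes once, as soon as it is applied."""
    if manual_id is None:
        manual_id = f"sync-{uuid.uuid4().hex[:8]}"  # any previously unused string
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourcePVC": source_pvc,
            # A manual trigger runs one sync per new value instead of following a schedule.
            "trigger": {"manual": manual_id},
            "rsyncTLS": {
                "address": address,
                "keySecret": key_secret,
                "copyMethod": "Snapshot",
            },
        },
    }
```

Record the manual ID you use; it is the value to compare against when checking whether the synchronization has finished.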
Check Synchronization Status
When .status.lastManualSync of the ReplicationSource matches <manual-id>, the synchronization is complete.
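The completion check can be automated with a small polling loop. This sketch separates the check from how the object is fetched: wait_for_manual_sync accepts any function that returns the ReplicationSource as a dict, and kubectl_get_source is one such function, assuming kubectl is installed and the named context is configured. Timeout and interval defaults are arbitrary.

```python
import json
import subprocess
import time


def manual_sync_done(rs: dict) -> bool:
    """True once status.lastManualSync equals the requested spec.trigger.manual value."""
    manual_id = rs.get("spec", {}).get("trigger", {}).get("manual")
    return manual_id is not None and rs.get("status", {}).get("lastManualSync") == manual_id


def kubectl_get_source(name: str, namespace: str, context: str) -> dict:
    """Fetch a ReplicationSource as JSON through kubectl."""
    out = subprocess.run(
        ["kubectl", "--context", context, "-n", namespace,
         "get", "replicationsource", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def wait_for_manual_sync(fetch, timeout_s: float = 1800.0, interval_s: float = 10.0) -> bool:
    """Poll fetch() until the one-time sync completes; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if manual_sync_done(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Usage, with hypothetical names: wait_for_manual_sync(lambda: kubectl_get_source("app-once", "demo", "primary")).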
Enable Disaster Recovery for the Application PVC
Deploy stateful application
- Deploy the stateful application on the Primary cluster.
- Create the application PVC on the Secondary cluster.
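One way to create the Secondary-cluster PVC is to derive it from the Primary PVC object (fetched with kubectl get pvc <name> -n <namespace> -o json). The sketch keeps only portable fields: volumeName, uid, resourceVersion, and status refer to objects that exist only on the Primary cluster. The optional storage_class override reflects that VolSync can replicate between different storage types; the field selection is an assumption of this sketch, not a VolSync requirement.

```python
import copy
from typing import Optional


def secondary_pvc_from_primary(primary_pvc: dict, storage_class: Optional[str] = None) -> dict:
    """Derive a Secondary-cluster PVC manifest from the Primary PVC object."""
    src_spec = primary_pvc["spec"]
    spec = {
        "accessModes": list(src_spec["accessModes"]),
        "resources": copy.deepcopy(src_spec["resources"]),
    }
    if "volumeMode" in src_spec:
        spec["volumeMode"] = src_spec["volumeMode"]
    # The Secondary cluster may use a different StorageClass, even from another vendor.
    sc = storage_class or src_spec.get("storageClassName")
    if sc:
        spec["storageClassName"] = sc
    meta = primary_pvc["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Same name and namespace, so the application manifests work unchanged.
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": spec,
    }
```

Keeping the same PVC name on both clusters lets the same application manifests be deployed on either side.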
Configuring PVC Disaster Recovery
Set up Primary-to-Secondary Synchronization
Refer to Configuring a Scheduled Synchronization.
Planned Migration
User Scenario:
Relocate business services from the Primary cluster to the Secondary cluster while both clusters are operating normally.
Procedures
- Scale down application pods
  Scale down all application pods that use the DR PVC on the Primary cluster.
- Delete ReplicationSource Resource
  Delete the ReplicationSource on the Primary cluster.
- Create One-Time Synchronization
  Initiate a synchronization from the Primary cluster to ensure that the data on the Secondary cluster is up to date. Create a one-time ReplicationSource on the Primary cluster; refer to Configuring a One-Time Synchronization.
- Delete One-Time Synchronization
  After the one-time synchronization completes, delete the one-time ReplicationSource resource.
- Delete ReplicationDestination Resource
  Delete the ReplicationDestination on the Secondary cluster.
- Scale up application pods
  Scale up all application pods that use the DR PVC on the Secondary cluster.
- Set up Secondary-to-Primary Synchronization
  Set up Secondary-to-Primary synchronization for PVC disaster recovery by creating a ReplicationDestination on the Primary cluster and a ReplicationSource on the Secondary cluster. Refer to Configuring a Scheduled Synchronization.
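The ordering of the planned-migration steps can be captured as a runbook. This sketch only builds kubectl argument lists; it assumes the application is a single Deployment, that kubectl contexts named for each cluster exist, and that one_time_file contains the one-time ReplicationSource whose trigger.manual equals manual_id. All names in the usage are hypothetical. The wait step relies on kubectl wait --for=jsonpath, which requires kubectl 1.23 or later. The final step (setting up Secondary-to-Primary scheduled synchronization) is left out because it reuses the scheduled-synchronization manifests.

```python
import shlex


def planned_migration_commands(primary_ctx, secondary_ctx, namespace, deployment,
                               replicas, scheduled_source, one_time_file,
                               one_time_source, manual_id, destination):
    """Return the planned-migration steps as ordered kubectl argument lists."""
    def kubectl(ctx, *args):
        return ["kubectl", "--context", ctx, "-n", namespace, *args]

    return [
        # 1. Stop writes on the Primary cluster.
        kubectl(primary_ctx, "scale", f"deployment/{deployment}", "--replicas=0"),
        # 2. Remove the scheduled Primary-to-Secondary ReplicationSource.
        kubectl(primary_ctx, "delete", "replicationsource", scheduled_source),
        # 3. Start the final one-time synchronization.
        kubectl(primary_ctx, "apply", "-f", one_time_file),
        # 4. Block until status.lastManualSync reports the requested manual ID.
        kubectl(primary_ctx, "wait",
                f"--for=jsonpath={{.status.lastManualSync}}={manual_id}",
                f"replicationsource/{one_time_source}", "--timeout=30m"),
        # 5. Clean up the one-time ReplicationSource and the ReplicationDestination.
        kubectl(primary_ctx, "delete", "replicationsource", one_time_source),
        kubectl(secondary_ctx, "delete", "replicationdestination", destination),
        # 6. Start the application on the Secondary cluster.
        kubectl(secondary_ctx, "scale", f"deployment/{deployment}", f"--replicas={replicas}"),
    ]


if __name__ == "__main__":
    for cmd in planned_migration_commands("primary", "secondary", "demo", "app", 1,
                                          "app-src", "one-time-source.json",
                                          "app-once", "migrate-1", "app-dst"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps a human in the loop, which is usually preferable for a migration.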
Failover
User Scenario:
Switching services to the Secondary cluster after Primary cluster abrupt shutdown.
Procedures
To ensure data integrity in case the Primary cluster failed in the middle of a synchronization, perform a local synchronization on the Secondary cluster: use the PVC restored from the latest snapshot of the application's PVC as the source, and the application's current PVC as the destination.
- Restore PVC
  Restore a PVC from the latest snapshot of the ReplicationDestination on the Secondary cluster.
- Create local ReplicationSource Resource
  Create a one-time ReplicationSource on the Secondary cluster that uses the restored PVC as its source. For the parameters, refer to Configuring a One-Time Synchronization.
- Wait for synchronization to complete
  When .status.lastManualSync of the local ReplicationSource matches <manual-id>, the synchronization is complete.
- Delete local ReplicationSource
  Delete the local ReplicationSource on the Secondary cluster.
- Delete ReplicationDestination
  Delete the ReplicationDestination on the Secondary cluster.
- Scale up application pods
  Scale up all application pods on the Secondary cluster.
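The first two failover steps can be sketched as manifest builders that read the ReplicationDestination's status (fetched as JSON on the Secondary cluster). Assumptions: with copyMethod "Snapshot", upstream VolSync records the latest snapshot in .status.latestImage and the listening address in .status.rsyncTLS.address; the restored PVC uses a standard VolumeSnapshot dataSource; and copyMethod "Direct" is used on the local source because nothing else mounts the restored PVC ("Snapshot" or "Clone" would also work). All names are hypothetical.

```python
from typing import Optional, Sequence


def restored_pvc(destination: dict, name: str, storage: str,
                 access_modes: Sequence[str] = ("ReadWriteOnce",),
                 storage_class: Optional[str] = None) -> dict:
    """Build a PVC restored from the latest snapshot recorded by a ReplicationDestination."""
    image = destination.get("status", {}).get("latestImage")
    if not image:
        raise ValueError("ReplicationDestination has no completed snapshot yet")
    spec = {
        "accessModes": list(access_modes),
        # Must be at least the snapshot's restore size.
        "resources": {"requests": {"storage": storage}},
        "dataSource": {
            "apiGroup": image.get("apiGroup", "snapshot.storage.k8s.io"),
            "kind": image["kind"],
            "name": image["name"],
        },
    }
    if storage_class:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": destination["metadata"]["namespace"]},
        "spec": spec,
    }


def local_one_time_source(destination: dict, name: str, source_pvc: str,
                          key_secret: str, manual_id: str) -> dict:
    """One-time ReplicationSource that copies source_pvc into the local ReplicationDestination."""
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": name, "namespace": destination["metadata"]["namespace"]},
        "spec": {
            "sourcePVC": source_pvc,
            "trigger": {"manual": manual_id},
            "rsyncTLS": {
                "address": destination["status"]["rsyncTLS"]["address"],
                "keySecret": key_secret,
                # The restored PVC is not mounted by anything, so read it directly.
                "copyMethod": "Direct",
            },
        },
    }
```

Because both helpers read the same ReplicationDestination object, fetch it once after the Primary cluster fails and pass it to each.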
Failback (post-disaster recovery)
User Scenario:
The Primary cluster has been restored and is operational again, and services need to be switched back to it.
Procedures
- Scale down application pods on Primary cluster
  When the Primary cluster comes back online, its application pods recover automatically. Scale them down first to stop traffic; after the latest data has been synchronized from the Secondary cluster to the Primary cluster, the application can be scaled up again to resume normal operation.
- Delete ReplicationSource on Primary cluster
  Delete the ReplicationSource that was created before the Primary cluster failed.
- Sync latest data from Secondary cluster
  Set up a Secondary-to-Primary one-time synchronization: create a ReplicationDestination on the Primary cluster, then create a one-time ReplicationSource on the Secondary cluster. Refer to Configuring a One-Time Synchronization.
- Delete ReplicationDestination and ReplicationSource
  After the data synchronization completes, delete the one-time resources:
  Delete the ReplicationSource on the Secondary cluster.
  Delete the ReplicationDestination on the Primary cluster.
- Migrate application
  Scale down application pods on the Secondary cluster.
  Scale up application pods on the Primary cluster.
- Set up Primary-to-Secondary Synchronization
  Refer to Configuring a Scheduled Synchronization.