One of the advantages of vSphere is that you can move a virtual machine from one location to another: across servers, storage locations, even data centers. Physical servers don't have that ability, which has many implications for disaster recovery, availability, and so forth. This white paper explains why migrations are useful, the methods that vSphere makes available for you to manually move a virtual machine (VM), and how vSphere can automate the process for you in various scenarios.
Semi-Automatic Methods between Sites
The methods described in the Automatic Methods section work well within a site and are more or less fully automated based on the policies set by the administrator, so they should be implemented.
But what about Business Continuity (BC) and Disaster Recovery (DR)? You probably don't want them fully automated. You don't want VMs to move on their own, take outages, etc. without someone making a conscious choice to do so. That's where vSphere Replication (VR) and Site Recovery Manager (SRM) come in. They require some administrator action to initiate.
vSphere Replication (VR)
vSphere Replication (VR) has been included with vSphere since 5.0, though the interface to manage it outside of SRM wasn't introduced until 5.1. vSphere 5.5 added many new features, including:
- The ability to make Multiple Point-In-Time (MPIT) copies of a VM, i.e., to keep copies of up to 24 previous points in time that can be reverted to if the most recent one has an issue, such as a virus or data corruption. You specify how many copies to keep per day and how many days to keep them.
- Network improvements that make the copies faster (from two to 100 times faster).
- Support for VSAN as the datastore on either side.
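The MPIT retention settings can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes the total number of retained copies is simply copies per day multiplied by days, and that this total must stay within the 24-copy limit mentioned above. The function name `retained_instances` is hypothetical and is not part of any VMware tool.

```python
MAX_INSTANCES = 24  # upper bound on retained points in time, as stated above


def retained_instances(per_day: int, days: int) -> int:
    """Return the total number of point-in-time copies a retention setting keeps.

    Assumption: total = per_day * days, and the total may not exceed MAX_INSTANCES.
    """
    if per_day < 1 or days < 1:
        raise ValueError("per_day and days must both be at least 1")
    total = per_day * days
    if total > MAX_INSTANCES:
        raise ValueError(
            f"{per_day} per day x {days} days = {total} copies, "
            f"which exceeds the limit of {MAX_INSTANCES}"
        )
    return total


if __name__ == "__main__":
    print(retained_instances(3, 5))  # 3 copies a day for 5 days
```

Under these assumptions, keeping 3 copies per day for 5 days retains 15 points in time, while 5 per day for 5 days would be rejected.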
VR is included at no additional cost in vSphere Essentials Plus and all higher editions.
VR is configured on a per-VM basis, specifying where to replicate each VM (site, cluster, and datastore) and the Recovery Point Objective (RPO), i.e., the worst-case amount of data that can be lost. The RPO can be specified from fifteen minutes to twenty-four hours. Note that you don't tell VR when to replicate; it decides how often to replicate to meet the configured RPO based on the volume of data, other VMs that need to be replicated, available bandwidth, etc. In addition, you can have VR use VSS in Windows to quiesce the applications to get application-consistent points in time, not just file-system-consistent points.
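To make the RPO range concrete, here is a minimal sketch that validates an RPO against the fifteen-minute to twenty-four-hour window described above and checks whether the newest replica is older than the RPO allows. The function names and the idea of comparing replica age to the RPO are assumptions made for illustration; they do not describe how VR schedules replication internally.

```python
from datetime import datetime, timedelta

MIN_RPO = timedelta(minutes=15)  # lower bound from the text
MAX_RPO = timedelta(hours=24)    # upper bound from the text


def validate_rpo(rpo: timedelta) -> timedelta:
    """Return rpo unchanged if it lies in the supported range, else raise."""
    if not (MIN_RPO <= rpo <= MAX_RPO):
        raise ValueError(f"RPO must be between {MIN_RPO} and {MAX_RPO}, got {rpo}")
    return rpo


def rpo_exceeded(last_replica_time: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the newest replica is older than the configured RPO (hypothetical check)."""
    return now - last_replica_time > validate_rpo(rpo)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    now = datetime(2024, 1, 1, 13, 30)
    print(rpo_exceeded(last, now, timedelta(hours=1)))  # replica is 90 minutes old
    print(rpo_exceeded(last, now, timedelta(hours=4)))
```

With a one-hour RPO, a replica that is 90 minutes old would already be out of compliance; with a four-hour RPO it would not.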
After the initial full replication (over the network or by restoring a backup for example), only the changes since the last replication will be replicated to minimize both the time and bandwidth required to replicate the VM. The system saves the changes on the destination side in a redo log, which is automatically applied once the replication successfully completes (or can be rolled back in the event of a failure), ensuring that data is always in a consistent state.
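The redo-log behavior can be pictured with a toy model. The sketch below is not VR's implementation; it is a simplified illustration in which a replica is a dictionary of disk blocks, incoming changes are staged in a separate redo log, and the committed state changes only when the whole replication cycle succeeds. The class and method names are hypothetical.

```python
class Replica:
    """Toy model of a destination-side replica with a redo log."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})  # last consistent, committed state
        self.redo_log = {}                # changes staged by the current cycle

    def receive(self, block_id, data):
        """Stage a changed block; the committed state is not touched."""
        self.redo_log[block_id] = data

    def commit(self):
        """Replication cycle succeeded: apply all staged changes at once."""
        self.blocks.update(self.redo_log)
        self.redo_log.clear()

    def rollback(self):
        """Replication cycle failed: discard staged changes."""
        self.redo_log.clear()


if __name__ == "__main__":
    replica = Replica({0: "a", 1: "b"})

    replica.receive(1, "B")
    replica.rollback()          # a failed cycle leaves the replica unchanged
    print(replica.blocks)

    replica.receive(1, "B")
    replica.commit()            # a successful cycle applies the change
    print(replica.blocks)
```

The point of the design is that a reader of `blocks` never sees a half-applied cycle: either all of the changes from a replication pass are present or none of them are.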
Multiple replication topologies are possible, including unidirectional from a protected site to a DR site, bidirectional between two sites (different VMs replicate in different directions, but each individual VM is still unidirectional), distributed sites to a central site (e.g., branch office to headquarters), and central site to branch offices.
To fail over a VM to the remote site, you run a wizard that brings the VM online; this must be done individually for each VM being relocated. Once the VM is online, you can revert to a previous point in time by using the standard Snapshot Manager to select the desired point.
Site Recovery Manager (SRM)
VR is a great product, but it requires administrator time and effort to set up and to fail over each VM. The administrator needs to know which VMs to fail over and in what order to ensure applications start and run correctly. After a failover, an administrator must stop replication, set it up in the opposite direction, fail back to the original site, stop replication again, and finally reconfigure replication for the next disaster. All of this can be time consuming and doesn't work well if administrators are not available at the remote site to bring everything online again.
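The ordering problem mentioned above can be illustrated with a dependency sort. The sketch below is only an illustration of the bookkeeping an administrator would otherwise do by hand; it is not how VR or SRM works internally. The VM names and the `failover_order` function are hypothetical, and the code uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def failover_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return VM names ordered so each VM starts after the VMs it depends on.

    depends_on maps a VM name to the set of VMs that must be online first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical three-tier application: web needs app, app needs db.
    dependencies = {
        "web-01": {"app-01"},
        "app-01": {"db-01"},
        "db-01": set(),
    }
    for vm in failover_order(dependencies):
        print(vm)
```

For this three-tier example the database comes first, then the application server, then the web server.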
Site Recovery Manager (SRM) solves those problems by automating the entire process. This allows groups of VMs to be protected together, failing them over as a group in just a few clicks, and automating the process of reversing the replication direction after the disaster has passed.