Storage Replica

Overview

Storage Replica is a software implementation of volume replication technology, new in Windows Server 2016, that is designed for disaster recovery. It protects against hardware failure by duplicating a volume exactly, block by block, and it will also protect against site failure if it is used on stretched clusters that span two sites. Like any other replication technology, Storage Replica does not eliminate the need to take backups, because data corruption and user errors are replicated too. If you are not familiar with replication technology, it is worth spelling that out. Replication keeps two copies of a disk in exact synchronisation. So, if you rely on replication instead of backups and you accidentally format and wipe a volume, Storage Replica will oblige by formatting and wiping the replica too, and you will have lost all your data. No backup, no data, lost forever. Take backups!
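To make the point concrete, here is a minimal Python sketch of block-level mirroring. The `Volume` and `MirroredVolume` classes are hypothetical illustrations, not Storage Replica APIs: every block written to the source is copied to the replica, so an accidental format is mirrored just as faithfully as good data, and only a separate backup copy survives it.

```python
BLOCK_SIZE = 4  # bytes per block; tiny so the example is easy to follow


class Volume:
    """A toy volume: a fixed number of equally sized blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]


class MirroredVolume:
    """Hypothetical model of block-level replication: every write goes to both copies."""

    def __init__(self, num_blocks):
        self.source = Volume(num_blocks)
        self.replica = Volume(num_blocks)

    def write_block(self, index, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be whole blocks")
        self.source.blocks[index] = data
        self.replica.blocks[index] = data  # good and bad writes are copied alike


mirror = MirroredVolume(num_blocks=3)
mirror.write_block(0, b"DATA")
backup = list(mirror.source.blocks)  # a separate point-in-time copy, i.e. a backup

# An accidental "format" overwrites every block with zeros...
for i in range(3):
    mirror.write_block(i, bytes(BLOCK_SIZE))

print(mirror.replica.blocks[0])  # b'\x00\x00\x00\x00': the replica is wiped too
print(backup[0])                 # b'DATA': only the backup still holds the data
```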
Storage Replica is also not intended to provide a second copy of data that can be updated, either locally or at a remote site, as that would not be a valid disaster recovery copy.
While there are many hardware replication solutions on the market, they typically require the same type of storage hardware at both sites. Because Storage Replica is a software implementation, it is storage-agnostic and works with different hardware at each end.

Storage Replica supports both synchronous and asynchronous replication.

Synchronous Replication

When an application writes data out to storage, it waits until it gets confirmation that the write succeeded before it continues processing. With synchronous replication, the application waits until it gets write confirmation from both the local and the remote site. If you want zero-data-loss disaster recovery, your second disk needs to be several kilometers away from the primary disk, and every write has to make that round trip before the application can continue. To stop replication from slowing your applications down, you therefore need fast networks and fast disk subsystems. So why would you want to use synchronous replication? Well, if you are working on financial systems such as a bank's, it should be obvious that no financial transaction data can be lost in a disaster. What about an airline booking site? Imagine the fuss if you had a disaster and lost 12 hours' worth of bookings: your customers have paid for their flights, but you have no record of them.
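The cost of distance can be estimated with a little arithmetic. This Python sketch assumes that light in optical fibre covers roughly 200 km per millisecond, ignores switch and router delays, and assumes the local write and the round trip to the remote site happen in parallel, so the application waits for whichever finishes last. The function names and latency figures are illustrative, not measurements of Storage Replica.

```python
FIBRE_KM_PER_MS = 200.0  # light in optical fibre travels roughly 200 km per millisecond


def propagation_rtt_ms(distance_km):
    """Minimum round-trip time over fibre, ignoring switches and routers."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def sync_write_latency_ms(local_write_ms, remote_write_ms, distance_km):
    """Time before the application sees a synchronous write acknowledged.

    Assumption: the local write and the remote round trip run in parallel,
    so the application waits for whichever path finishes last.
    """
    remote_path_ms = propagation_rtt_ms(distance_km) + remote_write_ms
    return max(local_write_ms, remote_path_ms)


# Illustrative figures: 0.5 ms disk writes at each end.
print(sync_write_latency_ms(0.5, 0.5, 50))   # 1.0
print(sync_write_latency_ms(0.5, 0.5, 500))  # 5.5
```

At 500 km every synchronous write pays at least 5 ms of round trip before any disk time is counted, which is why synchronous replication calls for short distances and fast links.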

One of the features of replication is that the destination volume is not accessible while replicating. I've seen people complain about this on blogs, but for one thing, data is replicated at block level, so if you could update the destination at file level, its file system would no longer match the blocks arriving from the source and you would corrupt the data. For another thing, this data is for disaster recovery. If you start updating the target disk, then it is no longer a valid DR copy. The destination volume is dismounted when replication is configured, and although its drive letter may still be visible in Explorer, you will not be able to access the volume itself.

Asynchronous Replication

Asynchronous replication simply means that the application considers a write complete as soon as the data is safely stored on the source disks. The data is then written out to the remote disks later, without slowing down the application. This is quite adequate for many applications and is usually cheaper to implement than synchronous replication. Some implementations use snapshots for asynchronous replication, but Storage Replica implements asynchronous replication the same way as synchronous replication, except that the application does not wait for the write to be acknowledged at the destination. There is no guarantee that both sites have identical copies of the data at the time of a failure, but asynchronous replication will work over slower networks and longer distances than synchronous replication.
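The trade-off can be sketched in Python with a queue of writes that have been acknowledged locally but not yet sent. The `AsyncReplicator` class and the `ship` method are hypothetical, not part of Storage Replica; the sketch only shows that anything still queued when the source site fails is lost.

```python
from collections import deque


class AsyncReplicator:
    """Hypothetical model of asynchronous replication, not a Storage Replica API."""

    def __init__(self):
        self.source = {}
        self.destination = {}
        self.pending = deque()  # writes acknowledged locally but not yet replicated

    def write(self, block, data):
        self.source[block] = data
        self.pending.append((block, data))
        return "ack"  # the application carries on without waiting for the remote site

    def ship(self, count):
        """Send up to `count` queued writes to the destination, oldest first."""
        for _ in range(min(count, len(self.pending))):
            block, data = self.pending.popleft()
            self.destination[block] = data


replicator = AsyncReplicator()
for i in range(5):
    replicator.write(i, f"booking-{i}")

replicator.ship(3)  # the link has only caught up with three writes

# Disaster: the source site is lost now, along with anything still queued.
lost = len(replicator.pending)
print(lost)                            # 2
print(sorted(replicator.destination))  # [0, 1, 2]
```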

Storage Replica terminology

I've mentioned a few terms above without really explaining what they mean.

  • The Local Site or Server is the one that normally runs your applications.
  • The Target or Remote site is the DR site, which might just be another local server, or it may be a second, remote data center.
  • The Source or Primary disk contains the local copy of data on the active server. Local writes are permitted to this disk, and it replicates the data out.
  • The Destination, Secondary or Remote disk receives the replicated data from the source disk. It does not allow local writes.
  • A source and destination computer that have a replication relationship between them form a Replication Partnership. Each server in the partnership has its own Storage Replica log, and replicated data arriving at the remote site is written first to the log there before it is applied to the destination disk. A log must be on a separate volume from the replicated data.

Supported configurations

With Windows Server 2016 Datacenter Edition, you can deploy Storage Replica in stretch cluster, cluster-to-cluster, and server-to-server configurations.

  • A Stretched Cluster runs between two sites and can give you automated application failover if one site fails, because both the servers and storage are managed by the Cluster. You can use Storage Spaces on a stretched cluster, with shared SAS storage, SAN and iSCSI-attached LUNs. The storage is replicated between the two sites, usually synchronously, to provide almost seamless failover.
  • A cluster-to-cluster setup permits replication between two separate clusters, with one cluster replicating to another, either synchronously or asynchronously. Cluster-to-cluster replication can use Storage Spaces Direct, but application failover is not automatic; it requires manual intervention.
  • Server-to-server replication supports synchronous and asynchronous replication between two standalone servers. It can use Storage Spaces and, like cluster-to-cluster, requires manual intervention for failover.

If you want to test DR or carry out site maintenance, you can reverse the replication direction so that your DR site becomes the primary. However, you must wait until the initial synchronisation is complete before trying this.
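The rule about reversing direction, together with the destination's refusal of local writes, can be modelled as a small state check. This Python sketch is hypothetical: the `Partnership` class, its methods and the server names are illustrations, not Storage Replica's PowerShell cmdlets, and initial synchronisation is tracked as a simple count of copied blocks.

```python
class Partnership:
    """Hypothetical model of a replication partnership, not a Storage Replica API."""

    def __init__(self, source, destination, total_blocks):
        self.source = source
        self.destination = destination
        self.total_blocks = total_blocks
        self.blocks_copied = 0  # progress of the initial block copy

    def initial_copy(self, blocks):
        self.blocks_copied = min(self.total_blocks, self.blocks_copied + blocks)

    def initial_sync_complete(self):
        return self.blocks_copied == self.total_blocks

    def write(self, server):
        """Only the source accepts local writes; the destination refuses them."""
        if server != self.source:
            raise PermissionError(f"{server} is a replication destination")
        return "ok"

    def reverse(self):
        """Swap roles, but only once the initial synchronisation has finished."""
        if not self.initial_sync_complete():
            raise RuntimeError("initial synchronisation has not finished")
        self.source, self.destination = self.destination, self.source


p = Partnership("SRV-PRIMARY", "SRV-DR", total_blocks=100)
p.initial_copy(40)
try:
    p.reverse()
except RuntimeError as err:
    print(err)  # initial synchronisation has not finished

p.initial_copy(60)
p.reverse()               # the DR server is now the source
print(p.write("SRV-DR"))  # ok
```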
Storage Replica uses consistency groups, where volumes can be grouped together and managed as a single entity. For example, if you are replicating SQL databases that span multiple volumes, then it is essential that the replicated writes are applied in the same order in which they were made, otherwise the replicated database could be corrupt if a disaster happens. If the relevant volumes are in a consistency group, then Storage Replica will write the data out to the destination server in the correct order.
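Write ordering across volumes can be sketched in Python by giving every write in the group one shared sequence number and applying writes at the destination strictly in that order. This is a hypothetical illustration of the idea, not how Storage Replica is implemented internally; the volume names, the `ConsistencyGroup` class and its methods are made up for the example.

```python
import itertools


class ConsistencyGroup:
    """Hypothetical model: one sequence counter shared by every volume in the group."""

    def __init__(self, volumes):
        self.volumes = set(volumes)
        self._sequence = itertools.count(1)
        self.history = []  # (sequence, volume, block, data) in the order written

    def write(self, volume, block, data):
        if volume not in self.volumes:
            raise ValueError(f"{volume} is not in this consistency group")
        self.history.append((next(self._sequence), volume, block, data))

    def destination_after_failure(self, writes_delivered):
        """Destination contents if the link fails after `writes_delivered` writes.

        Writes are applied in group-wide sequence order, so the destination
        always holds a prefix of the source's history across all volumes.
        """
        dest = {volume: {} for volume in self.volumes}
        for _, volume, block, data in sorted(self.history)[:writes_delivered]:
            dest[volume][block] = data
        return dest


group = ConsistencyGroup(["D:", "L:"])         # hypothetical data and log volumes
group.write("D:", 7, "record 42")              # first write, on one volume
group.write("L:", 1, "pointer to record 42")   # a later write that depends on it

for delivered in range(len(group.history) + 1):
    dest = group.destination_after_failure(delivered)
    # The dependent write never reaches the destination without the earlier one.
    if 1 in dest["L:"]:
        assert dest["D:"][7] == "record 42"
```

Without the shared sequence, each volume could be replicated independently, and a failure might leave the pointer on L: at the destination with no record on D: for it to point to.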
