Windows 2016 Storage Replica and ReFS Volumes

One customer recently asked me how they could replicate a ReFS volume, so that they could keep the Veeam backup files in two different locations. The reason is that regular file copy operations, which would be used in other situations, do not maintain ReFS BlockCloning savings: a file copied from one ReFS repository to another is completely re-hydrated. If the customer were a regular enterprise, the quick solution would have been Veeam Backup Copies, as they read the blocks at the source and then, at the destination (assuming, of course, that it is also ReFS storage), leverage BlockCloning again during the write operation. But this customer is a service provider using Veeam Cloud Connect, so the source backups are not mapped into the service provider's Veeam installation. Moreover, there's no control over the source schedule, because it's managed by the tenants, so any activity on those files could result in a file lock, and potentially in a failure of the customers' backups sent over the Internet. We need something different.
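To make the re-hydration problem concrete, here is a toy model in Python, not how ReFS is actually implemented: a volume is a pool of physical clusters, each file is a list of references to clusters, and block cloning simply means two files reference the same cluster. The `Volume` class, the two copy functions and the file names are hypothetical illustrations. A file copy allocates fresh clusters for every reference, while a block-level replica copies the clusters together with the metadata that points at them, so the sharing survives.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: each file is a list of physical cluster ids.
    Block cloning means several files may list the same cluster."""
    next_cluster: int = 0
    files: dict = field(default_factory=dict)  # file name -> [cluster ids]

    def allocate(self, n: int) -> list:
        """Allocate n brand-new clusters and return their ids."""
        ids = list(range(self.next_cluster, self.next_cluster + n))
        self.next_cluster += n
        return ids

    def used_clusters(self) -> int:
        """Physical usage: distinct clusters referenced by any file."""
        return len({c for refs in self.files.values() for c in refs})

    def logical_clusters(self) -> int:
        """Logical size: every file's length, shared clusters counted again."""
        return sum(len(refs) for refs in self.files.values())


def file_copy(src: Volume) -> Volume:
    """Regular file copy: each file's data is read and written anew,
    so every reference gets a freshly allocated cluster (re-hydration)."""
    dst = Volume()
    for name, refs in src.files.items():
        dst.files[name] = dst.allocate(len(refs))
    return dst


def block_replica(src: Volume) -> Volume:
    """Block-level replica: clusters and the metadata pointing at them are
    copied as they are, so the sharing between files survives."""
    return Volume(next_cluster=src.next_cluster,
                  files={name: list(refs) for name, refs in src.files.items()})


# A 100-cluster full backup, then a synthetic full that clones 90 of
# those clusters and writes only 10 new ones.
repo = Volume()
full = repo.allocate(100)
repo.files["full1.vbk"] = full
repo.files["full2.vbk"] = full[:90] + repo.allocate(10)

print(repo.logical_clusters(), repo.used_clusters(),
      file_copy(repo).used_clusters(), block_replica(repo).used_clusters())
# prints: 200 110 200 110
```

The copy needs 200 clusters for 200 clusters of logical data, while the block replica keeps the 110-cluster footprint of the source, which is exactly the property we want from a replication technology for a ReFS repository.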
Storage-level replication is usually the solution I would suggest, and I did so in this case too: the underlying storage doesn't touch the ReFS filesystem at all, so there are no problems with open files or the like. The problem is that you need a storage system with replication capabilities. If you want to save money, you can use a software solution such as ZFS, but I was curious to see whether the new Windows 2016 Storage Replica would be a good alternative.
Reading the description in the Microsoft documentation, there's neither a clear confirmation nor a denial of support for ReFS BlockCloning, but the described behavior is nonetheless really interesting: replication happens at the block level, and it can be synchronous or asynchronous. It uses SMB3 as the transport protocol, which means it leverages all the optimizations implemented in this new protocol. We “only” need to find out how it behaves with ReFS BlockCloning.
NOTE: this solution IS NOT a replacement for Veeam Backup Copies and the 3-2-1 rule, because corrupted blocks and deletions are replicated too, and they land immediately on the secondary storage. This solution is aimed at increasing the resiliency of a Veeam repository in specific use cases.

My Lab setup

In order to test my hypothesis about ReFS replication, I created a small lab. I deployed and configured a new Windows 2016 server with a brand new volume that I formatted with ReFS using a 64KB cluster size. (NOTE: you need the Datacenter edition to use Storage Replica, as the feature is not available in the Standard edition. Also, both source and target servers must be joined to the same Active Directory domain.) I then added this machine and its E: drive as a repository to my test Veeam server, and set up a new job to create some data on the repository. To speed things up, the job was configured to run every 4 hours, to create a synthetic full daily, and to keep 14 restore points; this way, in less than three days the retention was complete and I had this situation on the repository:

Then, while the backups were running and updating the content of my volume, I started to configure Storage Replica (SR from here on). SR has some specific requirements, which are listed on this page:
  • You must create two volumes on each enclosure: one for data and one for logs.
  • Log and data disks must be initialized as GPT, not MBR.
  • The two data volumes must be of identical size.
  • The two log volumes should be of identical size.
  • All replicated data disks must have the same sector sizes.
  • All log disks must have the same sector sizes.
  • The log volumes should use flash-based storage, such as SSD. Microsoft recommends that the log storage be faster than the data storage. Log volumes must never be used for other workloads.
  • The data disks can use HDD, SSD, or a tiered combination and can use either mirrored or parity spaces or RAID 1 or 10, or RAID 5 or RAID 50.
  • The log volume must be at least 9GB by default and may be larger or smaller based on log requirements.
  • The File Server role is only necessary for Test-SRTopology to operate, as it opens the necessary firewall ports for testing.
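Several of these requirements are simple equality or threshold checks, so they can be sketched as a small pre-flight function. This is a hypothetical Python helper, not part of Storage Replica: the `Disk` type and `check_sr_pair` are my own names, and it only covers the checks that can be expressed as numbers (GPT partition style, identical data and log sizes, identical sector sizes, the 9GB default log minimum); the flash-storage and workload-isolation rules for the log volume are left out. Test-SRTopology remains the real validation tool.

```python
from dataclasses import dataclass

MIN_LOG_GB = 9  # default minimum log size from the requirement list above


@dataclass
class Disk:
    size_gb: float
    sector_bytes: int
    partition_style: str  # "GPT" or "MBR"


def check_sr_pair(src_data: Disk, src_log: Disk,
                  dst_data: Disk, dst_log: Disk) -> list:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    disks = [("source data", src_data), ("source log", src_log),
             ("target data", dst_data), ("target log", dst_log)]
    for label, d in disks:
        if d.partition_style != "GPT":
            problems.append(f"{label} disk must be GPT, not {d.partition_style}")
    if src_data.size_gb != dst_data.size_gb:
        problems.append("data volumes must be of identical size")
    if src_log.size_gb != dst_log.size_gb:
        problems.append("log volumes should be of identical size")
    if src_data.sector_bytes != dst_data.sector_bytes:
        problems.append("data disks must have the same sector size")
    if src_log.sector_bytes != dst_log.sector_bytes:
        problems.append("log disks must have the same sector size")
    for label, d in (("source log", src_log), ("target log", dst_log)):
        if d.size_gb < MIN_LOG_GB:
            problems.append(f"{label} is smaller than {MIN_LOG_GB} GB")
    return problems


# Lab-like layout; the 512-byte sector size is an assumption for illustration.
lab = dict(
    src_data=Disk(200, 512, "GPT"), src_log=Disk(20, 512, "GPT"),
    dst_data=Disk(200, 512, "GPT"), dst_log=Disk(20, 512, "GPT"),
)
print(check_sr_pair(**lab))  # []
```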
There is some “interesting” information in this list. The biggest item is that SR requires an additional volume on each server (source and target) for logs, and it has to be a fast one. So, I went back to my two servers and added two 20GB disks. This value is 10% of the size of the data volume, and it's above the 9GB minimum required by SR. I looked around for sizing recommendations but didn't find any.
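The sizing choice above boils down to one line of arithmetic. In this minimal Python sketch the 10% ratio is only the rule of thumb used in this lab, not a Microsoft recommendation, and `log_volume_size_gb` is a hypothetical helper name; the 9GB floor is the default minimum from the requirement list.

```python
MIN_LOG_GB = 9.0  # default minimum log size required by Storage Replica


def log_volume_size_gb(data_gb: float, percent: int = 10) -> float:
    """Size the log volume as a percentage of the data volume (10% is the
    lab's own rule of thumb), but never below the 9GB default minimum."""
    return max(MIN_LOG_GB, data_gb * percent / 100)


print(log_volume_size_gb(200))  # 20.0, the size used in this lab
print(log_volume_size_gb(50))   # 9.0, the minimum wins for small volumes
```

Keep in mind that the real limit is how much write activity the log has to absorb, especially with asynchronous replication, so a percentage of the data volume is just a convenient starting point.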
As the paired volumes need to be of the same size, and to keep the configuration uniform I also wanted the same cluster size everywhere, I used ReFS with 64KB clusters for all the volumes. My final result was:
refssource
  • E:\ 200GB, ReFS 64KB (data)
  • F:\ 20GB, ReFS 64KB (log)
refstarget
  • G:\ 200GB, ReFS 64KB (data)
  • H:\ 20GB, ReFS 64KB (log)
So, replication will happen from E: to G:, while F: and H: act as the log volumes on the source and on the target respectively.

Storage Replica configuration

Once the two servers are properly configured, it's time to set up SR. Using the graphical interface, I opened Server Manager and created a Server Group:
For each of the two servers, I installed the role “File Server” and the feature “Storage Replica”:
Next, we need to test the replication to make sure that storage and bandwidth are good enough. To do so, we run this command in PowerShell:

The test should be executed on a disk with significant IO, in order to evaluate whether Storage Replica can replicate that production data to the target system while the data is being generated. Replication happens in real time, so it's important to understand whether SR can handle the IOPS generated by the workloads. In my case, I ran the test in the middle of a Veeam backup session. The results are available as a web page stored in the temp folder. The first section gives an overview of the test results, and the most important information I got from this part is that everything completed successfully:

Then, if you want, you can look at more in-depth information, such as latency or transfer speed:
Once the test was completed, I moved on to configuring Storage Replica. This is possible via PowerShell, or graphically by using the free Server Management Tools (SMT). I went with PowerShell:
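The partnership is created with the New-SRPartnership cmdlet, whose parameters mirror the ones used for the test, plus a replication group name on each side. The Python helper below is a hypothetical sketch that assembles that command line; "rg-source" and "rg-target" are made-up group names, since any name works. When no mode is given it leaves out -ReplicationMode, so SR falls back to its synchronous default.

```python
def new_srpartnership(src, src_rg, src_data, src_log,
                      dst, dst_rg, dst_data, dst_log, mode=None):
    """Build a New-SRPartnership command line. When mode is None the
    -ReplicationMode parameter is omitted and SR's default applies."""
    params = [
        ("SourceComputerName", src), ("SourceRGName", src_rg),
        ("SourceVolumeName", src_data), ("SourceLogVolumeName", src_log),
        ("DestinationComputerName", dst), ("DestinationRGName", dst_rg),
        ("DestinationVolumeName", dst_data), ("DestinationLogVolumeName", dst_log),
    ]
    if mode is not None:
        if mode not in ("Synchronous", "Asynchronous"):
            raise ValueError(f"unsupported replication mode: {mode}")
        params.append(("ReplicationMode", mode))
    return "New-SRPartnership " + " ".join(f"-{k} {v}" for k, v in params)


# Hypothetical group names: the groups are created by this very command.
print(new_srpartnership("refssource", "rg-source", "E:", "F:",
                        "refstarget", "rg-target", "G:", "H:",
                        mode="Asynchronous"))
```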
The source and destination replication groups (RGName in the script) don't exist yet: they are created by the script itself, so you can choose their names. If the script runs correctly, you will see something like this:
Replication starts immediately and, depending on the amount of data to replicate, the initial sync may take some time. You can check its status with this command:
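If you prefer to wait for the initial sync instead of re-running the check by hand, the logic is a simple polling loop: the group is in sync when every replica reports zero bytes remaining (the NumOfBytesRemaining property exposed through Get-SRGroup). In this Python sketch `get_remaining` is a hypothetical callable standing in for however you collect those values, and the readings in the example are fake numbers.

```python
import time
from typing import Callable, Iterable


def wait_for_initial_sync(get_remaining: Callable[[], Iterable[int]],
                          poll_seconds: float = 30.0,
                          sleep: Callable[[float], None] = time.sleep,
                          max_polls: int = 1000) -> int:
    """Poll until every replica reports 0 bytes remaining.
    Returns how many polls it took; raises TimeoutError otherwise."""
    for poll in range(1, max_polls + 1):
        remaining = list(get_remaining())
        # An empty list means no replica was reported, which is not "in sync".
        if remaining and all(r == 0 for r in remaining):
            return poll
        sleep(poll_seconds)
    raise TimeoutError(f"still not in sync after {max_polls} polls")


# Fake readings for a single replica: 48 GB left, then 12.5 GB, then done.
readings = iter([[48_000_000_000], [12_500_000_000], [0]])
polls = wait_for_initial_sync(lambda: next(readings), sleep=lambda s: None)
print(polls)  # 3
```

Injecting `sleep` keeps the loop testable; in real use you would leave the default `time.sleep`.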

In my case, there are no more bytes to be replicated, which means the two volumes are now in sync. You can also get a complete set of information by running these commands in sequence:
The output looks like this:
One thing to note is that, by default, the replica is configured in synchronous mode. For replicating Veeam backups this is not the ideal configuration, because every write on the source has to be acknowledged by the target before it completes, so I changed it to asynchronous. This could have been done when creating the replica by using the parameter -ReplicationMode Asynchronous, but it can also be changed afterward:
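Why does asynchronous mode suit backup repositories better? The following Python function is a toy latency model, not a measurement of Storage Replica: in synchronous mode a write is acknowledged only after the target has hardened it in its log, so the network round trip is added to every backup write; in asynchronous mode the write is acknowledged as soon as the local log has it. The function name and the 1 ms / 20 ms figures are illustrative assumptions. The actual switch is made with Set-SRPartnership and its -ReplicationMode parameter.

```python
def write_ack_latency_ms(local_log_ms: float, rtt_ms: float,
                         remote_log_ms: float, mode: str) -> float:
    """Toy model of when a write is acknowledged to the application."""
    if mode == "Synchronous":
        # Assumption: the local log write overlaps with sending the data,
        # so the slower path (network round trip plus remote log) decides.
        return max(local_log_ms, rtt_ms + remote_log_ms)
    if mode == "Asynchronous":
        # Acknowledged after the local log write; shipping happens later.
        return local_log_ms
    raise ValueError(f"unknown replication mode: {mode}")


for mode in ("Synchronous", "Asynchronous"):
    print(mode, write_ack_latency_ms(1.0, 20.0, 1.0, mode))
# Synchronous 21.0
# Asynchronous 1.0
```

The trade-off is the usual one: in asynchronous mode the target may lag behind the source by whatever is still sitting in the log, which is acceptable for a second copy of backup files but worth keeping in mind.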

ReFS blockcloning and Storage Replica

If you try to open the destination disk to verify its content and be sure that replication is healthy, you may be confused by the output:
This is by design: the destination volume is unmounted as soon as the replica relationship is established, to avoid any damage to the replicated data. So how can we verify the content? In Windows 2016 the only solution is to stop the Storage Replica relationship so that we can bring the G: volume online:
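As I understand the Microsoft documentation, stopping the relationship means removing the partnership first and then the replication group on each server. The Python sketch below just lists those steps in order as (computer, PowerShell command) pairs; `teardown_steps` is a hypothetical helper, and the pipelines use the Get-SRPartnership, Remove-SRPartnership, Get-SRGroup and Remove-SRGroup cmdlets from the Storage Replica module. Double-check the sequence against the documentation for your build before running it on anything that matters.

```python
def teardown_steps(source: str, target: str) -> list:
    """Ordered (computer, PowerShell command) pairs that dissolve a Storage
    Replica relationship so the destination volume can be brought online."""
    return [
        # 1. Remove the partnership itself, from the source server.
        (source, "Get-SRPartnership | Remove-SRPartnership"),
        # 2. Remove the replication group on each server.
        (source, "Get-SRGroup | Remove-SRGroup"),
        (target, "Get-SRGroup | Remove-SRGroup"),
    ]


for computer, command in teardown_steps("refssource", "refstarget"):
    print(f"[{computer}] {command}")
```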

In Windows Server, version 1709, it is now possible to mount the destination storage: this feature is called “Test Failover”. To use it, you must have an unused NTFS- or ReFS-formatted volume on the destination that is not currently replicating. You can then temporarily mount a snapshot of the replicated storage for testing or backup purposes. I don't have this version in my lab; if you do, you can follow the instructions on this webpage, which also contains other useful information about Storage Replica.
With the partnership interrupted, G:\ is now mounted and we can see its content:
I started this post with the goal of replicating a ReFS volume that uses BlockCloning, so now we want to know whether this was possible. As always with BlockCloning, we need to check the volume properties to see the space savings.
The original size of my backups is 69.2 GB, which has been reduced to 27.6 GB thanks to BlockCloning. Both the source and the target volumes show this size, so it seems that Storage Replica indeed keeps the space savings while replicating! How can I confirm this? Well, here comes blockstat again, the tool written by my colleague Tim Dewin. If I run it against the target volume, I get this result:
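To put the 69.2 GB and 27.6 GB figures from the volume properties into perspective, here is the arithmetic as a short Python sketch; `block_clone_savings` is just an illustrative helper name.

```python
def block_clone_savings(logical_gb: float, used_gb: float):
    """Return (GB saved, fraction of the logical size saved, reduction ratio)."""
    saved = logical_gb - used_gb
    return saved, saved / logical_gb, logical_gb / used_gb


saved, fraction, ratio = block_clone_savings(69.2, 27.6)
print(f"{saved:.1f} GB saved, {fraction:.1%} of the logical size, {ratio:.2f}x")
# 41.6 GB saved, 60.1% of the logical size, 2.51x
```

Had the target been filled by a regular file copy, it would have needed the full 69.2 GB instead of 27.6 GB.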
So, I can definitely confirm that Storage Replica preserves ReFS blockcloning information!!!
With this confirmation, Storage Replica immediately becomes a great potential solution for replicating ReFS volumes without using storage-level replication, while still preserving ReFS space savings.