In a previous blog post (Build a Microsoft Storage Spaces Direct cluster using VMware virtual machines) I wrote about how to create and configure a complete cluster using Microsoft Storage Spaces Direct (usually abbreviated S2D, which is the name I’ll use from now on) on multiple VMware vSphere virtual machines.
Since that post, many things have happened. I got involved in many discussions around this solution: its use cases, the level of confidence we should put in a new file system, its price and so on. And as Microsoft Windows 2016 is now finally generally available, people are starting to seriously look at its features, and no doubt S2D together with the new ReFS 3.1 is one of the hot topics. First of all, I updated my lab with the final version of Windows 2016 in order to have my cluster in a “stable” state, then I started to focus on the different topics related to Windows 2016 and its usage as a Veeam repository.
The first point I want to talk about is the choice of the filesystem. In my opinion, this has become a no-brainer now that ReFS 3.1 is available and its features can be leveraged by Veeam Backup & Replication 9.5 (coming out soon). There have been several discussions about ReFS 3.1, its BlockClone API and how Veeam is going to leverage it, but let me give you a short overview for those who missed this information.
Microsoft has introduced a new feature in ReFS 3.1 called BlockClone, which can be leveraged via API calls. Thanks to this feature, ReFS can clone blocks by just updating its metadata: the block is not physically copied again, only its reference count is updated. Say I have two files, each made of multiple blocks (all images are taken from the official Microsoft MSDN article, read here to learn more):
Per Microsoft’s own text: “Now suppose an application issues a block clone operation from File X, over file regions A and B, to File Y at the offset where E currently is.” This sounds a lot like a Veeam Backup transform operation where an incremental backup is merged into the full backup, doesn’t it? The result on the file system after the clone operation is this:
You can immediately understand the huge advantage of this solution. The new content of file Y is not actually written again: the operation is just an update of the reference count in the ReFS file table, recording that block regions A and B are now used twice in the file system, by both file X and file Y. The net result is that transform operations in Veeam Backup & Replication are insanely faster, as only metadata needs to be updated. Also, GFS retention, where a complete full backup is written again, no longer consumes additional space on the disk, as the same block is just referenced multiple times.
NOTE: there is NO space saving on incremental backups during transform operations, as the same block is always written once, and it’s only moved from the incremental backup file to the full one. Transform operations are about time saving, not space saving. You can get savings when you run a synthetic full, either in backups or backup copy jobs (like GFS retention).
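To make the reference-count idea more concrete, here is a minimal toy model in Python. It is not the ReFS API and not how Veeam calls it: the `ToyVolume` class, its method names and the three-regions-per-file layout (X = A, B, C and Y = D, E, F) are assumptions made only for illustration. It just shows that a clone changes pointers and counters, without writing any block again.

```python
from collections import Counter


class ToyVolume:
    """Toy model: files are lists of references to shared blocks (hypothetical)."""

    def __init__(self):
        self.files = {}            # file name -> list of block ids
        self.refcount = Counter()  # block id -> number of references to it
        self.blocks_written = 0    # physical block writes performed so far

    def write_file(self, name, block_ids):
        """Write brand-new data: every block costs one physical write."""
        self.files[name] = list(block_ids)
        for block in block_ids:
            self.refcount[block] += 1
            self.blocks_written += 1

    def block_clone(self, src, src_start, length, dst, dst_start):
        """Point a region of dst at existing blocks of src (metadata only)."""
        region = self.files[src][src_start:src_start + length]
        for offset, block in enumerate(region):
            replaced = self.files[dst][dst_start + offset]
            self.refcount[replaced] -= 1
            if self.refcount[replaced] == 0:
                del self.refcount[replaced]  # nobody references it any more
            self.files[dst][dst_start + offset] = block
            self.refcount[block] += 1


vol = ToyVolume()
vol.write_file("X", ["A", "B", "C"])
vol.write_file("Y", ["D", "E", "F"])
writes_before_clone = vol.blocks_written

# Clone regions A and B of X into Y, at the offset where E currently is.
vol.block_clone("X", 0, 2, "Y", 1)

print(vol.files["Y"])                            # ['D', 'A', 'B']
print(dict(vol.refcount))                        # A and B are now referenced twice
print(vol.blocks_written - writes_before_clone)  # 0 extra physical writes
```

In this model the clone costs zero additional block writes, which is the same reason why a merge on ReFS is mostly a matter of time spent updating metadata.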
How can you leverage this technology? First of all, you will need to upgrade to Veeam Backup & Replication 9.5, and have at least one repository using Windows 2016 as the underlying operating system, with a volume formatted with ReFS 3.1. The Veeam datamover will immediately recognize the ReFS filesystem and leverage BlockCloning. You can recognize the result simply by looking at the time needed to complete a merge, and by looking at a line in the Job Statistics like this:
It took only 22 seconds to merge three virtual machines into a Forever Forward Incremental job, and you see the [fast clone] note in the log. That’s the sign that ReFS BlockCloning has been used.
2. Storage Spaces and Integrity Streams
One of the design goals when planning for a backup repository is always the balance between performance and cost. In small environments, with lower RTO requirements, cost has always been the main driving factor, and this has led IT admins to use raid solutions like Raid5 and Raid6 over large spinning disks, to keep cost per GB as low as possible. On the other side, this design has always led to bad performance.
First, let me remove Raid5 as an option once and for all. It may have been the preferred choice in the past, but these days disks are simply too large for Raid5, and their rebuild times are too long to justify a raid solution where only one disk’s worth of parity protects the data. If you are using 4TB disks, for example, a rebuild operation of a failed drive can take way more than a day. This means that for more than 24 hours your storage is totally unprotected from any additional failure, and because the disks are actively working to rebuild the failed one, and they are probably going to receive another daily backup at some point, the chances of breaking another disk are even higher. Too high to justify the risk.
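As a rough back-of-the-envelope check of the rebuild time, the sketch below divides the disk size by an effective rebuild rate. The rates used here (100, 50 and 30 MB/s) are pure assumptions for illustration, not measurements: the real rate depends on the controller, the disks and how busy the array is with backup jobs during the rebuild.

```python
def rebuild_hours(disk_tb, effective_mb_per_s):
    """Hours needed to rewrite a whole disk at a given sustained rate."""
    disk_bytes = disk_tb * 1e12                        # decimal terabytes
    seconds = disk_bytes / (effective_mb_per_s * 1e6)  # decimal megabytes/s
    return seconds / 3600


# Assumed effective rates: the busier the array, the lower the rate.
for rate in (100, 50, 30):
    print(f"4 TB at {rate} MB/s -> {rebuild_hours(4, rate):.1f} hours")
```

With these assumed numbers, a 4TB disk goes past the 24 hours mark as soon as the effective rate drops below roughly 46 MB/s, which is easy to hit when the same disks are also serving daily backups.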
I’ve seen many moving to Raid6. This solution has double parity, which can protect from the failure of two disks at the same time. But the side effect is that the write penalty is even worse, and so performance is lower than Raid5 (on a general level obviously, different storage systems may have different performance).
Usually, we suggest using Raid10 if possible, at least for primary backup targets, so that there’s no parity calculation involved while writing data and backup operations are faster (or Raid50/60, which still calculate parity but spread it over smaller groups of disks). Obviously, the penalty is now the consumed disk space: half of the raw capacity goes to the mirror copies, that is, data consumes double the space of the original file. Many don’t like to consume storage with Raid10, but again the new ReFS technologies can help to remove this problem. The first reason is again BlockCloning: even if I’m now using Raid10 instead of Raid6, ReFS will write my merged backup files only once, so there will again be a large saving on space. If you compare NTFS over Raid6 and ReFS+BlockCloning over Raid10, chances are the latter will give better results in terms of space consumption.
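The trade-off between capacity and write penalty can be put into numbers with a small sketch. The write penalties used below (4 for Raid5, 6 for Raid6, 2 for Raid10) are the usual rule-of-thumb values for small random writes, not measurements from a specific storage system, and the 12 x 4TB disk shelf is an assumed example.

```python
# Rule-of-thumb back-end I/Os per small random write (assumed textbook values).
WRITE_PENALTY = {"Raid5": 4, "Raid6": 6, "Raid10": 2}


def usable_tb(level, disks, disk_tb):
    """Usable capacity in TB for a single raid group of identical disks."""
    if level == "Raid5":
        data_disks = disks - 1   # one disk's worth of parity
    elif level == "Raid6":
        data_disks = disks - 2   # two disks' worth of parity
    elif level == "Raid10":
        data_disks = disks // 2  # every block is stored twice
    else:
        raise ValueError(f"unsupported raid level: {level}")
    return data_disks * disk_tb


DISKS, DISK_TB = 12, 4  # assumed example: 12 disks of 4TB each
for level in ("Raid5", "Raid6", "Raid10"):
    usable = usable_tb(level, DISKS, DISK_TB)
    efficiency = usable / (DISKS * DISK_TB)
    print(f"{level}: {usable} TB usable ({efficiency:.0%}), "
          f"write penalty {WRITE_PENALTY[level]}")
```

On this example shelf Raid10 gives 24 TB usable against 40 TB for Raid6, so the space saved by BlockCloning on synthetic fulls is what decides whether the mirror layout ends up cheaper in practice.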
But Windows 2016 has another solution that should be considered at this point: Storage Spaces. Storage Spaces is a software storage solution built directly into the operating system. Instead of multiple disks all managed by a hardware raid controller installed in the server, with Storage Spaces users can now build a redundant storage solution using simple disks connected to the server, and leverage the technologies available directly from the OS. Specifically, a first tier of fast storage media (SSD, NVMe, or something else in the future) can be used as read and write cache, while a bunch of spinning disks can be used as the capacity tier. On top of this, you can configure the needed resiliency, which comes in two flavors: Mirror and Parity. While Parity is comparable to Raid5 and Raid6, including the write penalty, Mirror is, as the name says, a solution where the same data is written to two (or three) different disks.
The final result is that admins can design a software storage that works in the same way as a hardware solution. With one nice addition that cannot be used with the latter: Integrity Streams with self-healing. Again, per Microsoft’s own information: “When ReFS is used in conjunction with a mirror space or a parity space, detected corruption — both metadata and user data, when integrity streams are enabled — can be automatically repaired using the alternate copy provided by Storage Spaces.” Integrity Streams is technically available on any ReFS volume, but with only one copy of the data the integrity capabilities can only identify and warn about block corruption. If we use the Mirror resiliency and we enable Integrity Streams, instead, any corruption of both metadata and data can be repaired by simply copying back the healthy copy from the mirror. This cannot be done with just one disk, and it’s the reason why I’m suggesting to leverage Storage Spaces for resiliency instead of a hardware raid solution: the latter exposes its multiple disks as one single volume, thus blocking the possibility to use the Self Healing capabilities of Integrity Streams.
To me, Integrity Streams and Self Healing are as important as BlockCloning when it comes to using Windows 2016 and ReFS to build a Veeam backup repository. And our developers believe the same: in fact, when Veeam identifies that a backup repository is using a ReFS volume, it automatically enables Integrity Streams. You can check the status of an entire volume or even a single file by using some new PowerShell cmdlets: Get-FileIntegrity and Set-FileIntegrity.
Finally, all these integrity capabilities are proactive. Error Correction techniques are leveraged by a data integrity scanner, which is also known as a scrubber. The integrity scanner periodically scans the volume, identifying latent corruptions and proactively triggering a repair of that corrupt data. And since all these operations are executed online, there is not even a chkdsk command on ReFS.
3. Storage Spaces Direct
So far, in the previous two paragraphs we have built a new storage system for Veeam backups, using Windows 2016, the ReFS file system and Storage Spaces. Overall, this is already a huge improvement in the repository design, but can we do even more? Is there any area where the design can be improved? For sure, the main missing topics are scalability and failure domains.
When it comes to scaling the repositories and planning for failures of single nodes, not even Storage Spaces is enough, as it’s a technology that has a clear boundary in the single server where it’s running. We can expand the single server by adding additional disks and shelves, thus addressing the scalability issue, but at the same time we are also increasing the failure domain. If the server fails, all the hosted backup files are not usable. The potential fix for this problem is Storage Spaces Direct.
I already wrote a post on this topic, where I explained how to Build a Microsoft Storage Spaces Direct cluster using VMware virtual machines. Microsoft has published a cool video on YouTube to explain how S2D (that’s the acronym for Storage Spaces Direct) works. Take a look before continuing with this blog post:
The possibility to scale out the same design of Storage Spaces, and have a redundant cluster running ReFS again, sounds intriguing, and it is. You can have an S2D cluster running as a backup repository, and you can even lose an entire node in the middle of a backup or a restore operation, and the process will continue nonetheless.
There are two “limits” you should be aware of. The first one is that each volume is owned and operated by only one node at a time. So, if you have a cluster with four nodes but only one volume, only one of the servers will read and write data to that volume, even if the volume itself is spread over all the nodes. This is in my view an acceptable limit, as in many other cluster solutions the idea of having an “all active” design is not so common. Also, there is a workaround to leverage the compute performance of the entire cluster: create at least as many volumes as there are nodes. In this way, S2D will automatically balance the ownership of the volumes across the running nodes, and you may end up for example with 4 volumes over 4 nodes, each owned and operated by one of the nodes. In Veeam, you will only need to register every volume as a backup repository in order to use them all.
The second limit is licensing. While Storage Spaces is available in both Windows 2016 Standard and Datacenter editions, S2D is only available in the Datacenter edition. There is a price difference between the two editions, and it helps to know how licensing works. In short, Windows 2016 is licensed per core, but the license price you see around is for 16 cores (see the official Microsoft page here). Say I want to build a 4-node S2D cluster with the same server model. Suppose that each machine has 2 sockets with 8 cores per processor, which is a common configuration these days: the total number of cores in the cluster is 64, which means 4 licenses. At a street price of 6,155 USD per license, this means that on top of the price for the hardware, I need 24,620 USD for the licensing. Compare it to Standard licensing, which is just 882 USD per license: the price to license the same 4 servers would be 3,528 USD.
This is a big difference in price, and depending on your needs it may or may not be a problem to justify it. For some it may be a hard stop against any idea of using S2D as a Veeam backup repository, for others it may not. In the end it may not be a problem, as the price for a distributed cluster with 4 nodes and a lot of storage can easily be 10,000 USD per node. Add the licensing to it, and the complete solution can be around 65,000 USD. That’s the price of many dedicated backup appliances, or even less, and those may not even have the redundancy guaranteed by the scale-out design. Do your math first, S2D may be your next backup solution.
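To help with that math, here is a small Python sketch that reproduces the numbers used above. The prices are the street prices quoted in this post, the 10,000 USD per node is the same rough hardware estimate, and the licensing rule (one 16-core license pack per server, more if the server has more cores) is a simplification of the real Microsoft licensing terms, so check the official page before budgeting.

```python
import math

CORES_PER_LICENSE = 16     # the quoted price covers 16 cores
DATACENTER_USD = 6155      # street price quoted in this post
STANDARD_USD = 882         # street price quoted in this post
HARDWARE_USD_PER_NODE = 10_000  # rough estimate used in this post


def licenses_needed(nodes, sockets, cores_per_socket):
    """Simplified rule: each server needs enough 16-core packs for its cores."""
    cores_per_server = sockets * cores_per_socket
    packs_per_server = max(1, math.ceil(cores_per_server / CORES_PER_LICENSE))
    return nodes * packs_per_server


nodes = 4
licenses = licenses_needed(nodes, sockets=2, cores_per_socket=8)
datacenter_cost = licenses * DATACENTER_USD
standard_cost = licenses * STANDARD_USD
total_s2d_cost = nodes * HARDWARE_USD_PER_NODE + datacenter_cost

print(f"Licenses needed:      {licenses}")
print(f"Datacenter licensing: {datacenter_cost} USD")
print(f"Standard licensing:   {standard_cost} USD")
print(f"S2D cluster total:    {total_s2d_cost} USD")
```

Changing `nodes`, `sockets` or `cores_per_socket` shows quickly how the Datacenter premium grows with denser servers, which is the part of the budget most likely to surprise you.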