I’ve always been a fan of scale-out storage architectures. I’ve always said that the future of storage is scale-out, and I’ve spent a fair amount of time studying software-only solutions like Ceph. The new solution from Microsoft, Storage Spaces Direct, seems like another great technology that will soon be available to us, so I decided to test it in my lab.
Storage Spaces Direct
Storage Spaces Direct is a new shared-nothing scale-out storage solution developed by Microsoft that will soon be available as part of Windows Server 2016. If you want to learn more about it, this page is a really good starting point. To test the solution on Windows Server 2016 Technical Preview 5, I decided to run everything nested in my VMware lab. A few configurations and steps are needed to make it run inside virtual machines. The final result is going to be a 4-node cluster, as this is the minimum number of nodes required.
Storage Spaces Direct is commonly shortened to S2D, and that is the name you will see throughout this article.
The virtual machines
For my lab, I’ve built 4 virtual machines, each with these hardware specifications:
4 vCPU
16 GB RAM
System disk 40 GB
Two network connections
In addition to this, I’ve added 4 hard disks to each node: three 100 GB disks and one 30 GB disk. They are all connected to a dedicated SCSI controller, configured with Physical SCSI Bus Sharing:
This is paramount to guarantee the correct identification of the disks by the Storage Spaces wizards.
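Behind the GUI setting, the bus sharing corresponds to an entry like this in the VM’s .vmx file (the scsi1 index refers to my dedicated controller, adjust it to your layout):

scsi1.sharedBus = "physical"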
The 4 disks connected to this controller then have to be created as Thick Provision Eager Zeroed, otherwise Physical SCSI Bus Sharing cannot be used. So, be careful about storage consumption, as these disks are completely inflated from the beginning. Then, I needed to configure the 30 GB disk as an SSD. On a virtual machine, this can be done by adding:
scsi1:0.virtualSSD = 1
to the VMX configuration file. With this parameter, the guest OS recognizes the disk as an SSD and can later use it as the caching tier for Storage Spaces Direct. Windows Server 2016 also properly identifies the other disks, but only if the bus sharing is set correctly: without Physical SCSI Bus Sharing, the cluster validation reports this error in the Storage Spaces Direct section, once per disk:
Failed to get SCSI page 83h VPD descriptors for physical disk XX
By using the proper bus sharing, the disks are correctly identified:
This can also be verified using PowerShell. With regular disks and no bus sharing, this is the output:
Get-PhysicalDisk | Select FriendlyName, SerialNumber, CanPool, Size, MediaType | ft

FriendlyName        SerialNumber CanPool Size         MediaType
------------        ------------ ------- ----         ---------
VMware Virtual disk              False   42949672960  UnSpecified
VMware Virtual disk              True    32212254720  SSD
VMware Virtual disk              True    107374182400 UnSpecified
VMware Virtual disk              True    107374182400 UnSpecified
VMware Virtual disk              True    107374182400 UnSpecified
With thick disks and Physical Bus Sharing, this is the output:
FriendlyName        SerialNumber                     CanPool Size         MediaType
------------        ------------                     ------- ----         ---------
VMware Virtual disk                                  False   42949672960  UnSpecified
VMware Virtual disk 6000c299957d8cde367161928eda904a True    107374182400 UnSpecified
VMware Virtual disk 6000c295657d3d1b9df1f379de1d3c23 True    107374182400 UnSpecified
VMware Virtual disk 6000c29a9c9463e575933bd0d9557af4 True    107374182400 UnSpecified
VMware Virtual disk 6000c29ff36ade0acd10cd69ab890436 True    32212254720  SSD
Windows Server 2016 TP5 is installed on all four nodes, and they are all joined to my domain. There are two networks on each node, and the final configuration looks like this:
ssd1.cloudconnect.local   10.10.51.151   10.10.110.151
ssd2.cloudconnect.local   10.10.51.152   10.10.110.152
ssd3.cloudconnect.local   10.10.51.153   10.10.110.153
ssd4.cloudconnect.local   10.10.51.154   10.10.110.154
We are now ready to build the cluster.
Build the cluster using PowerShell
In order to speed things up and get a consistent result, I decided to build the new cluster using PowerShell. Also, as you will see later, some steps need specific options that may not be available via the graphical interface.
First, on each of the four nodes we install the needed components:
Install-WindowsFeature -Name File-Services, Failover-Clustering -IncludeManagementTools
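Since the same command has to run on all four nodes, one possible shortcut is PowerShell remoting; a quick sketch, assuming WinRM is enabled on all the nodes:

# Install the required features on all four nodes in one shot
$nodes = "ssd1.cloudconnect.local", "ssd2.cloudconnect.local", "ssd3.cloudconnect.local", "ssd4.cloudconnect.local"
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name File-Services, Failover-Clustering -IncludeManagementTools
}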
Then, we create the new cluster:
New-Cluster -Name S2D -Node ssd1.cloudconnect.local, ssd2.cloudconnect.local, ssd3.cloudconnect.local, ssd4.cloudconnect.local -NoStorage -StaticAddress 10.10.51.160
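A quick check that all four nodes actually joined the cluster, before moving on:

# List the cluster nodes and their state
Get-ClusterNode -Cluster S2D | Select-Object Name, State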
If we then proceed and validate the cluster, either via the graphical Failover Cluster Manager or again with PowerShell using:
Test-Cluster -Node ssd1.cloudconnect.local, ssd2.cloudconnect.local, ssd3.cloudconnect.local, ssd4.cloudconnect.local -Include "Storage Spaces Direct", Inventory, Network, "System Configuration"
we will notice that the “Storage Spaces Direct” section has a result of Failed: the new validation tests issue SCSI commands that the virtualized disks cannot satisfy. Luckily, there is a workaround.
As the next step, we check the cluster networks, and we configure the two available networks so that one is available for clients (frontend) and one for cluster internal communications (backend):
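The same assignment can also be done from PowerShell; a minimal sketch, assuming the 10.10.51.0 subnet is the frontend and 10.10.110.0 the backend (the addresses come from my lab, adjust them to yours):

# Role 3 = cluster and client traffic (frontend), Role 1 = cluster traffic only (backend)
(Get-ClusterNetwork | Where-Object Address -eq "10.10.51.0").Role = 3
(Get-ClusterNetwork | Where-Object Address -eq "10.10.110.0").Role = 1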
The tests added in TP5 run, as we said, some SCSI commands that fail on a virtual disk. We can work around this by turning off automatic configuration and skipping the eligibility checks when enabling S2D, and then manually creating the storage pool and storage tiers afterwards:
Enable-ClusterS2D -CacheMode Disabled -AutoConfig:0 -SkipEligibilityChecks
Then, we create a new storage pool:
New-StoragePool -StorageSubSystemFriendlyName *Cluster* -FriendlyName S2D -ProvisioningTypeDefault Fixed -PhysicalDisk (Get-PhysicalDisk | ? CanPool -eq $true)
and we configure all the pooled disks to appear as proper HDDs, since Storage Spaces Direct, as seen in the error above, accepts only SSD and HDD media types. Before the change, this is the situation:
Get-PhysicalDisk | Select FriendlyName, SerialNumber, CanPool, Size, MediaType | ft

FriendlyName        SerialNumber                     CanPool Size         MediaType
------------        ------------                     ------- ----         ---------
VMware Virtual disk 6000c29b9bd87ab778bb5665d14f403f False   32212254720  SSD
VMware Virtual disk 6000c299957d8cde367161928eda904a False   107374182400 UnSpecified
VMware Virtual disk 6000c29cca50afa7099a008a7f2e9c17 False   107374182400 UnSpecified
VMware Virtual disk 6000c29942dce272f1093a5bfe3623d6 False   107374182400 UnSpecified
VMware Virtual disk 6000c2913dfe2fca0f6388a2639ad786 False   32212254720  SSD
VMware Virtual disk 6000c29280a6c448188267920bc71f3d False   107374182400 UnSpecified
VMware Virtual disk 6000c298e2615769f51118997f4db519 False   107374182400 UnSpecified
VMware Virtual disk 6000c295657d3d1b9df1f379de1d3c23 False   107374182400 UnSpecified
VMware Virtual disk 6000c2998dd1aedb7a5584511de51951 False   107374182400 UnSpecified
VMware Virtual disk 6000c2965ff38417d428ec4f44904ba2 False   107374182400 UnSpecified
VMware Virtual disk 6000c29bbd8c76527b89db6aafbed3da False   107374182400 UnSpecified
VMware Virtual disk 6000c299d0325700aafb81688c5a8f97 False   107374182400 UnSpecified
VMware Virtual disk 6000c29a9c9463e575933bd0d9557af4 False   107374182400 UnSpecified
VMware Virtual disk 6000c290e17bb64fd303dff7e5a4cb92 False   32212254720  SSD
VMware Virtual disk 6000c29ff36ade0acd10cd69ab890436 False   32212254720  SSD
VMware Virtual disk 6000c29132401413d519d4409531d423 False   107374182400 UnSpecified
With this command we force these disks to be marked as HDD:
Get-StorageSubsystem *cluster* | Get-PhysicalDisk | Where MediaType -eq "UnSpecified" | Set-PhysicalDisk -MediaType HDD
If we check the available disks again, this is the new situation:
FriendlyName        SerialNumber                     CanPool Size         MediaType
------------        ------------                     ------- ----         ---------
VMware Virtual disk 6000c29b9bd87ab778bb5665d14f403f False   32212254720  SSD
VMware Virtual disk 6000c299957d8cde367161928eda904a False   107374182400 HDD
VMware Virtual disk 6000c29cca50afa7099a008a7f2e9c17 False   107374182400 HDD
VMware Virtual disk 6000c29942dce272f1093a5bfe3623d6 False   107374182400 HDD
VMware Virtual disk 6000c2913dfe2fca0f6388a2639ad786 False   32212254720  SSD
VMware Virtual disk 6000c29280a6c448188267920bc71f3d False   107374182400 HDD
VMware Virtual disk 6000c298e2615769f51118997f4db519 False   107374182400 HDD
VMware Virtual disk 6000c295657d3d1b9df1f379de1d3c23 False   107374182400 HDD
VMware Virtual disk 6000c2998dd1aedb7a5584511de51951 False   107374182400 HDD
VMware Virtual disk 6000c2965ff38417d428ec4f44904ba2 False   107374182400 HDD
VMware Virtual disk 6000c29bbd8c76527b89db6aafbed3da False   107374182400 HDD
VMware Virtual disk 6000c299d0325700aafb81688c5a8f97 False   107374182400 HDD
VMware Virtual disk 6000c29a9c9463e575933bd0d9557af4 False   107374182400 HDD
VMware Virtual disk 6000c290e17bb64fd303dff7e5a4cb92 False   32212254720  SSD
VMware Virtual disk 6000c29ff36ade0acd10cd69ab890436 False   32212254720  SSD
VMware Virtual disk 6000c29132401413d519d4409531d423 False   107374182400 HDD
The final result is the pool correctly created and ready to be consumed:
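The same can be quickly checked from PowerShell as well, for example by counting the disks that ended up in the pool (16 in my case, 4 per node):

# Count the physical disks that are now part of the S2D pool
Get-StoragePool -FriendlyName S2D | Get-PhysicalDisk | Measure-Object | Select-Object Count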
Virtual disks and volumes
Now that the cluster is created, it’s time to create our first volume and use it. For this part of the post I’ll go back to the graphical interface, so I can explain the different available options a little bit. With the pool selected, we start the New Virtual Disk wizard. After selecting S2D as the storage pool to use, we give the virtual disk a name and select the option to use storage tiers:
We accept enclosure awareness, and we configure the storage layout as follows: Mirror for the Faster Tier and Parity for the Standard Tier; for the resiliency settings, Two-way mirror for the Faster Tier and Single parity for the Standard Tier.
Then we set the Faster Tier size to 50 GB and the Standard Tier size to 500 GB, and we disable the read cache. We confirm all the selections and the disk is created. Before closing the wizard, we select the option to immediately create a volume: the file system will be ReFS and it will use the entire size of the virtual disk:
As the last step of this part, we select the virtual disk and use the command “Add to Cluster Shared Volumes”.
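For completeness, roughly the same result can be scripted in PowerShell by defining the two tiers and creating a tiered volume on top of them. This is only a sketch: the tier and volume names (FasterTier, StandardTier, Data01) are hypothetical, and New-Volume with the CSVFS_ReFS file system also takes care of adding the disk to the Cluster Shared Volumes:

# Define the two tiers inside the S2D pool (names are arbitrary)
New-StorageTier -StoragePoolFriendlyName S2D -FriendlyName FasterTier -MediaType SSD -ResiliencySettingName Mirror
New-StorageTier -StoragePoolFriendlyName S2D -FriendlyName StandardTier -MediaType HDD -ResiliencySettingName Parity
# Create a tiered ReFS volume and expose it as a Cluster Shared Volume
New-Volume -StoragePoolFriendlyName S2D -FriendlyName Data01 -FileSystem CSVFS_ReFS -StorageTierFriendlyNames FasterTier, StandardTier -StorageTierSizes 50GB, 500GB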
File server and shares
Now, since we want to end the test with a working file share where we can drop our files, we need to create a role in the cluster, in this case a File Server. A simple PowerShell one-liner is all we need:
New-StorageFileServer -StorageSubSystemName S2D.cloudconnect.local -FriendlyName SOFS -HostName SOFS -Protocols SMB
Then, we create the share. On the nodes of the cluster, there is a mount point for the newly created volume at C:\ClusterStorage\Volume1. We will use this location to create our new share:
md C:\ClusterStorage\Volume1\Repository

New-SmbShare -Name Repository -Path C:\ClusterStorage\Volume1\Repository -FullAccess ssd1$, ssd2$, ssd3$, ssd4$, "Cloudconnect\Domain Admins"
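Optionally, the permissions of the underlying folder can be aligned with the share permissions with one more command:

# Copy the share ACL onto the file system folder backing the share
Set-SmbPathAcl -ShareName Repository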
The share can be reached now over the network using the UNC path \\SOFS\Repository, and we can read and write data to it.
To test the resiliency of S2D, I ran a simple test: I started copying some large ISO files to the share, and while the copy was running I powered off node ssd3, at the time the owner of the File Server role, directly from vSphere. The role immediately moved to ssd2, and the file copy went on without any interruption.
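The owner of the role can be checked (and, for a gentler planned test, moved) from PowerShell as well; a sketch, assuming the role group is listed under the SOFS name:

# Check which node currently owns the file server role
Get-ClusterGroup -Name SOFS | Select-Object Name, OwnerNode, State
# A planned alternative to powering off a node: move the role manually
Move-ClusterGroup -Name SOFS -Node ssd2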
Final notes
I’ve really enjoyed the time I’ve spent playing with Storage Spaces Direct. The issues with making it work in a virtualized environment are not a big deal, as in a production environment I expect people to use physical servers from the hardware compatibility list that Microsoft is preparing. The configuration of the solution is really simple, and the failover capabilities are really reliable. When Windows Server 2016 becomes Generally Available later this year, I expect many IT admins to start considering it as a new option for building scale-out storage, especially in situations where SMB3 is the needed protocol.