Last November 15th I attended and presented at the 4th Italian VMUG UserCon in Milano. First, let me say thanks to all the people I met: some stopped by to talk, to ask questions, to discuss “anything IT”, some just to say hi. Thanks to all of you, it was a great day, and our technical session played to a full room. The main topic was how to leverage vSphere and Veeam capabilities to improve the overall data protection performance of your VMware environments. One part of the presentation was about VMware vSAN, and how Veeam integrates with it in a specific and (yes, I’m biased) nice way.
Veeam introduced specific support for vSAN back in mid-2014, as part of Veeam Backup & Replication v7.0 Update 4. Rather than support, I should say integration, and let me explain why.
To claim “vSAN support” is honestly nothing special. Whatever storage technology is in use, the VMware VADP libraries allow any backup solution that leverages them to extract data from an ESXi host and read the virtual disks of any virtual machine. It’s just part of the VMware libraries, so advertising this as an advanced feature is nothing to be proud of.
“vSAN integration” is instead what Veeam has in Backup & Replication; let me explain how it works using a practical example.
vSAN-aware backups, step by step
Let’s take my lab as an example. I have a 4-node vSAN cluster with around 60 VMs running on it. vSAN 6.2 is the only storage I’m using, and it’s a hybrid configuration built with both SSDs and HDDs, but this is not so important for what I’m going to show you.
I have a virtual machine, DC2, a domain controller running Windows 2012 R2, installed on a single 40GB virtual disk. The virtual machine is running on ESXi-4, so I can diagram the current state like this:
For the purpose of my backups, I’m not really interested in where the machine is running. I’m more interested in the placement of the vSAN disk components. The virtual machine uses a vSAN policy with a stripe width of 1 and Failures To Tolerate (FTT) set to 1. The practical effect is that the virtual disk is protected inside vSAN like a RAID-1, with two copies of the disk plus a witness component. I can see the placement of these components inside vSAN by looking at the properties of the virtual machine itself:
If I add this information to my schema, this is how the physical disk placement appears:
vSAN has no data locality. It can happen, as in my example, that the virtual machine runs on the same node where a copy of its disk is stored, but this is not guaranteed, nor is it pursued by the vSAN algorithm.
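To recap the policy math in code form, here is a minimal Python sketch. This is my own illustration of the FTT arithmetic, not VMware code, and it is a simplification: real vSAN witness counts can differ because of vote balancing across larger component layouts.

```python
# Illustration only: how an FTT/stripe-width policy translates into components.
# FTT=1 means the object must survive one host failure, so vSAN keeps FTT+1
# mirrored copies; witness components are added so that a strict majority
# (quorum) of components survives any FTT failures.

def vsan_component_layout(ftt: int, stripe_width: int) -> dict:
    data_copies = ftt + 1                        # RAID-1 mirrors: FTT=1 -> 2 copies
    data_components = data_copies * stripe_width # stripe width 1 -> 1 component each
    quorum_votes = 2 * ftt + 1                   # minimum number of votes needed
    witnesses = max(0, quorum_votes - data_components)
    return {
        "data_copies": data_copies,
        "witnesses": witnesses,
        "total_components": data_components + witnesses,
    }

# The policy used by DC2: stripe width 1, FTT 1 -> 2 copies + 1 witness.
print(vsan_component_layout(ftt=1, stripe_width=1))
```

With FTT=1 and stripe width 1 this yields exactly the layout described above: two data copies and one witness, three components in total.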
Now, when Veeam is instructed to back up this virtual machine, it could simply connect to the node running the VM and request a copy of the disk over the network. But if the disk were not stored locally on that node, this would create additional network traffic: the node would first have to retrieve the data from another node, and then pass it to the Veeam proxy. Veeam’s integration instead tries to reduce network traffic to a minimum by always retrieving data from a node that holds a local copy of the disk. To do so, the ideal design we suggest is to deploy the Veeam proxies as virtual machines on the vSAN cluster itself, ideally one proxy per node, pinned in place with dedicated DRS rules like these:
Each proxy is forced to run on a specific host, so that at any time I have one proxy running on each ESXi server.
I then create a new backup job in Veeam to protect this virtual machine. Note that there is no specific setting to create a “vSAN backup”: the vSAN storage is automatically identified by Veeam, and the special procedure starts automatically. First, Veeam enumerates the different vSAN objects:
Container 'Hierarchy object "dc2". Host: "vcsa.cloudconnect.local". Reference: "vm-42". Type: "VirtualMachine". Name: "VirtualMachine".' depth 1000
[Soap] Logging on to "vcsa.cloudconnect.local", port 443, user "email@example.com", proxy srv: port:0, serviceType: public, timeout: 200000 ms
[Vsan] Starting nodes analysis. Computing data amounts direct accessible on different hosts
[Vsan] Computing direct accessible data amounts for disk 'dc2.vmdk'
[Soap] QueryVsanObjects, objects uuids '9cab0857-c8f1-4ba9-d6f8-002590c0162a'
[VimApi] QueryVsanObjects, Ref: 'ha-vsan-internal-system-20'
[Vsan] Disk 'dc2.vmdk' Total: [Node '569eb942-5112-55e8-edc9-0025909134c8':42953867264], [Node '569fddae-528f-6520-c933-002590c0162a':4194304], [Node '569fe96b-1a31-aa18-ffed-002590c010f8':42953867264]
[Vsan] Finished nodes analysis. Total data amounts for all disks: [Node '569eb942-5112-55e8-edc9-0025909134c8':42953867264], [Node '569fddae-528f-6520-c933-002590c0162a':4194304], [Node '569fe96b-1a31-aa18-ffed-002590c010f8':42953867264]
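The interesting part of that log is the per-node byte totals at the end. As a hedged sketch of how one could interpret those numbers (the function name and the size threshold are my own assumptions, not Veeam internals):

```python
# Per-node totals exactly as reported by the [Vsan] analysis above
# (node UUID -> bytes of this VM's disk components stored on that node).
node_totals = {
    "569eb942-5112-55e8-edc9-0025909134c8": 42953867264,
    "569fddae-528f-6520-c933-002590c0162a": 4194304,
    "569fe96b-1a31-aa18-ffed-002590c010f8": 42953867264,
}

# Witness components hold only metadata, so they are tiny compared with a
# replica of a 40 GB disk; a 64 MB threshold (my assumption) separates them.
WITNESS_THRESHOLD = 64 * 1024 * 1024

def classify_components(totals):
    return {
        uuid: "replica" if size > WITNESS_THRESHOLD else "witness"
        for uuid, size in totals.items()
    }

for uuid, role in classify_components(node_totals).items():
    print(f"{uuid}: {role}")
```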
Two nodes (identified by their UUID) hold 42953867264 bytes of data, that is roughly 40 GB, the size of the virtual disk. A third node holds 4194304 bytes, just 4 MB: this is the witness. Once it knows where the disk components are, the software maps these data to the proxies that can access them:
[ProxyDetector] Detecting storage access level for proxy [px2.cloudconnect.local]
[ProxyDetector] Found proxy is on suitable ESX: 'vm-747'. All disk can be processed through hotadd
[Vsan] Node uuid for proxy vm 'px2' (phys host 'esx2.cloudconnect.local'): '5698370c-8410-0a5b-4b7f-0025909b6a04'
[VsanProxyDetector] Proxy 'px2.cloudconnect.local' has direct access to 0 bytes and obtains HotAddDifferentHosts mode
[ProxyDetector] Detecting storage access level for proxy [px4.cloudconnect.local]
[ProxyDetector] Found proxy is on suitable ESX: 'vm-749'. All disk can be processed through hotadd
[Vsan] Node uuid for proxy vm 'px4' (phys host 'esx4.cloudconnect.local'): '569fe96b-1a31-aa18-ffed-002590c010f8'
[VsanProxyDetector] Proxy 'px4.cloudconnect.local' has direct access to 42953867264 bytes and obtains HotAddSameHost mode
[ProxyDetector] Detecting storage access level for proxy [px1.cloudconnect.local]
[ProxyDetector] Found proxy is on suitable ESX: 'vm-746'. All disk can be processed through hotadd
[VsanProxyDetector] Proxy 'px1.cloudconnect.local' has direct access to 42953867264 bytes and obtains HotAddSameHost mode
[ProxyDetector] Detecting storage access level for proxy [px3.cloudconnect.local]
[ProxyDetector] Found proxy is on suitable ESX: 'vm-748'. All disk can be processed through hotadd
[VsanProxyDetector] Proxy 'px3.cloudconnect.local' has direct access to 4194304 bytes and obtains HotAddSameHost mode
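In simplified form, the rule applied in this log appears to be: a proxy whose ESXi host stores at least one component of the disk gets HotAddSameHost, otherwise HotAddDifferentHosts. Here is a hypothetical sketch of that rule; the px1 and px3 host placement is inferred from the one-proxy-per-host design, and none of this is Veeam’s actual code:

```python
# Which ESXi host each proxy VM runs on. px2/px4 placement is stated in the
# log; px1/px3 is inferred from the one-proxy-per-host DRS rules (assumption).
proxy_host = {"px1": "esx1", "px2": "esx2", "px3": "esx3", "px4": "esx4"}

# Bytes of DC2's disk components stored on each host, from the vSAN analysis:
# two full replicas and one 4 MB witness; esx2 holds nothing for this VM.
local_bytes = {"esx1": 42953867264, "esx2": 0, "esx3": 4194304, "esx4": 42953867264}

def access_mode(proxy):
    # Any locally stored component, even just the witness, yields SameHost
    # mode, which matches px3 in the log above.
    if local_bytes.get(proxy_host[proxy], 0) > 0:
        return "HotAddSameHost"
    return "HotAddDifferentHosts"

for p in sorted(proxy_host):
    print(p, access_mode(p))
```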
Some proxies are marked as “HotAddSameHost”, which means these proxies can do a hot-add backup by mounting a local copy of the virtual disk, that is, a copy stored on the same ESXi host where the proxy is running. If we add this information to our schema, this is the result:
PX1 and PX4 have local access to one of the two copies of the virtual disk. PX3 is technically marked as a “same host” proxy too, but the only resource it can access locally is the witness. The final step is to select which proxy will execute the read from vSAN:
VM [Name: dc2, Ref: vm-42] is running. VM IP addresses: fe80::286f:5e60:f915:104d, 10.10.51.22
- Request: ViDisk_|ViProxyRepositoryPairResourceRequest, ProxyResourceRequest: [ViProxy, source proxies:[Vi proxy resource [id=7cb7a5d0-26f1-4534-8aa1-2a2ac8922188 : srv name=px2.cloudconnect.local : access level=HotAddDifferentHosts : max usage=2 : vddk modes=hotadd;nbd]],[Vi proxy resource [id=6563db50-f5b7-4c0d-bf69-834500a6bb3b : srv name=px4.cloudconnect.local : access level=HotAddSameHost : max usage=2 : vddk modes=hotadd;nbd]],[Vi proxy resource [id=2703c214-a476-4f1a-a7d1-9b6364716ed3 : srv name=px1.cloudconnect.local : access level=HotAddSameHost : max usage=2 : vddk modes=hotadd;nbd]],[Vi proxy resource [id=4a78c7f8-a148-4362-bf3f-ee98323b7c17 : srv name=px3.cloudconnect.local : access level=HotAddSameHost : max usage=2 : vddk modes=hotadd;nbd]] ],
- - Response: Count: 1, details: [Subresponses: [Responces: [Vi proxy resource [id=6563db50-f5b7-4c0d-bf69-834500a6bb3b : srv name=px4.cloudconnect.local : access level=HotAddSameHost : max usage=2 : vddk modes=hotadd;nbd]],[Repository : resource allocated]
- - Request: ViSnapshot, host: esx4.cloudconnect.local, datastores: vsanDatastore
- - Request: Prepare Vi VMs, Vm 'dc2' on Host: 'esx4.cloudconnect.local'
- - - - Response: VMs allocated for processing
Set status 'InProgress' for task session '075169e9-81e3-4428-a56d-6a8c82106f94', object name 'dc2'
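A plausible model of the selection in that response is “prefer same-host access, then the largest amount of locally stored data”. Again a hypothetical sketch, not actual Veeam logic; the real scheduler also weighs factors such as the 'max usage' concurrency limit visible in the log:

```python
# Candidate proxies with the access level and locally accessible bytes
# reported during the detection phase.
candidates = [
    ("px2", "HotAddDifferentHosts", 0),
    ("px4", "HotAddSameHost", 42953867264),
    ("px1", "HotAddSameHost", 42953867264),
    ("px3", "HotAddSameHost", 4194304),
]

def pick_proxy(candidates):
    # Prefer same-host access, then the largest local byte count, so the read
    # never has to cross the vSAN network if it can be avoided. Ties (px1 vs
    # px4 here) fall to whichever candidate comes first in the list.
    return max(candidates, key=lambda c: (c[1] == "HotAddSameHost", c[2]))

print(pick_proxy(candidates)[0])  # px4, matching the log above
```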
As you can read in the log, PX4 has been chosen to run the backup, which means data will be read by PX4 directly from the local storage of ESXi-4, without any network traffic. Like this:
This is pretty cool if you ask me.
In the next post, I’ll show you how SPBM (Storage Policy Based Management) policies are managed during backups and restores, as this is another important piece of Veeam’s support for VMware vSAN (and VVols too).