Lately, several bugs involving VMware CBT in vSphere 6 have raised justified concerns among users. But there are ways to guarantee successful backups even in these conditions.
Image-level backups are still the way to go
Yes, VMware is having some issues with CBT in vSphere 6.
Since its introduction back in the days of vSphere 4.0, CBT has been the cornerstone of fast incremental backups. CBT (Changed Block Tracking), as the name says, is a log of the changed blocks of a virtual machine that vSphere records in a file. Data protection solutions can read this file, list which blocks have changed since a given timestamp (usually the previous job execution), and then retrieve only those blocks from the storage instead of having to do a full backup every day.
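To make the idea concrete, here is a minimal sketch of a changed-block log. It is a toy model and not VMware's implementation or file format: the `TrackedDisk` class, the tiny block size, and the use of a counter ("epoch") in place of a real timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per block; tiny on purpose, real disks use far larger blocks


@dataclass
class TrackedDisk:
    """Toy virtual disk that logs which blocks were written (hypothetical model)."""
    data: bytearray
    epoch: int = 0  # stand-in for a timestamp, bumped at every backup
    changed_at: dict = field(default_factory=dict)  # block index -> epoch of last write

    def write(self, offset: int, payload: bytes) -> None:
        """Write data and record every block the write touches."""
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            self.changed_at[block] = self.epoch

    def mark_backup(self) -> int:
        """Close the current epoch; the returned value is the reference
        point to pass to changed_since() at the next incremental."""
        self.epoch += 1
        return self.epoch

    def changed_since(self, epoch: int) -> list:
        """List the blocks written at or after the given epoch."""
        return sorted(b for b, e in self.changed_at.items() if e >= epoch)


disk = TrackedDisk(bytearray(16))
disk.write(0, b"abcd")             # written before the first backup
reference = disk.mark_backup()     # the full backup happens here
disk.write(6, b"wxyz")             # spans blocks 1 and 2
print(disk.changed_since(reference))
```

The backup software never has to read the disk to find out what changed: it only asks the log, and then fetches the listed blocks.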
CBT has become so commonly used, however, that we forgot there was a time when it was NOT available. And because we forgot how things were done before, people have started to suggest weird solutions to work around the CBT issues we are facing lately. I was really dazed when I read this post from Josh Odgers: “VADP or Agent Based Backups”. Josh is a really smart guy and his blog has useful information, but this time I’m sorry to say that he’s completely wrong… Come on, the only way to avoid the issues with CBT is going back to agent-based backups installed inside virtual machines???
To me, this sounds like buying a new car, finding out that it has some issues with the ABS braking assistance, and instead of disabling it or finding a different solution, being told to go back a century in history and use horses. CBT is indeed the preferred technology for incremental backups, but it is NOT mandatory, exactly like a car can still brake even without an ABS system. That blog post calls out the entire VADP libraries, and this increases the confusion, as CBT is just one part of VADP, and VADP does many more things than just CBT. We can use VADP and keep doing efficient image-level backups, and at the same time avoid CBT completely.
Obviously, I’m writing this post while I work at Veeam, so some could say that it is totally biased. So, let’s do this: if you are using another solution, use this post simply as a suggestion to verify whether the software you are using can disable CBT and still do image-based backups.
CBT or No-CBT?
To show you how you can avoid CBT entirely and still do image-level backups, I’ve created two backup jobs in Veeam. They protect the same virtual machine, so I will be able to compare the results. The virtual machine for the test runs Windows Server 2012 R2, and I’ve left it unused for several months. This is good because between the tests with full backups and those with incrementals, I will have the chance to run Windows Update, and this will surely create a lot of changed blocks that will be tracked by CBT.
Let’s look at the full backups first. If CBT is used, the results are like this:
You can see the [CBT] sign in the line where the software is processing the virtual disk, and this means that Veeam is reading the CBT information. It is also used in a full backup, for example to identify and skip zeroed blocks. Then, I ran the same job, this time with CBT disabled:
As you can see, the time needed to complete the backup is basically the same, and this is to be expected, since during a full backup all the blocks of the virtual machine are retrieved. On a bigger machine you could see a difference in the time needed to complete the activity because, as said before, CBT is also used in full backups to identify blocks that were never written and skip them.
After the full backup, I ran Windows Update to generate many changed blocks:
After a required reboot, it was time to run the two jobs, this time with an incremental pass. First, the one using CBT:
The virtual disk is 40 GB, but thanks to CBT Veeam already knows which blocks have changed, and processes only those. The changed blocks amount to 5.0 GB, and the incremental backup lasted 13:02, compared to the full backup that took 29:19. The virtual disk itself was processed in 8:59 instead of the 23:30 of the full backup.
Then, I ran the incremental backup without CBT:
Compared to the incremental with CBT, this last one clearly shows the benefit of CBT: Veeam had to read 12.1 GB of data from the VM instead of 5.0 GB to obtain the same 2.1 GB of changed data. And processing the virtual disk took 12:57 instead of 8:59.
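For those who like to see the numbers side by side, the few lines below simply redo the arithmetic on the figures reported above. Nothing here is new data; the helper `to_seconds` is just a convenience I made up for this comparison.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Virtual disk processing times reported in this post (40 GB test VM)
full_disk = to_seconds("23:30")   # full backup
incr_cbt = to_seconds("8:59")     # incremental with CBT
incr_scan = to_seconds("12:57")   # incremental without CBT

read_ratio = 12.1 / 5.0           # GB read without CBT vs with CBT
time_ratio = incr_scan / incr_cbt

print(f"data read without CBT: {read_ratio:.2f}x")
print(f"disk processing time without CBT: {time_ratio:.2f}x")
print(f"non-CBT incremental vs full backup: {incr_scan / full_disk:.0%}")
```

In this test, the incremental without CBT read about 2.4 times as much data and spent about 44% more time on the disk, yet it still needed only a bit more than half the time of the full backup.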
Both jobs were successful. Even without CBT, it is completely possible to run image-level backups and even have incremental backups. Veeam scans the virtual disks to identify changed blocks instead of relying on the information coming from CBT. This scan is obviously slower and creates more I/O on the storage, but it can be a safe option for people worried about the CBT bugs.
In addition, switching from CBT to disk scan, and back, is really simple, as it’s just a flag in the advanced options of a backup job:
Once VMware releases the patches for the bugs, it will take you just a quick edit of the job to re-enable CBT and return to a more efficient configuration. No other changes to the job will be needed, and Veeam will immediately start using CBT information in the following run.
If you decide to disable CBT, please take into account the additional time it will take to run incremental backups. There is no rule of thumb for how much additional time you will need, as it depends entirely on the amount of changed blocks that Veeam has to identify by reading data directly from the storage. But non-CBT jobs will probably still be faster than agent-based backups…
UPDATE 2015-11-25: VMware just released a patch to fix the CBT bug in vSphere 6; you can find it at KB2137546. But be careful: in addition to applying the patch, you also need to reset the CBT map, as it’s probably corrupted by the bug. See Veeam KB2075.
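How can changed blocks be found without a change log? One common approach is to read every block and compare a digest of it against the digest stored during the previous run. The sketch below shows that general technique only; I’m not claiming this is how Veeam implements its scan, and the function names and the tiny block size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; tiny on purpose to keep the example readable


def block_digests(disk: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Read the whole disk and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(disk[i:i + block_size]).digest()
        for i in range(0, len(disk), block_size)
    ]


def changed_blocks(previous: list, disk: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return the indexes of blocks whose digest differs from the previous run.

    Blocks beyond the end of the previous digest list count as changed.
    """
    current = block_digests(disk, block_size)
    return [
        index
        for index, digest in enumerate(current)
        if index >= len(previous) or digest != previous[index]
    ]


at_full_backup = b"aaaabbbbccccdddd"
at_incremental = b"aaaaXbbbccccdddZ"   # blocks 1 and 3 were modified

stored = block_digests(at_full_backup)
print(changed_blocks(stored, at_incremental))
```

Note the cost: every block has to be read to compute its digest, even when only a few have changed. This is why a scan creates more I/O than asking a change log.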