In VMware environments, image-based backups are now the most common approach: they save VMDK blocks directly instead of relying on agents installed inside the guest OS to save individual files. With CBT (Changed Block Tracking), first introduced with vSphere 4.0, the backup software can identify only the blocks that changed since the previous backup and save only those. This allows incremental backups that use little disk space and complete quickly.
However, the way CBT interacts with the guest OS filesystem is not always optimal for backup purposes.
Let’s see an example: we have a Windows 2008 R2 VM acting as a file server. Its VMDK disk has been formatted with NTFS using the default 4 KB cluster size, and hosts thousands of files. A user modifies a small 12 KB file and the changes need to be written to disk. That file is made of 3 NTFS clusters and, because of fragmentation, those 3 clusters could be spread across 3 different VMFS blocks (1 MB each in this example).
In this case, a simple 12 KB change in a file can result in a 3 MB change on the VMFS virtual disk. Classic backup software with a dedicated agent would detect the single file change, while an image-based backup would save all 3 MB of modified data.
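As a rough illustration of the arithmetic above, the following Python sketch counts how many tracking blocks are touched by a set of changed NTFS clusters. The function name, the cluster offsets and the 1 MB block size are assumptions made for this example only; the actual granularity used by the tracking layer may differ.

```python
KB = 1024
MB = 1024 * KB


def changed_bytes(cluster_offsets, cluster_size, block_size):
    """Bytes an image-level backup would copy for the given changed clusters.

    cluster_offsets: byte offsets of the modified clusters on the virtual disk
    cluster_size:    guest filesystem cluster size in bytes
    block_size:      size of one tracked block in bytes (assumed value)
    """
    touched = set()
    for offset in cluster_offsets:
        first = offset // block_size
        last = (offset + cluster_size - 1) // block_size
        # A cluster normally sits inside one block, but handle straddling too.
        touched.update(range(first, last + 1))
    return len(touched) * block_size


cluster = 4 * KB

# Three adjacent clusters: the 12 KB file is stored contiguously.
contiguous = [0, 4 * KB, 8 * KB]

# Hypothetical fragmented layout: each cluster lands in a different 1 MB block.
fragmented = [0, 5 * MB, 9 * MB + 512 * KB]

print(changed_bytes(contiguous, cluster, MB) // KB, "KB")
print(changed_bytes(fragmented, cluster, MB) // KB, "KB")
```

With the same three fragmented clusters and a smaller tracked block (64 KB, say), the amount to copy drops to 192 KB, which is still far more than the 12 KB actually modified.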
These issues are inherent in the way CBT works (which, on the other hand, also offers a great set of advantages), but this behaviour can be optimized to some extent to reduce the number of modified CBT blocks.
You can basically do two things:
– Defragment the guest partitions. This rearranges the clusters of the guest filesystem so that they fill CBT blocks more efficiently, resulting in the smallest possible number of VMFS blocks in use. Since defragmentation modifies many blocks, which CBT will then mark as changed, it is best done only before a full backup; doing it before an incremental will produce a backup nearly as large as a full one.
– sdelete: this tool overwrites unused clusters with zeros. Besides being a great way to securely delete files, it is also good for cleaning VMFS blocks and optimizing disk space. It should however NOT be run on thin-provisioned disks, since sdelete would write to all the free space, inflating the disk until it becomes as large as its assigned size, like a thick disk.
Scheduling a script that runs these two tasks before a full backup can lead to great improvements in backup disk usage and speed.
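A minimal sketch of such a script, written in Python and meant to run inside the Windows guest, could look like the following. It is only an illustration, not the script used in the tests: the function names and the thin_provisioned parameter are hypothetical (the guest cannot detect the provisioning type by itself, so it must be supplied by whoever schedules the job), and it assumes that defrag and the Sysinternals sdelete tool are available on the PATH. By default it only prints what it would run.

```python
import subprocess


def maintenance_commands(drive, thin_provisioned):
    """Build the pre-full-backup command list for one guest volume.

    drive:            volume to process, e.g. "C:"
    thin_provisioned: hypothetical flag supplied by the caller; when True the
                      zeroing step is skipped so the VMDK is not inflated.
    """
    commands = [["defrag", drive]]
    if not thin_provisioned:
        # sdelete -z zeroes the free space of the volume.
        commands.append(["sdelete", "-z", drive])
    return commands


def run_maintenance(drive, thin_provisioned, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for command in maintenance_commands(drive, thin_provisioned):
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True stops the sequence if a step fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_maintenance("C:", thin_provisioned=False)
```

Keeping the command list separate from its execution makes it easy to review what will run on each VM before scheduling the job ahead of the full backup.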
Hi Luca, it looks like you are mixing things up. Basically, the “block” in Changed Block Tracking does not refer to the VMFS block definition…
Duncan Epping (VMware) and Mike Zolla (EMC) have written about CBT, check it out at:
http://www.yellow-bricks.com/2009/12/21/changed-block-tracking/
http://thebackupwindow.emc.com/mike_zolla/changed-block-tracking-and-you/
Cheers,
Didier
Got your point, but there is no mixing between the two.
Let me try to explain it another way: since blocks in VMFS are 1 MB in size, a simple change in a 4 KB block in the upper NTFS filesystem means a whole VMFS block is marked as changed, thus increasing the amount of VMFS blocks you need to save.
With a file-based backup, you see the 4 KB change and save only that; with an image-based backup you have to save the whole 1 MB VMFS block.
Does it sound better this way?
Sorry, no. Didier is right: the block size used for CBT is not related to the VMFS block size. CBT blocks are at least 64K, but will grow with the size of the VMDK file. This is still much more than the guest NTFS default block size of 4K, so the issue that you are describing still exists in a way.
However, I wonder if it’s really worth the effort to defragment your guest’s disks before each full backup. Have you conducted tests that clearly show any effect?
Regards
Andreas
Are you referring to VMFS-3 or VMFS-5? The VMFS block size is different from the NTFS block size, so I wonder why there is confusion here.
These are the results of some tests we did at a customer’s site, using Veeam Backup 6. After “cleaning” 25 VMs with these settings, on a full backup (so not influenced by incremental differences between days) we saw a 6-8 hour reduction in backup time and 150-200 GB less data out of 1.2 TB of total VMDK size.
Re-reading my blog post, I would probably have had better results by avoiding CBT terms and talking only about VMDK blocks; it’s somewhat confusing and Didier was right… I need to re-edit it…