How can you test 1000 VMs with Veeam SureBackup?


You can, but maybe you do not want to test them all at the same time. But don’t worry, there’s a way to do completely automated tests even for huge environments.


UPDATE 11-03-2016: one of our customers, Hans Leysen, took this script as the starting point and improved it by adding a hash table, saved in an XML file, so that only VMs that have not yet been tested are checked. So, no more randomness, but a precise procedure to test them all over the days. The great thing is that this script is running in his production environment. Thanks Hans!

SureBackup in Veeam is a great solution. With it, you can take any backup and run automated tests against each protected VM. By powering on the VM in a virtual lab, isolated from the original VM in production, you can first of all be sure that the virtual machine can be powered on from a backup when needed. You can also configure SureBackup to test different things, like the VMware Tools heartbeat (which means the OS is up and its services are running, as VMware Tools is one of those services) and network ping (so you can verify the network stack is up), and you can run scripts of any kind against the applications. I’ve seen customers doing amazing things, like a SQL query that goes into a MS SQL database and retrieves a table, so they can be sure that the database is up and running, is answering on the correct port, accepts the user specified in the script, and returns the correct data.

Obviously, this is a configuration that can and should be done for those critical VMs, your “preferred pets” that you care about the most. You go and configure a dedicated Application Group, a dedicated SureBackup job, probably you also have configured a dedicated Backup job to protect those VMs in the first place, as they may require specific settings like VSS, credentials, encryption and so on.

But out there, there are many other VMs in your environment, and managing all of them one by one is crazy.

On the backup side, you can automate all the backup activities, for example by using tags. I’ve explained how to do this in a whitepaper I wrote: Using Veeam and VMware vSphere Tags for Advanced Policy-driven Data Protection.

But what about SureBackup? You can easily back up 1000 VMs per day; actually, we have customers protecting environments with at least one more zero in that number. SureBackup requires just a spare hypervisor server to power on the virtual machines from the backups, but if you have a dynamic environment, with VMs coming and going, you would have to reconfigure the SureBackup jobs each time to be sure that each VM is tested. It’s not a problem of available resources, but more of automating the tests as much as possible, even if the environment changes frequently.

Dynamic SureBackup

I developed this idea a few months ago when discussing a Veeam deployment with a large customer (yes, I work in the field from time to time :)). He loved the idea of SureBackup from the very beginning, because it would be a completely automated solution for the internal audit, which requires, among other things, that backups and replicas are tested periodically and the results verified.

Hey, SureBackup can do it! But for 1000 VMs with different settings, and many of them new every month?

Here’s where the idea took form. After removing from the list those 30-40 “special VMs” that I talked about before, we ended up with around 1000 “regular” VMs. For those VMs, we designed something like this:

– each day, every VM is protected with a daily backup job
– after backups are complete, a new SureBackup job is created via PowerShell
– the job randomly selects a certain number of VMs that don’t have the “special” tag applied to them
– with those VMs, the script creates a new Application Group
– the Application Group is tested with a SureBackup job
– after the test is over and the report has been sent, the SureBackup job and the Application Group are deleted
– on day 2, the same procedure is repeated
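
The random selection step in the list above can be sketched as follows. This is only an illustration in Python, not the PowerShell script itself: the inventory dictionary, the VM names, and the `pick_random_vms` helper are hypothetical, and no Veeam cmdlet is called.

```python
import random

def pick_random_vms(vms, count, excluded_tag="special"):
    """Return up to `count` VM names chosen at random among VMs without `excluded_tag`."""
    candidates = [name for name, tags in vms.items() if excluded_tag not in tags]
    # Never ask for more VMs than there are candidates.
    return random.sample(candidates, min(count, len(candidates)))

# Hypothetical inventory: VM name -> set of tags.
inventory = {
    "dc1": {"special"},
    "web01": set(),
    "web02": set(),
    "app01": {"linux"},
}

todays_group = pick_random_vms(inventory, 2)
everything_eligible = pick_random_vms(inventory, 10)
```

With purely random picks like this, nothing prevents the same VM from being chosen on several days, which is the limitation discussed next.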

You probably see where I’m going. Each day some random VMs are tested. In the short run, it may happen that a given VM is tested 3 times while others are never tested. This is where the improvement I’ve received from Hans comes into play: a hash table is created on the first execution and updated on the following days. Once a VM is tested, it receives a value of 1; every VM not yet tested has a value of 0. At each execution, the script only selects the VMs to test among those with a value of 0, that is, those still untested. If you let the script run for long enough, every VM will be tested, then the hash table will be reset and the tests will start again from scratch. The minimum time required to rotate 1000 VMs through the job is 50 days if you run 20 VMs per day, so less than two months; if you have an internal audit happening every 6 months, like in my use case, we can now be sure that all VMs will have been tested by the deadline.
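
To make the rotation and the 50-day figure concrete, here is a small simulation in Python. It is a sketch of the idea only: the `select_untested` helper and the VM names are made up, and for simplicity a VM is marked as tested at selection time rather than after the job has finished.

```python
import random

def select_untested(state, count):
    """Pick up to `count` VMs whose value is 0; reset the table first if none are left."""
    untested = [vm for vm, tested in state.items() if tested == 0]
    if not untested:
        # Every VM has been tested: start a new cycle.
        for vm in state:
            state[vm] = 0
        untested = list(state)
    chosen = random.sample(untested, min(count, len(untested)))
    for vm in chosen:
        state[vm] = 1  # simplification: marked as tested right away
    return chosen

# 1000 "regular" VMs, all untested at the start.
state = {f"vm{i:04d}": 0 for i in range(1000)}

days = 0
while any(value == 0 for value in state.values()):
    select_untested(state, 20)
    days += 1

# days is now 50: 1000 VMs / 20 VMs per day.
next_cycle = select_untested(state, 20)  # triggers the reset
```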

So, here is the script. As usual, people with better PowerShell skills than me can go and improve it at will, as Hans did, and maybe post their changes here so I can integrate them into my original script:

Let’s explain what the script does:
– first of all, the job looks for an existing and configured virtual lab, that you have to configure in advance. The lab needs to be called “Virtual Lab 1”, or you can change its name
– the variable “NumberofVMs” defines how many VMs you want to test in each run. Change the number at will
– then, it retrieves the list of VMs that were successfully backed up during the last 30 days (adddays -30, you can change this) and adds them to the hashtable with a value of 0. This is because we do not want to test a VM that doesn’t have a consistent restore point, but we also want to avoid resetting the hashtable if for any reason a backup is not executed for one day. Also, with this value you can test VMs that are only backed up once a week, for example
– then, out of the list of untested VMs, it selects “NumberofVMs” virtual machines
– the job rebuilds each time a new application group and inserts the selected VMs in it
– the job is executed, and by default it only tests VMware tools. This is because each VM can be different, so it cannot do specific advanced tests like TCP ports or services. For those VMs, better to configure a custom SureBackup job
– if Veeam is properly configured for email notifications, each job will send out its results, so they can be parsed over a long period of time to obtain statistics
– each tested VM is updated in the hash table with a value of 1. So, as long as there are VMs with a value of 0, no VM will be tested twice before every other VM has been tested
– if no more VMs exist in the hashtable with a value of 0 the hashtable is reset. This also happens if there’s no xml file, so if you delete the xml, the process will start from scratch
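
The XML persistence described in the last bullets can be sketched like this. It is a hedged Python illustration, not the format used by the actual script: the `<vms>`/`<vm>` element names, the attribute names, and the file name are assumptions chosen for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def load_state(path):
    """Read {vm_name: 0 or 1} from an XML file; a missing file means start from scratch."""
    if not os.path.exists(path):
        return {}
    root = ET.parse(path).getroot()
    return {vm.get("name"): int(vm.get("tested")) for vm in root.findall("vm")}

def save_state(state, path):
    """Write the table back as <vms><vm name="..." tested="0|1"/></vms> (assumed layout)."""
    root = ET.Element("vms")
    for name, tested in sorted(state.items()):
        ET.SubElement(root, "vm", {"name": name, "tested": str(tested)})
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Round trip in a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "surebackup_state.xml")
    first_run = load_state(path)               # no file yet: empty table
    save_state({"web01": 1, "web02": 0}, path)
    second_run = load_state(path)              # values survive between runs
```

Because a missing file simply yields an empty table, deleting the XML restarts the cycle, as the last bullet describes.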

Finally, the script also performs a few checks to keep the hashtable up to date with the restore points available in Veeam. New VM entries are automatically added and old ones are removed. For example, when a VM is deleted, there will be a point in time when there are no more “recent restore points” available in Veeam for that VM, and its entry has to be removed from the hashtable to reflect the new status of Veeam. Likewise, if a new VM is deployed and backed up by Veeam after the hashtable was created, it has to be added to the hashtable.
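
A minimal sketch of that synchronization, again in Python and under stated assumptions: `last_backup` is a hypothetical mapping from VM name to the time of its latest successful backup (in the real script this information comes from Veeam), and the 30-day window mirrors the adddays -30 setting mentioned above.

```python
from datetime import datetime, timedelta

def sync_state(state, last_backup, now, max_age_days=30):
    """Keep only VMs with a recent restore point; new VMs enter with a value of 0."""
    cutoff = now - timedelta(days=max_age_days)
    recent = {vm for vm, when in last_backup.items() if when >= cutoff}
    # Existing entries keep their value, new ones start untested, stale ones are dropped.
    return {vm: state.get(vm, 0) for vm in recent}

now = datetime(2016, 3, 11)
state = {"web01": 1, "old01": 1}
last_backup = {
    "web01": now - timedelta(days=1),   # still protected, already tested
    "new01": now - timedelta(days=2),   # deployed after the table was created
    "old01": now - timedelta(days=45),  # deleted VM, no recent restore point
}

synced = sync_state(state, last_backup, now)
```

Here `web01` keeps its value of 1, `new01` is added with 0, and `old01` disappears from the table.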

5 thoughts on “How can you test 1000 VMs with Veeam SureBackup?”

  1. Luca, having an application group with 20 VMs is not a good idea.
    You have 20 VMs powering on one by one, and at the end all 20 are running.
    I bet that if you create a SureBackup job with a linked job containing 100 VMs and set it to check 10 of them simultaneously, it will finish faster and put less stress on the target host.

  2. Is there any way to start more than one server at a time, and to also enable verification?

    • You can start more than one VM at a time by using linked jobs instead of application groups. For testing, again, use linked jobs. The basic idea of this script is to verify VMs with a common set of properties, like correct boot and network. For more custom checks you surely need dedicated SureBackup jobs.

  3. Been using this and it is very good at what it does... One thing I’d like to try and add: when a backup is verified and the “1” is written, is there a way I could then post the date it was done to a VM tag?
    So, for example, the script runs on DC1, verifies the backup and writes to the hashtable; could it then write the date it was verified to a VM tag?
