Disclaimer: after this series of articles about this technology, Moresi.Com SA, the company I work for, signed an agreement with ExaGrid. We are currently a Value Added Reseller of ExaGrid, with the further authorization to purchase and sell directly, without any distributor, throughout Switzerland and Italy.
However, I had no obligation to ExaGrid to write these articles, and they did not check or review them before I published them, except for a technical validation.
Virtualization and Backup issues
As we all know, virtualization is a great solution for improving IT agility and efficiency. One of the issues we all need to deal with in a virtual environment, and specifically in VMware, is how to back up virtual machines and their data.
New software solutions crafted specifically for VMware have come out in recent years, such as VMware Data Recovery, Veeam Backup & Replication, or Quest vRanger. All of them give VMware administrators the ability to back up VMs using the vStorage API rather than old methods based on agents installed inside the guest OS.
But what about backup windows, and the space consumed on backup storage?
Well, VM sprawl (the side effect of having a great product like VMware, where admins create more and more VMs simply because it is so easy to do) leads to datacenters full of VMs. On top of that, recent operating systems like Windows 2008 R2 have increased the minimum required disk space.
All these issues lead to the need for more and more backup space, while growing business operations usually require shorter backup windows.
The first effect has already been seen: backup to tape is no longer feasible, and new solutions are all based on disk backup. Software such as Veeam can only send its backups to a network share. Disk backups are more reliable and, most important of all, can guarantee faster restore times. After all, we do not make backups for their own sake, but “only” to restore from them later.
Usually, the first solution every sysadmin tries is a NAS device, since it can address one of the major issues: disk space. Cheap SATA drives, aggregated in large RAID5 arrays, make it possible to have many terabytes of space available for backups while keeping costs to a minimum, close to the price per GB offered by tapes.
But what about speed and backup windows? A SATA-based NAS cannot be cheap and fast at the same time, and this problem grows with the number of VMs and the backup frequency we want.
Deduplication
Is there a solution for all these issues?
Yes, and it’s called deduplication. It’s the ability to save only the data modified since previous backups, saving both disk space and backup completion time.
But there are issues even with this technology. Software solutions like VDR, Veeam and others all rely on on-the-fly deduplication done in software. The process works like this: the backup software first performs a mandatory full backup; on the next run, it checks which data blocks have changed since the previous run and copies only the differences. The way data is deduplicated differs from product to product: some use the CBT technology that comes directly from the vStorage API, while others add further layers of deduplication and inspect the content of the data.
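To make the "copy only the differences" idea concrete, here is a minimal Python sketch. It is not how any of the products above actually work: CBT, for example, is tracked by the hypervisor rather than computed by hashing, and real block sizes are far larger. The function names and the 4-byte block size are assumptions chosen purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return the blocks of current that are new or differ from previous."""
    old_hashes = block_hashes(previous, block_size)
    changed = {}
    for index, offset in enumerate(range(0, len(current), block_size)):
        block = current[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # A block must be copied if it did not exist before or its hash differs.
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed[index] = block
    return changed


full_backup = b"AAAABBBBCCCCDDDD"     # first, mandatory full backup
next_run = b"AAAAXXXXCCCCDDDDEEEE"    # one block modified, one block appended
delta = changed_blocks(full_backup, next_run)
print(delta)  # only blocks 1 and 4 need to be copied
```

Even in this toy version, every block of the new data has to be read and hashed on the backup server, which hints at why doing this work during the backup itself costs CPU time.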
Whatever the details of the specific technologies, all these solutions are CPU-hungry (take a look at the hardware requirements of backup servers: sometimes they need to be more powerful than the application servers they are protecting…) and all this effort translates into long backup windows.
Hardware Deduplication?
Yes, and that’s what ExaGrid is all about.
I discovered this company some months ago, and I was amazed by their solution from the first look. ExaGrid is a US-based company, founded in 2002, with a single product in its catalog: the ExaGrid appliance itself.
Their idea is simple to explain and, I think, difficult to realize: backup software needs to complete backup operations in the shortest possible time, without doing complex calculations for deduplication or compression.
Once the backup is completely saved on the ExaGrid, the ExaGrid itself has all the needed technology and hardware power to do “post-backup” deduplication.
Great, isn’t it?
Also, ExaGrid has more features and specs that make it wonderful:
– when you create a share for backup (via CIFS, NFS or OST for Symantec) you declare what kind of data will be sent to it. Say you select Veeam: ExaGrid will use specific algorithms to further optimize Veeam backups
– deduplication is done by analyzing all the backups sent by the software, at byte level. This guarantees the maximum level of deduplication: ExaGrid states it can reach ratios from 10:1 to an outstanding 50:1, depending on the kind of data it receives
– the ExaGrid machine is equipped with Intel Xeon CPUs, a good amount of RAM and SATA disks in RAID6. These specs are completely different from any other backup NAS, and are tailored to provide the power needed for post-backup analysis. If you watch one of them at work, you will see more activity “after” the backup has completed.
Internal post-backup deduplication does not involve network traffic or cpu activity on the backup servers, and can be done during the day.
Price/performance ratio
SATA disks aggregated in RAID6 plus server-grade power sounds like something that does not come cheap. But once you take deduplication into account, the economics change heavily.
With the entry model (EX-1000), offering 2 TB of usable space at 15,000 USD, raw space costs 7.32 dollars per GB (counting 1 TB as 1,024 GB), far more than tape. But if you factor deduplication into the math, this machine can hold about 20 TB of data, even at the lowest deduplication ratio of 10:1. Seen this way, the real cost is 0.732 dollars per GB.
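The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above: the function name is made up for illustration, and 1 TB is counted as 1,024 GB because that is the conversion that reproduces the 7.32 figure.

```python
GB_PER_TB = 1024  # assumption: binary conversion, which matches the 7.32 figure


def cost_per_gb(price_usd: float, usable_tb: float,
                dedup_ratio: float = 1.0) -> float:
    """Price per GB of backup data held, given a deduplication ratio."""
    effective_gb = usable_tb * GB_PER_TB * dedup_ratio
    return price_usd / effective_gb


raw = cost_per_gb(15_000, 2)                       # no deduplication
effective = cost_per_gb(15_000, 2, dedup_ratio=10)  # lowest stated ratio

print(f"raw:       {raw:.2f} USD/GB")
print(f"with 10:1: {effective:.3f} USD/GB")
```

The same function can be called with `dedup_ratio=50` to see the other end of the range ExaGrid states, keeping in mind that the ratio actually achieved depends on the data.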
Backup speed
Other deduplication appliances have a hardware design based on SANs: in short, one or two storage processors with a huge number of disks. You add space by adding disks, but the speed of the storage processors stays the same, so when you have more VMs to save, backup windows grow instead of shrinking.
ExaGrid, as its name suggests, uses a different approach, based on a grid.
A single ExaGrid machine has a fixed amount of disk space, but you can grow your environment installing other machines, mixing different models. In this way, you get two advantages:
– the machines you bought first are still usable, so you do not have to throw them away
– every machine can receive backup files at a certain speed: install two of them on the network, and you have simply doubled the total ExaGrid speed.
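A rough way to picture the difference is to estimate the backup window as data size divided by total ingest speed. The sketch below assumes throughput scales linearly with the number of machines, as described above, and that neither the network nor the backup software is the bottleneck. The function name and all the figures (4,000 GB of data, 500 GB per hour per appliance) are hypothetical, not ExaGrid specifications.

```python
def backup_window_hours(data_gb: float, gb_per_hour_per_appliance: float,
                        appliances: int = 1) -> float:
    """Estimate the backup window, assuming ingest speed scales linearly."""
    if appliances < 1:
        raise ValueError("at least one appliance is required")
    return data_gb / (gb_per_hour_per_appliance * appliances)


# Hypothetical figures, for illustration only.
one_machine = backup_window_hours(4000, 500, appliances=1)
two_machines = backup_window_hours(4000, 500, appliances=2)

print(f"1 machine:  {one_machine} hours")
print(f"2 machines: {two_machines} hours")
```

In a design where only disks are added, the divisor stays fixed while `data_gb` grows, so the window can only get longer.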
Let’s play!
I contacted the ExaGrid guys because I wanted to test their technology, so let me use this space to say a huge “Thanks!” to them, in particular to Graham Woods, Director of System Engineers, EMEA.
They were enthusiastic from the very first moment and supported me in every aspect: we had long phone sessions talking about technology, configurations and best practices, and they shipped me an EX-1000 to test for some months without any problem.
We will go deeper into the technology in the next articles I’ll write.
Stay tuned!