I recently had the chance to briefly test a Dell DR4000 deduplication appliance (you can read my quick review here), so over the past year I have tried DataDomain, ExaGrid and Dell. Apart from HP StoreOnce, I have tried all the main deduplication appliances, and I have learned how they work, their pros and cons, and most of all the use cases for them. This post is not about when to use them, but when NOT to use them.
Let me be clear: they are all great products, each with its own strengths, and I think nobody will be disappointed with them as long as they are configured and used in the right way. This post is in no way meant to bash those products.
But (I know you were waiting for a but) there are situations where these appliances are far from the best fit for customers looking for somewhere to store disk backups.
My doubts are about the small models of these appliances. Here is an example: think about a 2 TB deduplication appliance, whichever vendor it comes from. They all cost around 20k euros and offer 2 TB of native storage, with a “maybe” usable space that assumes up to 20x deduplication. I often use Veeam Backup for data protection of virtual environments, and I have rarely seen a 20x deduplication ratio; a more honest value would be somewhere between 10x and 15x, but let’s use 20x anyway. This gives you 40 TB of usable storage.
Performance of the inline dedup models is not that great, for an obvious reason: deduplication activity slows down write operations. Saving full backups to local RAID 5 SAS disks on a physical server has always given me higher throughput, even with Veeam configured to be “dedupe friendly” when saving to those appliances.
If the write speed of an appliance is around 60-70 MB/s (bigger models are faster), I can reach that speed even with a wide-striped SATA RAID in a “prosumer” NAS (those with a quad-core CPU and a good amount of RAM, like QNAP or Synology to name a few). And if you configure a NAS with 40 TB on board, you will find that the total price is less than 10k euros.
Same space, same performance, half the price.
And I suspect the non-deduped NAS can be even faster when doing Instant VM Recovery for example.
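To put numbers on the comparison above, here is a minimal sketch of the cost per usable TB. It only uses the rough figures from this post (20k euros for 2 TB native, under 10k euros for a 40 TB NAS); the function and constant names are my own, and the NAS price is taken at its 10k upper bound, so the real gap is likely a bit wider.

```python
def cost_per_usable_tb(price_eur, native_tb, dedup_ratio=1.0):
    """Price divided by effective capacity (native capacity x dedup ratio)."""
    usable_tb = native_tb * dedup_ratio
    return price_eur / usable_tb


# Rough figures from this post, not vendor list prices.
APPLIANCE_PRICE_EUR = 20_000
APPLIANCE_NATIVE_TB = 2
NAS_PRICE_EUR = 10_000   # "less than 10k", so this is an upper bound
NAS_NATIVE_TB = 40

# Appliance at the optimistic 20x and at the more honest 10x-15x range.
for ratio in (10, 15, 20):
    cost = cost_per_usable_tb(APPLIANCE_PRICE_EUR, APPLIANCE_NATIVE_TB, ratio)
    usable = APPLIANCE_NATIVE_TB * ratio
    print(f"appliance at {ratio}x: {usable} TB usable, {cost:.0f} EUR/TB")

# The NAS is treated as plain storage with no deduplication at all.
nas_cost = cost_per_usable_tb(NAS_PRICE_EUR, NAS_NATIVE_TB)
print(f"NAS without dedup: {NAS_NATIVE_TB} TB usable, {nas_cost:.0f} EUR/TB")
```

Even at the full 20x ratio the appliance comes out at twice the cost per TB of the NAS, and at 10x it is four times as expensive.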
So, I ended up with a personal “rule of thumb” when designing disk backup solutions for customers. The reasons to go for a dedup appliance, in my opinion, are:
- you have more than 50-60 TB of backups to store. The more data you need to store, the better a fit those products are
- you need replication for offsite backups: the replication features of dedup appliances are way better than those of a common NAS (usually rsync), and they let you avoid deploying a server with replication software in between to read data from the source NAS and save it to the target NAS. Those appliances replicate data in its deduped state, which saves bandwidth, while with a NAS or server software you have to replicate full-size backups
- you need something that can scale in the future. How easily a deduplication appliance scales depends on the chosen vendor (ExaGrid is a true scale-out system, other appliances add shelves to the same appliance)
- you want corporate support, so you go for a mainstream vendor rather than “cheap off the shelf” hardware
If your requirements are among those listed here, go for a deduplication appliance. Otherwise, think twice before choosing one, especially if the only reason is that your consultant simply says “you need deduplication to do disk backups”.