Cloud storage? It’s all about compute

One of the fastest-growing businesses in the service provider space is, without any doubt, BaaS: Backup as a Service. This happens for two main reasons. First, it’s a much-needed service for the many companies looking for a cheap and easy way to get an offsite location for their backups (thus easily fulfilling the 3-2-1 rule). Second, it offers an immediate return on investment for the service provider: since for 99% of incoming customers the primary bottleneck will always be the network connection between them and the service provider, the provider can cut costs by using cheap storage without impacting the overall performance of the service.

Everyone working with data immediately understands the importance of a proper and effective data protection strategy. Corruption, deletions (accidental or deliberate), unauthorized modifications: every change to data needs to preserve the previous state of that data, so that it is always possible to revert to it.

Even more important, a modern data protection strategy only works if all data is properly protected and is also easily and quickly recoverable, in order to actually fulfill stakeholders’ requests for almost immediate restores. Instead of setting up expensive secondary sites that drain money for both the equipment and its management, companies are looking to service providers for their data protection solutions. This increasing demand has created a new and exciting market, where service providers offer a multitude of solutions. But precisely because there are so many solutions, customers need to select them really carefully and understand whether they are really the best choice for them. BaaS solutions are so easy to set up and operate that service providers sometimes compete only on price and don’t put enough care into quality and service levels. Most of the available solutions are nothing more than simple sync&share: file copies, with versioning, of every file that a company (or an individual) needs to protect.

When it comes to virtualized environments, and consequently to enterprise-grade solutions, these offers start to show their limits. The biggest of them is precisely the fact that, from the service provider’s point of view, backups are managed just like huge files. Think about this: a full copy of your photo folder is really simple to create; you can do it yourself by simply copy-pasting photos into remote storage. That’s exactly what those simple solutions do. But when it comes to enterprises and companies in general, data comes in many different formats most of the time: not only files, but also e-mails, shared projects, Active Directory objects, database tables. And most of the time, that data is inside a virtual machine. Image-based backups save entire virtual machines, and the items to be restored are inside those VMs. A simple backup done with a file copy is not effective at all in these situations, because once the backup file is “in the cloud”, the entire backup file needs to be retrieved locally before it can be opened and a restore can even start. And to add another pain point, data volumes are skyrocketing, so any data protection activity involves an ever-increasing amount of data.

What’s needed to overcome these limits is compute. Compute capability applied to data availability, both on-premises and in the cloud, is the best way to get that amount of data into the cloud and retrieve it when needed. On the way out of your infrastructure, you want compute to apply optimization techniques to outgoing data. As data is inside a virtual machine most of the time, you don’t want to waste time deploying agents inside every server; modern technologies are able to work at the hypervisor layer and create image-based copies of those virtual machines. But because of their increasing size, you want to leverage deduplication as much as possible, so that your precious bandwidth towards the chosen cloud solution is used as efficiently as possible.
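To make the deduplication idea a bit more concrete, here is a minimal sketch in Python. It is not how any specific product works: it assumes fixed-size blocks and SHA-256 fingerprints, and the names `BLOCK_SIZE`, `split_blocks` and `blocks_to_send` are hypothetical, invented for this illustration. The local compute hashes every block and ships only the blocks the remote side does not already hold.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def blocks_to_send(data: bytes, known_hashes: set[str]) -> tuple[list[str], dict[str, bytes]]:
    """Return the ordered list of block hashes describing `data`, plus only
    the blocks whose hash the remote side does not already hold."""
    recipe: list[str] = []
    new_blocks: dict[str, bytes] = {}
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in new_blocks:
            new_blocks[digest] = block
    return recipe, new_blocks


# Toy "disk image": 100 identical zeroed blocks plus two blocks of unique data.
image = bytes(BLOCK_SIZE) * 100 + b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
recipe, new_blocks = blocks_to_send(image, known_hashes=set())
sent_bytes = sum(len(block) for block in new_blocks.values())
print(f"image size: {len(image)} bytes, sent after dedup: {sent_bytes} bytes")
```

In this toy image only three distinct blocks exist, so 12,288 bytes travel instead of 417,792. Real virtual disks are far less uniform, so treat the ratio as an illustration and not as a benchmark.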

A solution using local compute resources is only part of the answer; you also need compute capabilities at the service provider. Again, in a sync&share solution the service provider only gives you a large storage space where you drop your data, search inside it and retrieve what you need. Indexing and searches are performed by the client component against the remote storage. But what if you need to restore a single email from the remote copy of your entire mail server? Do you really want to download the entire backup of your email server just to open it and extract one single email? That’s what happens most of the time, and it becomes even worse when you add encryption to the picture: without compute capabilities, the service provider cannot open encrypted backups and can only send them to you in their entirety, so that “your” compute systems can decrypt them, browse their content, and finally restore the needed item. If a backup file is some hundreds of GBs in size, you first need to retrieve it completely before being able to open it. Your RTO is going to be abysmal.
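A quick back-of-the-envelope calculation shows how bad this gets. All the numbers below are assumptions picked for illustration (a 300 GB backup file, a 200 KB email, a 100 Mbit/s link fully available to the restore), and the helper `transfer_seconds` is hypothetical; protocol overhead and latency are ignored.

```python
def transfer_seconds(size_bytes: float, link_mbps: float) -> float:
    """Time to move `size_bytes` over a link of `link_mbps` megabits per
    second, ignoring protocol overhead and latency."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


GB = 1_000_000_000

backup_size = 300 * GB   # assumed size of the whole backup file
email_size = 200_000     # assumed size of the single email to restore
link_mbps = 100          # assumed bandwidth available for the restore

full = transfer_seconds(backup_size, link_mbps)
single = transfer_seconds(email_size, link_mbps)

print(f"whole backup file: {full / 3600:.1f} hours")
print(f"single email:      {single:.3f} s")
```

Under these assumptions, retrieving the whole file takes about 6.7 hours before the restore can even begin, while the email itself would need a small fraction of a second on the wire.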

Is that what you want? I don’t think so. Again, you need compute at both ends. With a solution capable of opening the remote copy directly at the service provider side, regardless of whether it’s encrypted, compressed or deduplicated, you are able to browse its content without actually moving any data over the wire, because the interaction between the two compute components at the two ends of the connection does it for you. Compute activities happen locally to the backup files, at the service provider, and only when the needed block of information has been identified does your local compute resource receive the minimum amount of data needed to complete the restore. And if it’s an application item instead of a file, the solution at the service provider needs to be able to understand the content of those remote copies: read a Microsoft Exchange database, a SQL table, an Active Directory, and so on. It needs some smart technology so it can leverage the remote compute capabilities and save you bandwidth but, most of all, time. After all, why would you delegate your data availability to a service provider if backup and recovery times are then going to get worse, to the point where money savings alone cannot justify the increased RPO and RTO? With a solution able to leverage compute together with cloud storage, you can effectively improve these values instead of worsening them, and at the same time you add capabilities to your solution, such as application item restores.
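The idea of compute on both ends can be sketched with a toy model. Everything here is hypothetical: the `ProviderSide` class, its methods and the mailbox data are invented for illustration, and zlib compression stands in for whatever compression, encryption or deduplication a real backup format uses. The point is only that the provider opens the backup next to the storage and returns a single item.

```python
import json
import zlib


class ProviderSide:
    """Hypothetical provider-side component holding a compressed backup."""

    def __init__(self, items: dict[str, str]) -> None:
        # Without compute, the stored backup is just one opaque blob.
        self.blob = zlib.compress(json.dumps(items).encode("utf-8"))

    def _open(self) -> dict[str, str]:
        # Compute next to the data: decompress and parse at the provider.
        return json.loads(zlib.decompress(self.blob).decode("utf-8"))

    def list_items(self) -> list[str]:
        """Return only the item identifiers, not their content."""
        return sorted(self._open())

    def fetch_item(self, item_id: str) -> bytes:
        """Return the bytes of one item, the only payload sent back."""
        return self._open()[item_id].encode("utf-8")


# Toy mailbox: 1,000 made-up messages of 2,000 or more characters each.
mailbox = {f"msg-{i:04d}": f"body {i} " * 250 for i in range(1000)}
provider = ProviderSide(mailbox)

wanted = "msg-0042"
assert wanted in provider.list_items()   # browse remotely, no content moved
payload = provider.fetch_item(wanted)    # restore exactly one item

print(f"stored blob: {len(provider.blob)} bytes")
print(f"sent for one restore: {len(payload)} bytes")
```

A real implementation would not reopen the whole backup on every call, and it would need to understand application formats such as mail or SQL databases; the sketch only shows where the work happens and what crosses the wire.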

By the way, this is one of the design principles behind Veeam Cloud Connect: for a powerful cloud backup solution, compute needs to be as close as possible to data in order to be effective. Even in the cloud.