Today I’m starting a new category on my blog, dedicated to Data Protection. I’ve been talking about data protection, backup and disaster recovery for many years, but until today I had never created a dedicated section. In this category you will find overview articles, some theory, and practical tests of several Data Protection solutions.
In this first article, I would like to talk about some fundamentals of data protection, specifically the “3-2-1” rule, and how you can apply it to Data Protection in virtualized environments.
First of all, a brief introduction about this rule.
I’m not sure who first articulated it or when, but it has been around in the IT industry for quite some time, and even if you can find several versions of it, you can summarize it perfectly with this picture:
The numbers 3-2-1 can be explained as:
– 3 copies of every piece of data you want to protect
– 2 different storage media
– 1 remote site
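The three numbers can also be read as a simple checklist. The following is only a minimal sketch of that idea: the `Copy` class, the `check_321` function and the media and site labels are names I made up for the illustration, they do not come from any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str   # label for this copy (hypothetical)
    media: str  # kind of storage, for example "san", "nas" or "tape"
    site: str   # location where the copy is kept


def check_321(copies, primary_site):
    """Return the list of 3-2-1 violations (empty when the plan complies)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies are on the same media type")
    if not any(c.site != primary_site for c in copies):
        problems.append("no copy is stored at a remote site")
    return problems


# Hypothetical plan: production data counts as the first copy.
plan = [
    Copy("production", media="san", site="hq"),
    Copy("backup-1", media="nas", site="hq"),
    Copy("backup-2", media="tape", site="branch"),
]
print(check_321(plan, primary_site="hq"))  # prints []
```

Removing the tape copy from this plan would report two problems at once: fewer than three copies, and nothing stored outside the primary site.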
3 copies
The most common error in a Data Protection plan is having only one backup copy. This is a disaster waiting to happen: when you need to restore data lost in production and that data is missing or corrupted in the only backup copy you have, it is lost forever.
Instead, if you design your environment to have at least 3 copies of all your data, the first one is the production data itself, so to satisfy the rule (and to sleep without worries…) you need two backup copies. If for any reason you lose the first backup copy, you still have the second one available, which increases the overall safety of your data.
In a virtualized environment, where you usually save the whole virtual disks of your virtual machines, it’s useful to take advantage of deduplication appliances to save as much space as possible. Also, if you are thinking about creating the second backup copy by duplicating the first one, a deduplicated copy allows for smaller data transfers when replicating it.
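To give an idea of why deduplication saves space on virtual disks, here is a minimal sketch. It assumes fixed-size chunks identified by their SHA-256 digest, which is only one possible approach and not necessarily what a given appliance does; the function names and the sample "disks" are invented for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, in bytes


def dedup_store(data, store):
    """Split data into fixed-size chunks, keep unseen chunks, return the recipe."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only the first occurrence
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original data from its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical virtual disks sharing 8 chunks (think of the same guest OS)
# and differing only in their last chunk.
os_blocks = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))
disk1 = os_blocks + b"B" * CHUNK_SIZE
disk2 = os_blocks + b"C" * CHUNK_SIZE

store = {}
recipe1 = dedup_store(disk1, store)
recipe2 = dedup_store(disk2, store)

logical = len(disk1) + len(disk2)                # bytes protected
physical = sum(len(c) for c in store.values())  # bytes actually stored
print(logical, physical)  # prints 73728 40960
```

The same reasoning applies to replication: only the chunks the remote side does not already hold need to travel.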
Finally, you need to choose how to create the third copy: duplicate the first backup, or create it directly from production data? Both methods have pros and cons; let’s look at them briefly:
– creating both backup copies independently from production data eliminates the risk of propagating a corruption in the first backup. If you create the third copy by duplicating the second one (that is, the first backup), any corruption in the first backup is replicated byte-by-byte into the third copy, cancelling the benefit of having two backup copies
– on the other hand, with more and more data hosted in production, creating two independent backup copies means reading that data twice from the production storage. This doubles the read I/O on the production storage and also the duration of the backup activities, which can force less frequent backups and therefore increase RPO values.
The most common solution is to clone the first backup, mainly for budget reasons. For the reasons explained above, it is then mandatory to have a series of controls in place (for example, scheduled restore tests…) to guarantee the integrity of the saved data.
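One simple control that can sit next to restore tests is a checksum comparison when cloning. The sketch below assumes that a SHA-256 digest of the first backup was recorded when the backup was taken; the file names and the `clone_and_verify` helper are hypothetical. Keep in mind that a matching checksum only tells you the bytes have not changed since then, not that the backup is actually restorable.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, block_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it block by block."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def clone_and_verify(source, target, expected_digest):
    """Clone a backup file, checking source and clone against a recorded digest."""
    if sha256_of(source) != expected_digest:
        raise ValueError("first backup is corrupted, refusing to clone it")
    shutil.copyfile(source, target)
    if sha256_of(target) != expected_digest:
        raise ValueError("clone does not match the recorded digest")


# Demo with a small temporary file standing in for a backup.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "backup1.bin")
second = os.path.join(workdir, "backup2.bin")
with open(first, "wb") as f:
    f.write(b"virtual disk contents" * 1000)
recorded = sha256_of(first)  # in practice, recorded when the backup is taken

clone_and_verify(first, second, recorded)
print("clone verified")
```

Checking the source before copying is the important part: it stops a corrupted first backup from being silently duplicated into the third copy.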
2 different media
Using two different backup media protects the Data Protection plan from the different problems a single medium could have. Bubbling DVDs, demagnetized tapes, failed firmware upgrades on a NAS: the list of things that could go wrong is endless. But if you are using two different media, the likelihood of damage to both “at the same time” is far lower.
In the past, Data Protection systems were based exclusively on tapes, and a second medium was not even considered. As disk prices have dropped, and with the rise of deduplication appliances, the cost of backing up to disk has dropped too, and today you can (and should) use disk as the first medium. As said before, the amount of data in a virtualized environment is considerable, so disk is in the end the only viable way to guarantee fast restore times for those huge amounts of data.
To partially recoup the expense for the backup storage, you can choose a different medium for the third copy. The best solution, however, is to use two NAS devices for the two backup copies, since their read and write speed helps both backups and restores. If the first NAS is corrupted, a second NAS guarantees the same restore times for the third copy too.
However, you should bear a few points in mind:
– if you choose the same NAS model for both copies, usually because the vendor offers a bigger discount than when you buy only one, be careful to manage them as if they were totally different devices. For example, do not upgrade their firmware at the same time, or a bug in that firmware could break both NAS devices, leaving you without any backup copy
– you should choose the same NAS model only if the two devices offer some asynchronous replication feature between them. Otherwise you are only accepting needless risks without any advantage
– if you choose two different media, the second one will probably be tape. Remember the huge speed difference between tape and disk when you declare your recovery time objectives (RTO): you may be able to meet them only when restoring from disk
– if you are going to take some backups offline for archival purposes, and you are using tapes, remember to keep at least one tape drive available. I’ve seen customers keep tapes for many years and then no longer have a drive at hand that can read them. LTO tapes are a good choice thanks to their backward compatibility
– schedule read tests on those tapes to verify they are in good condition and, if needed, copy their content onto a newer tape
– WORM media like CD or DVD do not last forever
– evaluate an online backup service as an alternative solution. It also helps to satisfy the third part of the rule, the remote site.
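The point about RTO and media speed can be checked with some quick arithmetic before you commit to a number. This is only a rough sketch: the data size, the RTO and both throughput figures are assumptions I picked for illustration, not measurements, and it ignores tape mount and seek times.

```python
def restore_hours(data_gb, throughput_mb_s):
    """Estimated hours needed to restore data_gb gigabytes at a sustained speed."""
    seconds = (data_gb * 1000) / throughput_mb_s  # 1 GB = 1000 MB
    return seconds / 3600


# All figures below are illustrative assumptions.
DATA_GB = 5000                           # amount of data to restore
RTO_HOURS = 4                            # declared recovery time objective
MEDIA_MB_S = {"disk": 500, "tape": 150}  # assumed sustained restore speeds

for media, speed in MEDIA_MB_S.items():
    hours = restore_hours(DATA_GB, speed)
    verdict = "meets" if hours <= RTO_HOURS else "misses"
    print(f"{media}: {hours:.1f} h, {verdict} the {RTO_HOURS} h RTO")
```

With these made-up numbers the disk restore stays inside the objective while the tape restore does not, which is exactly the situation to look for when declaring an RTO.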
1 remote site
If you are the victim of a fire, some other natural disaster, or a human error, you may lose all your production data. To guarantee its recovery, one of the two backup copies should be placed in a remote site, far enough from the primary site not to be affected by the same disaster.
Companies with multiple sites can use one of the other sites as a remote location for their backups; for companies with only one site, an online backup service offered by a provider can be an effective and cheap solution. It removes the need to buy dedicated hardware, and you pay only for the disk space used at the remote location.
Several providers offer these services, so you need to choose carefully. These are some parameters to keep in mind:
– the possibility to encrypt your backups. Your data is going to be hosted on a remote system that is not under your control; encryption ensures that other customers or the provider’s personnel cannot read it. You then have to handle your encryption keys carefully: if you lose them, you will not be able to open your remote backups anymore.
– proximity to your company: unless you are only saving your home directories, backups of a virtualized environment are hefty. Uploading them to the provider’s datacenter could become a daunting task, so the possibility to ship an external disk with the first copy of your backups could save a huge amount of time when starting the remote copies
– this is even more important during restores! If you need to rebuild your infrastructure, you are not going to download all your data; you had better take your car, drive to the provider’s site, and get a copy of your backups onto an external disk to bring back to your company site.
– finally, have a look at the technical and financial health of those providers: many companies are starting new online backup businesses these days and acting as providers. You need to find out whether that business is a core activity for the provider, or only a way to get new customers for its other services. You do not want to end up with a “dead” provider dismissing all its customers, and having to search for another service all over again.
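To see why proximity and disk shipping matter, it helps to estimate how long the first upload (or a full download) would take over your link. The sketch below uses assumed values: 10 TB of backups, a 100 Mbit/s line, and a guess that 80% of the nominal bandwidth is really usable. The `transfer_days` function is just a name for the example.

```python
def transfer_days(data_tb, link_mbit_s, efficiency=0.8):
    """Days needed to move data_tb terabytes over a link of link_mbit_s Mbit/s.

    efficiency is an assumed fraction of the nominal bandwidth actually usable.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 86400


# Illustrative assumptions: 10 TB of backups over a 100 Mbit/s line.
print(f"{transfer_days(10, 100):.1f} days")  # prints 11.6 days
```

With numbers like these, a car trip with an external disk is hard to beat.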
Data protection in a virtualized environment requires deploying different solutions, each of them mandatory, complementary and coordinated with the others. A single backup of your data is not enough, and exposes you and your company to painful losses. Despite the added complexity it introduces, the 3-2-1 rule can guarantee an adequate level of protection for your virtualized environment.