Yesterday on a mailing list we had a nice discussion about some “worst practices” we see at customers’ sites in their backup configurations. It inspired this post about some common sense you can apply to your backups.
Separation
I have often found environments where a customer relies totally on the redundancy and resilience of their shiny new SAN, carving out an area of it for backup purposes. Among the fancy statements I’ve heard are “I’ve got plenty of space there…” or “my SAN is completely redundant!”. Obviously those statements do not rule out the possibility that something can go wrong (maybe even a “trivial” corruption in the parity calculations…), and such errors can wipe out both the production data and the very backup data you would eventually restore from.
Moral: the greater the distance between production and backup data, the better. You usually start with separate storage in the same rack, then you move the backups to another room, then to another building, until you create a copy of those backups at a remote site many kilometers away. The ideal solution is to have two copies: one kept locally to guarantee fast restore times, and the other to protect you if something happens to your production site.
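To make the idea concrete, here is a minimal sketch of a nightly copy job (all paths and host names are hypothetical, not a real product configuration): one copy goes to separate storage on site for fast restores, a second copy goes to a remote site over rsync.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical locations: adjust to your environment.
BACKUP_SET = Path("/backup/daily/latest")            # fresh backup produced by tonight's job
LOCAL_COPY = Path("/mnt/backup-nas/daily")           # separate storage, same site: fast restores
REMOTE_COPY = "backupuser@dr-site.example.com:/vault/daily"  # remote site, many km away

def copy_local(src: Path, dst_root: Path) -> None:
    """Keep one copy near production for quick restores."""
    shutil.copytree(src, dst_root / src.name, dirs_exist_ok=True)

def copy_remote(src: Path, dst: str) -> None:
    """Push a second copy off-site; rsync only transfers what changed."""
    subprocess.run(["rsync", "-a", "--delete", str(src), dst], check=True)

if __name__ == "__main__":
    copy_local(BACKUP_SET, LOCAL_COPY)
    copy_remote(BACKUP_SET, REMOTE_COPY)
```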
Avoid Chinese boxes
When you are about to do a restore, you are ALREADY in trouble, at the very least because someone has deleted a file. Are you really always confident you’ll be able to restore it? Don’t you feel that bead of sweat crawling down your forehead? If you started your career in IT at a time when tapes were the only backup media available, you know what I’m talking about. So why complicate your life by adding layers that sit between you and the restore you crave?
You have a NAS able to expose shares via CIFS, but instead you configure it as an iSCSI target, then you mount it on a VMware ESXi server, then you format it with the VMFS filesystem, then you create a VMDK virtual disk on top of it, then you attach it to a VM that formats it with NTFS, and finally your backup software saves there the very same files it could have saved directly on the CIFS share. Doesn’t that seem a useless complication to you? How many links in this chain could break? What if you lose your ESXi server, the only one able to mount that iSCSI target? Think about it…
This example also tells you: keep it simple (and stupid, as the KISS acronym goes); do not use weird configurations just because you can.
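To see what KISS buys you at restore time, compare the chain above with a backup host that simply mounts the CIFS share. This is just an illustrative sketch (the mount point is hypothetical); the point is that the restore path has exactly one dependency, the share itself:

```python
import shutil
from pathlib import Path

# Hypothetical mount point of the NAS CIFS share, mounted directly on the backup host.
SHARE = Path("/mnt/nas-backup")

def backup_file(src: Path) -> Path:
    """One layer only: file in, file on the share."""
    dst = SHARE / src.name
    shutil.copy2(src, dst)
    return dst

def restore_file(name: str, target: Path) -> None:
    """The restore needs nothing but the share itself: no ESXi host,
    no VMFS, no VMDK, no guest NTFS standing in the way."""
    shutil.copy2(SHARE / name, target / name)
```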
Avoid dead ends
Is encryption really needed? What if the cipher has gone bad? What if you forget or lose the key?
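If you do decide encryption is worth it, at least prove regularly that the key you archived still opens your backups. A minimal sketch using Python’s cryptography package (the file paths are hypothetical):

```python
from cryptography.fernet import Fernet, InvalidToken

# Hypothetical paths: the escrowed key and one encrypted backup sample.
KEY_FILE = "escrow/backup.key"
SAMPLE_FILE = "backups/sample.enc"

def key_still_works() -> bool:
    """Periodically prove the archived key decrypts a real backup.
    A key you never test is a dead end waiting to happen."""
    key = open(KEY_FILE, "rb").read()
    token = open(SAMPLE_FILE, "rb").read()
    try:
        Fernet(key).decrypt(token)
        return True
    except InvalidToken:
        return False
```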
Another dead end: maybe you saved everything on an open-source community-edition filesystem, where someone “could” help you, if he is still on the project team, if that morning he feels like helping you on the public forum, if he can understand what happened. At least with a commercial product you have paid a company to help you when something goes wrong.
As a general rule, stay away from any situation where a single point of failure can hold you hostage.
Keep yourself up to date
Technology evolves, and your historical data usually ends up trapped inside dead gear. Don’t get stuck in a situation where your encryption key is on a floppy written five years ago and none of your computers has a floppy drive anymore. Or on an old DDS2 tape with no tape drive around…
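A habit that helps is to periodically check, on current hardware, that your old archives still open, and to migrate them while you still can. A minimal sketch, assuming ZIP archives and a checksum recorded at backup time (the paths are hypothetical):

```python
import hashlib
import zipfile
from pathlib import Path

ARCHIVE_DIR = Path("/backup/archive")  # hypothetical long-term archive location

def verify_archive(path: Path, expected_sha256: str) -> bool:
    """Still readable today? Check both the container and the bits inside it."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        return False                    # media decay or silent corruption
    with zipfile.ZipFile(path) as zf:
        return zf.testzip() is None     # None means every member read back OK
```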
But also pay attention to software: at one customer we had to do a restore from a backup made with software that still existed, but in a really old release, no longer supported. We spent the whole morning trying to find a copy of the installation package somewhere, before the vendor managed to send us one, telling us: “we found it on a CD inside a display cabinet, kept as a memento of that milestone release”.
We laughed out loud once we restored the backup, but it was not funny at all that morning…