Freeze your backups into AWS Glacier


Amazon Web Services has recently added a new storage solution, named Glacier, to its impressive lineup of cloud services.

As per their presentation:

Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup. In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions.

Leaving aside the speculation that followed the launch, about whether Glacier uses custom-made disks or tapes, the service is really promising: a true “Backup as a Service”, aimed at long-term retention of data you do not need to retrieve frequently, and at a great price!

The problem is that Glacier is not accessible through the usual protocols such as CIFS, NFS or even WebDAV. As with many AWS services, you need to use its APIs to interact with it. There is no way to upload or download backups via the web interface.

Fortunately, many people on the internet have created tools to access Glacier, and you can find both commercial and free programs for many different operating systems.

In this article, I’m going to show you how I configured my local backups, made with Veeam, to be shipped to Glacier, how the upload performed, and how I hardened the Glacier permissions.

Create a vault in Glacier

Inside Glacier, a container for your backups is called a vault. You can create as many vaults as you need, based on your requirements. Separating different backups into dedicated vaults is good practice, both for retention policies and for controlling access to each vault.

Other than creating and deleting a vault, there is nothing else you can do in the web interface. All the interaction with Glacier happens via APIs. The only other thing you have to do is copy the vault's ARN (think of it as a URL to access the specific vault via the AWS APIs); you will need it at a later stage.
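For reference, a vault ARN follows a fixed pattern that is visible in the policy examples later in this article: region, account ID and vault name. Here is a minimal Python sketch that assembles one. The function name and the sample account ID are hypothetical, chosen only for illustration.

```python
def vault_arn(region: str, account_id: str, vault_name: str) -> str:
    """Build the ARN of a Glacier vault from its three variable parts."""
    return f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"


# Hypothetical account ID; replace it with your own 12-digit AWS account number.
print(vault_arn("eu-west-1", "123456789012", "skunkworks-glacier-test"))
```

In practice you would simply copy the ARN from the console; the sketch is only meant to show which parts change from vault to vault.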

Accessing Glacier in a secure way

Even though AWS has a complete role-based authentication system, many customers only use the main account. In my opinion this is a security risk: since AWS is a pay-as-you-go platform with no spending limit, if your account is compromised the attacker has access to all your data and can use your services, while you will have to pay for them at the end of the month.
A good security practice is “Least Privilege”: give a user or service only the privileges needed to complete the assigned task.
In AWS, the service to use in this scenario is IAM (Identity and Access Management). You can even federate your local directory with IAM, so you can use your local accounts to access AWS.

I’m going to create a new user dedicated to Glacier activities, via the IAM user interface.

Write down the security credentials or download them: they will be used later, and this is the only time you can retrieve the secret access key.
Once the user is created, I need to assign it a “role” so it can only access Glacier. In the permissions tab, select “Attach User Policy”. You can see there are many pre-defined policies, but at the time of this article the IAM interface does not support Glacier permissions, so I have to create them via a custom policy. Give the new policy a name, and enter this text in the policy document:

{
  "Statement":[
    {
      "Effect":"Allow",
      "Resource":[
        "arn:aws:glacier:*:XXXXXXXXXXXX:vaults/*"
      ],
      "Action":[
        "glacier:ListVaults"
      ]
    },
    {
      "Effect":"Allow",
      "Resource":[
        "arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/skunkworks-glacier-test"
      ],
      "Action":[
        "glacier:UploadArchive",
        "glacier:InitiateMultipartUpload",
        "glacier:UploadMultipartPart",
        "glacier:UploadPart",
        "glacier:DeleteArchive",
        "glacier:ListParts",
        "glacier:InitiateJob",
        "glacier:ListJobs",
        "glacier:GetJobOutput",
        "glacier:ListMultipartUploads",
        "glacier:CompleteMultipartUpload",
        "glacier:ListVaults",
        "glacier:CreateVault",
        "glacier:DescribeVault"
      ]
    }
  ]
}

As you can see, I inserted the ARN from the previous step in this text (I only masked the account ID with X characters), which limits access to this specific vault. If you want to grant access to every vault, replace the vault name with a *.
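If you manage several vaults, a policy of this shape can be generated rather than edited by hand. The following is a minimal Python sketch under a few assumptions: the function name and the sample account ID are hypothetical, and the action list is a subset of the one above, kept short for brevity. It places both statements in a single "Statement" array, since a JSON object should not repeat a key.

```python
import json

# Subset of the actions used in the policy above, kept short for brevity.
VAULT_ACTIONS = [
    "glacier:UploadArchive",
    "glacier:InitiateMultipartUpload",
    "glacier:UploadMultipartPart",
    "glacier:CompleteMultipartUpload",
    "glacier:ListParts",
    "glacier:ListMultipartUploads",
    "glacier:InitiateJob",
    "glacier:ListJobs",
    "glacier:GetJobOutput",
    "glacier:DeleteArchive",
    "glacier:DescribeVault",
]


def build_vault_policy(account_id: str, region: str, vault_name: str) -> dict:
    """Return a policy document that limits a user to one Glacier vault."""
    return {
        "Statement": [
            {
                # Let the user list the vaults of the account in any region.
                "Effect": "Allow",
                "Resource": [f"arn:aws:glacier:*:{account_id}:vaults/*"],
                "Action": ["glacier:ListVaults"],
            },
            {
                # Everything else is restricted to the named vault.
                "Effect": "Allow",
                "Resource": [
                    f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
                ],
                "Action": list(VAULT_ACTIONS),
            },
        ]
    }


# Hypothetical account ID, used only to make the example runnable.
policy = build_vault_policy("123456789012", "eu-west-1", "skunkworks-glacier-test")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy document field in the same way as the hand-written version.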

Think about complex scenarios such as a single company account managing several branch offices, each with its own vault: with this security configuration only the central IT department can access every vault, while each branch office can access only its own. Or you may be a system integrator offering remote backup services. These are only two use cases where this configuration inside Glacier can be really useful.
If you need a “Glacier Admin”, this is the policy for the role:

{
"Statement":[{
"Effect":"Allow",
"Resource":[
"arn:aws:glacier:*:XXXXXXXXXXXX:vaults/*"
],
"Action":[
"glacier:*"
]
}]
}

Easy syntax, isn’t it? The policy is written in JSON; if you are interested, you can read more about it here.
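Because the policy is plain JSON, it can be checked locally before it is pasted into IAM. One pitfall worth guarding against is a JSON object that repeats a key, for example two "Statement" entries: Python's standard json module does not reject it and silently keeps only the last occurrence. Here is a minimal sketch of a stricter loader; the helper names are hypothetical.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in policy document: {key!r}")
        result[key] = value
    return result


def load_policy(text: str) -> dict:
    """Parse a policy document, failing on invalid JSON or duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


admin_policy = """
{
  "Statement": [{
    "Effect": "Allow",
    "Resource": ["arn:aws:glacier:*:XXXXXXXXXXXX:vaults/*"],
    "Action": ["glacier:*"]
  }]
}
"""
print(load_policy(admin_policy)["Statement"][0]["Action"])
```

This only validates the JSON structure; it does not check that the action names or ARNs are accepted by IAM.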

Send your backup to Glacier

Once Glacier is completely configured, it’s time to test some uploads. After some tests, I found a great tool for this activity: FastGlacier. I’m not saying it’s the best one, but for my needs it is perfect: it can be run from the Windows command line, so it’s really easy to write scripts that invoke it. The program is free for personal use, and even the Pro version is really cheap, only 29 USD.

Once FastGlacier is installed and you run it for the first time, it will ask you to configure your AWS account:

Give the account a friendly name (maybe the same one you used in IAM to create it), and then enter the security credentials you saved before. If you choose the correct Amazon region (eu-west-1 in my case) in the dropdown list, you will see the vault.

From here you can manually upload and download files. Since I want to do it automatically, one more step is needed.

FastGlacier also offers a command line executable, named glacier-put.exe, that you can use inside your scripts. The syntax for the command is:

glacier-put.exe account-name local-file region-code vault/folder

where account-name is the account I saved in the GUI. You can obtain the codes for the AWS regions by running the command without arguments. In my example I’m uploading a 9 GB Veeam backup file to my vault, so my command will be:

glacier-put.exe skunkworks-glacier-test BKP-fileserver2012-11-17T220148.vbk eu-west-1 skunkworks-glacier-test/
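To call this from a scheduled script, the command line can be assembled and launched programmatically. Below is a minimal Python sketch under stated assumptions: the argument order is taken from the syntax shown above, glacier-put.exe is assumed to be on the PATH, and it is assumed (not verified here) to return a non-zero exit code on failure. The function names are hypothetical.

```python
import subprocess


def glacier_put_command(account_name, local_file, region_code, vault_path,
                        executable="glacier-put.exe"):
    """Build the argument list in the order given by the syntax above."""
    return [executable, account_name, local_file, region_code, vault_path]


def upload(account_name, local_file, region_code, vault_path):
    """Run glacier-put; raises CalledProcessError on a non-zero exit code.

    Assumption: the tool signals failure through its exit code.
    """
    cmd = glacier_put_command(account_name, local_file, region_code, vault_path)
    subprocess.run(cmd, check=True)


# Only print the command here; call upload(...) with the same arguments to run it.
print(glacier_put_command(
    "skunkworks-glacier-test",
    "BKP-fileserver2012-11-17T220148.vbk",
    "eu-west-1",
    "skunkworks-glacier-test/",
))
```

Passing the arguments as a list avoids quoting problems when the backup file name contains spaces.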

The command starts uploading the file to Glacier, and you can follow the progress directly in the command shell.

One last warning: depending on your internet connection, your upload task could last several hours. If your backup file is going to be modified by the next run of your backup job (in my example Veeam is running in reverse incremental mode, so the backup file will certainly be modified), make sure the upload can finish before the backup starts; otherwise, first copy the file to a secondary directory and upload it from there.
My upload started at 13:42 and finished at 16:47: about 3 hours to upload 9 GB, at an average speed of about 0.8 MB per second. Take your time to do some tests before using it in a production environment.
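The arithmetic behind these figures, and the "copy it first" precaution, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, 1 GB is taken as 1024 MB, and the start and end times are assumed to fall on the same day.

```python
import shutil
from datetime import datetime
from pathlib import Path


def average_speed_mb_s(size_gb: float, start: str, end: str) -> float:
    """Average throughput in MB/s between two HH:MM times on the same day."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return size_gb * 1024 / elapsed.total_seconds()


def estimated_upload_hours(size_gb: float, speed_mb_s: float) -> float:
    """Hours needed to upload size_gb at a sustained speed_mb_s."""
    return size_gb * 1024 / speed_mb_s / 3600


def stage_for_upload(backup_file: Path, staging_dir: Path) -> Path:
    """Copy the backup to a secondary directory and return the new path.

    Uploading the copy keeps the next backup run from modifying the file
    while the transfer is still in progress.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(backup_file, staging_dir))


print(round(average_speed_mb_s(9, "13:42", "16:47"), 2), "MB/s")
print(round(estimated_upload_hours(9, 0.8), 1), "hours")
```

Comparing the estimated duration with the time left before the next backup job tells you whether staging a copy is needed.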


2 thoughts on “Freeze your backups into AWS Glacier”

  1. Let’s say you glacier-put a large VBK file, and then another backup runs and the file changes. The name of the VBK changes, and some of the data in the VBK changes (essentially the delta since the last VBK).

    If you glacier-put the new VBK file, does it only transfer the block changes since the last VBK file, or is the entire file uploaded?

    Some cloud storage tools we’ve tested seem to be smart enough to send only the block changes (but they are also configured to monitor an entire folder, so individual file names are not passed to the tools).

    Sending the entire VBK contents up nightly would require more bandwidth than our customers typically have, so we often must find solutions that only send block differences.

  2. Hi Mark,
    I have no evidence that FastGlacier can do block-level incrementals, so I must assume it sends the whole file every time.

    Anyway, once you better understand how Glacier works, you will find that your concern does not really apply :) Let me explain: on Glacier you send files to an object store, not a real storage server. You cannot interact with it as you would with a shared folder. To update the previous backup file, you would first have to retrieve it, and per the Glacier SLAs this activity can take up to 4-5 hours to complete.

    So, as I wrote in the article, the best use case for Glacier is long-term retention of rarely accessed backups. You would upload a backup to Glacier every week or every month, not every daily backup. If you need that kind of frequency, better look at other storage services like Amazon S3.

    Luca.
