Sunday, 3 March 2013

Cold Storage

Over the years I've had quite a few hard drives fail on me. Fortunately, most of the time they've failed gracefully, allowing me a short window in which to rescue their contents. In other words, I've been exceedingly lucky not to have lost anything important, especially as I'm not particularly good at taking regular backups. The one thing I do tend to back up quite well is the digital photos we take.

My photo backup routine is simple. Firstly, the photos end up on both my computer and Bryony's, so we each have a separate copy. I also burn all the photos to two DVDs. One DVD I keep in the house and the other I periodically ship off to my parents as an off-site backup. The off-site backup is probably the most important: while the copies in the house allow me to recover easily from a hard disk failure, they don't insure me against a catastrophic incident at the house such as fire or flood. Unfortunately, DVD-R disks (and this also applies to CD-Rs) don't last forever.

As I've mentioned before, we are trying to have a good go at tidying the house and getting rid of things we really don't need. Over the last 15 years or so I've burnt probably thousands of CD-Rs and DVD-Rs, and most of the data on them is no longer needed: a lot of them contain installers for old versions of software, or backups that have since been superseded by newer ones. Before throwing any of these disks away, though, I need to go through each one to check what is on it. So far I've managed to amalgamate around 50 DVDs and CDs into a single DVD, which is quite a good space saving. Unfortunately, this hasn't been quite as easy as I'd expected, as some of the disks had developed bit rot. Fortunately, with the help of ddrescue, I managed to extract everything I wanted from the damaged disks.

Some of the damaged disks I had to process were only around five years old and, if they had been left much longer, might not have been readable at all. Now, while I've never bought the most expensive blank CD-Rs or DVD-Rs, I had expected them to last for more than five years. Clearly, if I want to ensure that my digital photos are still safe in ten or twenty years, as well as for future generations, then relying on DVD-Rs as the final backup probably isn't very wise. The solution, I think, is cold storage.

As I mentioned recently, I've just started work on a new project called ForgetIT. ForgetIT is, essentially, an archiving project, with the interesting research focusing on incorporating human-inspired models of memory and forgetting to ensure that we archive enough extra content to make the information understandable years after it was archived. In our discussions, one question that kept being raised was where we intend to archive data to, and of course the current buzzword is The Cloud. While there are many possible online storage solutions (Google Drive and Dropbox being two popular services), they can all become expensive quickly. For example, with Google Drive you can store 5GB for free, but an extra 25GB will then cost you $2.49 a month, and I've already got approximately 100GB of photos I want to store. These services are also designed around frequent and fast access: in effect they act as a remote hard drive. This can be useful for syncing files across devices, but isn't necessary for long-term archival storage. Cold storage takes a different approach.

The idea behind cold storage is that it is for data you will want to access very infrequently, if ever. This allows service providers to use different hardware (possibly turning off hard drives when they aren't needed, or using tapes), which reduces the cost at the expense of access times. I think cold storage is perfect as the backup of last resort for photos: hopefully I'll never need to access them, and if I do I won't need instantaneous access. I can also see this form of archiving being useful to companies for auditing or legal purposes, where documents have to be kept for a number of years but may never actually be requested.

Last year Amazon started to offer a cold storage service called Glacier. The headline information is that storage costs just $0.01 per GB per month, but that retrieval requests take four hours before you can start downloading your data. This means that I can store my current 100GB of photos for just $12 a year, and should I ever need to access them I won't mind a four-hour wait. Before you all start rushing to copy your backups to Glacier, though, there are a couple of points I've glossed over that need raising.
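The $12 figure is simply the headline rate multiplied out. Here is a minimal Python sketch of that arithmetic, assuming the flat 2013 price of $0.01 per GB per month and ignoring request and retrieval charges (the function name is purely illustrative):

```python
def glacier_storage_cost(gb_stored: float, months: int,
                         price_per_gb_month: float = 0.01) -> float:
    """Storage-only cost in dollars; assumes the flat 2013 headline rate."""
    return gb_stored * price_per_gb_month * months

# 100GB of photos kept for a year
print(f"${glacier_storage_cost(100, 12):.2f} per year")
```

This covers storage only; the charges for getting data back out are a separate matter.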

Firstly, Glacier is part of Amazon Web Services (AWS). AWS is a set of web services designed for developers, and as such Glacier doesn't have an Amazon-provided end-user interface of any sort. Fortunately, a number of people have started building applications. Some are command-line tools, which are great for scripted backup solutions, but there are also a few graphical interfaces. I've discussed these in more detail over on one of my other blogs, where I'm currently recommending SAGU (Simple Amazon Glacier Uploader), although I'm also developing an application which I'm calling CORPSE (or COld stoRage backuP SoftwarE).

The more important issue is the cost. While the headline price of $0.01 per GB per month is certainly enticing, it's worth noting that you are also charged for retrievals (although not for uploads). Firstly, there is a per-GB cost for transferring data out of Amazon's servers (this is the standard AWS data-out charge), which is free for the first GB each month but is then charged at $0.120 per GB (assuming you retrieve less than 10TB). This cost is at least easy to calculate. The more worrying cost is for the retrieval itself.
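The data-out charge is easy to sketch. This assumes the prices quoted above (the first GB each month free, then $0.12 per GB, valid only below 10TB a month) and that the whole download falls within one calendar month; it deliberately leaves out the separate retrieval fee. The names are illustrative only.

```python
FREE_GB_PER_MONTH = 1.0   # first GB out each month is free
PRICE_PER_GB_OUT = 0.12   # $ per GB, tier for under 10TB a month (2013 price)

def data_out_cost(gb_out: float) -> float:
    """Standard AWS data-out charge for one month's downloads, in dollars."""
    billable_gb = max(0.0, gb_out - FREE_GB_PER_MONTH)
    return billable_gb * PRICE_PER_GB_OUT

# Downloading all 100GB of photos in a single month
print(f"${data_out_cost(100):.2f}")
```

For the 100GB photo archive that is 99 billable GB, or about $11.88, before any retrieval fee is added.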

The retrieval pricing structure is really quite complex. Essentially, "you can retrieve up to 5% of your data stored in Glacier for free each month ... [which] is calculated and metered on a daily prorated basis. For example, if on a given day you have 12 terabytes of data stored in Glacier, you can retrieve up to 20.5 gigabytes of data for free that day (12 terabytes x 5% / 30 days = 20.5 gigabytes, assuming it is a 30 day month)." This means, of course, that if you are willing to spread your retrieval out over a number of days, weeks, months or even years, you can retrieve as much as you want without paying a retrieval fee. If you need your data more quickly, though, you will pay for the retrieval.
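The quoted example only comes out at 20.5GB if a terabyte is taken as 1024GB, so the sketch below assumes that, along with a 30-day month. The same formula gives the 128GB daily allowance for 75TB used in Amazon's table. The function name is hypothetical.

```python
GB_PER_TB = 1024  # binary terabytes: needed to reproduce Amazon's 20.5GB figure

def daily_free_allowance_gb(gb_stored: float, days_in_month: int = 30,
                            free_fraction: float = 0.05) -> float:
    """Free Glacier retrieval for one day: 5% of stored data, prorated daily."""
    return gb_stored * free_fraction / days_in_month

print(round(daily_free_allowance_gb(12 * GB_PER_TB), 1))  # Amazon's 12TB example
print(round(daily_free_allowance_gb(75 * GB_PER_TB), 1))  # the 75TB example
print(round(daily_free_allowance_gb(100), 3))             # 100GB of photos
```

At 100GB of photos the free allowance is only about 0.17GB a day, so a completely free retrieval of everything would take roughly 600 days.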

Calculating the amount you pay is complicated, and I can't really summarise it without simply reproducing Amazon's examples, so I suggest you go and read the details they have given. Their examples can, however, be summarised in a table. Assume you have stored 75TB of data, so your daily free allowance is 128GB, and you want to retrieve 140GB. How much you pay depends on how you spread out the retrieval:

Hours over which retrieval is spread | Peak retrieval rate | Peak billable retrieval rate | Cost
4 | 35GB per hour | 3GB per hour | $21.60
8 | 17.5GB per hour | 1.5GB per hour | $10.80
28 | 5GB per hour | none (within your daily allowance) | $0.00
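The rule behind Amazon's examples appears to be that the peak billable hourly retrieval rate is charged at $0.01 per GB as if it were sustained for every hour of the month. The sketch below assumes that rule, a 30-day month, a retrieval spread evenly over the given hours, and the day's free allowance being shared across the hours spent retrieving (at most 24). It reproduces the three rows of the table above, but it is an illustration with hypothetical names, not Amazon's billing code.

```python
HOURS_IN_MONTH = 30 * 24   # assumes a 30-day month, as in Amazon's examples
PRICE_PER_GB_HOUR = 0.01   # $ per GB of peak billable hourly rate (2013 price)

def retrieval_fee(gb_requested: float, retrieval_hours: float,
                  daily_free_gb: float) -> float:
    """Approximate Glacier retrieval fee for one evenly spread retrieval."""
    # Peak hourly retrieval rate, assuming the data comes back at an even rate
    peak_rate = gb_requested / retrieval_hours
    # The day's free allowance is shared across the hours spent retrieving
    # (at most 24 of them in any one day)
    free_rate = daily_free_gb / min(retrieval_hours, 24)
    billable_rate = max(0.0, peak_rate - free_rate)
    # The peak billable rate is charged as if sustained for every hour of the month
    return billable_rate * HOURS_IN_MONTH * PRICE_PER_GB_HOUR

# The three rows of the table: 140GB requested, 128GB daily free allowance
for hours in (4, 8, 28):
    print(f"{hours} hours: ${retrieval_fee(140, hours, 128):.2f}")
```

This prints $21.60, $10.80 and $0.00. Anything more complicated, such as several retrievals in one month or retrievals that straddle days unevenly, is where a proper calculator earns its keep.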

As you can see, as you spread out the retrieval over time, the price drops until the retrieval costs nothing at all. The problem is going to be keeping track of your downloads over time in order to estimate what a retrieval will cost. The best calculator I've found for this so far is, unfortunately, unofficial, but it seems to match Amazon's examples, so it should be a good starting point for estimating your costs. Of course, my use of Glacier is predicated on the fact that I'm hoping never to need to access the data, as it is the backup of last resort.

Even if you don't like the idea of cold storage, or Amazon's pricing structure, I hope that this blog post will at least have made you think about when you last took a backup of any data you wouldn't want to lose and, more importantly, how long those backups are going to last.
4 March 2013 05:21, GB said...

Well, this is an interesting topic. I'd never heard of bit rot. I've now read the article. I never cease to be amazed at the things I would never have imagined. I am fanatical about backing up. I back up my computer using two separate systems every day. I also keep a third, less frequent backup off site. When I leave the house, one of the current backups travels with me and the other is put in a relatively safe place in the house (in the UK, in the safe). The total backup of the computer is less than 750GB (the size of the laptop's hard drive). However, I have over 400GB of photos so far and over 200GB of music, so I have other hard drives as well. The problem is keeping track of everything. Now that I know about bit rot, though, I will have to rethink my current storage.

4 March 2013 06:46, Mark said...

If you are backing up to hard drives which appear to keep working each time you add something to them, then you are probably quite safe, especially as you are making two copies. This gives you some flexibility to deal with bit rot, as it is unlikely that both copies will be damaged in the same way at the same time. I'm also assuming that your backup is generated by Time Machine on the Mac, so any damage to the backup should be spotted as a change to the data, which would cause it to be backed up again (this is because Time Machine only backs up changes to files rather than the entire drive every day).

The one thing about Glacier that I didn't mention is that it apparently also stores duplicate copies of the data, so that internally it can monitor for bit rot, which should ensure the backup stays safe.

4 March 2013 07:50, GB said...

Thanks for that. I make one daily Time Machine backup and one Carbon Copy Cloner backup. Every so often I make a second Time Machine backup. I never use the 1TB WD Elements drives for more than a couple of years, and then they go into retirement with a clean copy on them. That's something I shall have to review, although after a while they will be redundant anyway. It's not like your case, where you are storing masses of important work data.

5 March 2013 17:25, Scriptor Senex said...

I back up to three portable hard drives. Two are kept at home, and Helen and Ian have the other for off-site storage. I update that every couple of months, so that is the most I would lose if the house went up...

5 March 2013 17:40, Mark said...

Just like GB, as long as the disk appears to work okay when you do the off-site backup, you should be alright. My problem is that my off-site backup is a bunch of DVD-Rs, and unless I try reading from them I won't know whether they have rotted or not.

The problem with any off-site backup is that often you don't know whether it's safe and usable until it's too late. One of the reasons I quite like Glacier is that checking the data (known as fixity checking) and duplicating the data to allow recovery from corruption are both handled transparently.
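Fixity checking can also be done by hand for DVD backups. Below is a minimal sketch, assuming you record a SHA-256 checksum for every file when the disc is burned and keep that manifest somewhere safe (for example as JSON alongside the other copies). Re-running the check later flags files that have changed or can no longer be read. The function names are hypothetical and not taken from any particular tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large photos need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {str(p.relative_to(root)): sha256_of_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing, unreadable or changed."""
    problems = []
    for rel, expected in manifest.items():
        try:
            if sha256_of_file(root / rel) != expected:
                problems.append(rel)
        except OSError:  # missing file, or a read error from a damaged disc
            problems.append(rel)
    return problems
```

Reading every file back is exactly the step that reveals a rotting DVD-R, and any path returned tells you which files to restore from one of the other copies.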
