My photo backup routine is simple. First, the photos end up on both my computer and Bryony's, so we each have a separate copy. I also burn all the photos to two DVDs: one I keep in the house, and the other I periodically ship off to my parents as an off-site backup. The off-site backup is probably the most important, as while the copies in the house allow me to recover easily from a hard disk failure, they don't insure me against a catastrophic incident at the house (fire, flood, etc.). Unfortunately DVD-R disks (and this also applies to CD-Rs) don't last forever.
As I've mentioned before, we are trying to have a good go at tidying the house and getting rid of things we really don't need. Over the last 15 years or so I've burnt probably thousands of CD-Rs and DVD-Rs, and most of the data on them is no longer needed -- a lot of them contain installers for old versions of software, or backups that have since been superseded by newer ones. Before throwing any of these disks away, though, I needed to go through each one to check what was on it. So far I've managed to amalgamate around 50 DVDs and CDs down onto a single DVD, which is quite a good space saving. Unfortunately this hasn't been quite as easy as I'd expected, as some of the disks had developed bit rot. Fortunately, with the help of ddrescue, I managed to extract everything I wanted from the damaged disks.
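In case it's useful to anyone else, the basic recipe I used was along these lines (the device path and file names are just placeholders, so adjust to taste):

```
# First pass: grab everything that reads cleanly
# (-b 2048 matches the sector size of optical media, -n skips retries)
ddrescue -b 2048 -n /dev/sr0 disc.iso disc.map

# Second pass: go back and retry the damaged areas a few times
ddrescue -b 2048 -r 3 /dev/sr0 disc.iso disc.map

# Mount the recovered image read-only and copy the files off
sudo mount -o loop,ro disc.iso /mnt
```

The map file is the clever bit: it records which sectors have already been recovered, so you can stop and restart (or even retry the bad areas in a different drive) without losing any progress.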
Some of the damaged disks I had to process were only around five years old, and if they had been left much longer they may not have been readable at all. Now, while I've never bought the most expensive blank CD-Rs or DVD-Rs, I had expected them to last for more than five years. Clearly, if I want to ensure that my digital photos are still safe in ten or twenty years' time, as well as for future generations, then relying on DVD-Rs as the final backup probably isn't very wise. The solution, I think, is cold storage.
As I mentioned recently, I've just started work on a new project called ForgetIT. ForgetIT is, essentially, an archiving project, with the interesting research focusing on incorporating human-inspired models of memory and forgetting to ensure that we archive enough extra content to make the information understandable years after it was archived. In discussions, one thing that kept being raised was where we intend to archive data to, and of course the current buzzword is The Cloud. While there are many possible online storage solutions (Google Drive and Dropbox being two popular services), they can all start to become expensive quickly. For example, with Google Drive you can store 5GB for free, but an extra 25GB will cost you $2.49 a month, and I've already got approximately 100GB of photos I want to store. These services are also designed around frequent and fast access; they are designed to act as a remote hard drive. This can be useful for syncing files across devices, but isn't necessary for long-term archival storage. Cold storage takes a different approach.
The idea behind cold storage is that it is for data you will want to access very infrequently, if ever. This allows service providers to take a different hardware approach (possibly turning off hard drives when they aren't needed, or using tapes), which reduces the cost at the expense of access times. I think cold storage is perfect as a backup of last resort for photos: hopefully I'll never need to access them, and if I do I won't need instantaneous access. I can also see this form of archiving being useful to companies for auditing or legal purposes, where documents have to be kept for a number of years but may never actually be requested.
Last year Amazon started offering a cold storage service called Glacier. The headline figures are that storage costs just $0.01 per GB per month, but that retrieval requests take four hours before you can start downloading your data. This means that I can store my current 100GB of photos for just $12 a year, and should I ever need to access them I won't mind a four-hour wait. Before you all start rushing to copy your backups to Glacier, though, there are a couple of points I've glossed over that need raising.
Firstly, Glacier is part of the Amazon Web Services (AWS) framework. AWS is a set of web services designed for developers, and as such Glacier doesn't come with any Amazon-provided end-user interface. Fortunately, a number of people have started building applications on top of it. Some are command-line tools, which are great for scripted backup solutions, but there are also a few graphical interfaces. I've discussed these in more detail over on one of my other blogs, where I'm currently recommending SAGU (Simple Amazon Glacier Uploader), although I'm also developing an application of my own which I'm calling CORPSE (or COld stoRage backuP SoftwarE).
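To give a feel for what the developer-facing API actually looks like, here's a minimal sketch of an upload and a retrieval using boto3, the official Python AWS SDK (the vault and file names are made up for illustration):

```
import time
import boto3

# Glacier is exposed through the standard AWS SDKs; the region and
# credentials come from your normal AWS configuration.
glacier = boto3.client("glacier", region_name="eu-west-1")

# A vault is just a named container for archives.
glacier.create_vault(vaultName="photo-archive")

# Upload a single archive. Glacier hands back an opaque archive ID,
# which you must keep safe -- it's the only handle on your data.
with open("photos-2012.tar", "rb") as f:
    upload = glacier.upload_archive(
        vaultName="photo-archive",
        archiveDescription="Photos from 2012",
        body=f,
    )
archive_id = upload["archiveId"]

# Retrieval is asynchronous: start a job, wait (typically around four
# hours), then download the job output.
job = glacier.initiate_job(
    vaultName="photo-archive",
    jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
)
job_id = job["jobId"]

while not glacier.describe_job(vaultName="photo-archive",
                               jobId=job_id)["Completed"]:
    time.sleep(15 * 60)  # poll every fifteen minutes

output = glacier.get_job_output(vaultName="photo-archive", jobId=job_id)
with open("photos-2012.tar", "wb") as f:
    f.write(output["body"].read())
```

That initiate-then-poll-then-download dance is exactly what the GUI tools like SAGU are wrapping up behind a friendlier interface.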
The more important issue is the cost. While the headline price of $0.01 per GB per month is certainly enticing, it's worth noting that you are also charged for retrievals (although not for uploads). Firstly, there is a per-GB cost for transferring data out of Amazon's servers (this is the standard AWS data-out charge), which is free for the first GB each month but is then charged at $0.120 per GB (assuming you retrieve less than 10TB). This cost is at least easy to calculate: pulling my 100GB of photos back down would cost 99 × $0.120 = $11.88 in transfer fees. The more worrying cost is for the retrieval itself.
The retrieval pricing structure is really quite complex. Essentially, "you can retrieve up to 5% of your data stored in Glacier for free each month ... [which] is calculated and metered on a daily prorated basis. For example, if on a given day you have 12 terabytes of data stored in Glacier, you can retrieve up to 20.5 gigabytes of data for free that day (12 terabytes x 5% / 30 days = 20.5 gigabytes, assuming it is a 30 day month)." This means, of course, that if you are willing to spread your retrieval out over a number of days, weeks, or months, then you can retrieve as much as you want without ever paying a retrieval fee. If you need your data more quickly, though, then you will pay for the retrieval.
Calculating the amount you pay is complicated, and I can't really summarise it without simply reproducing Amazon's examples, so I suggest you go and read the details they have given. Their examples can, however, be summarised in a table. Assume you have stored 75TB of data, so your daily free allowance is 128GB, and you want to retrieve 140GB. How you spread out the retrieval changes how much you pay, as follows:
Hours | Peak Retrieval Rate | Peak Billable Retrieval Rate | Cost |
---|---|---|---|
4 | 35GB per hour | 3GB per hour | $21.60 |
8 | 17.5GB per hour | 1.5GB per hour | $10.80 |
28 | 5GB per hour | 0GB per hour (within the daily allowance) | $0.00 |
As you can see, the further you spread the retrieval out over time, the more the price drops, to the point at which the retrieval becomes free. The problem is going to be keeping track of your downloads over time in order to estimate what a retrieval will cost. The best calculator I've found for this so far is unfortunately unofficial, but it seems to match Amazon's examples so should be a good starting point for estimating your costs. Of course, my use of Glacier is predicated on the hope that I'll never need to access the data at all, as it is the backup of last resort.
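If you just want a rough feel for the numbers, here's a little Python sketch of the pricing model as I understand it from Amazon's examples (very much my own back-of-the-envelope reconstruction, not their actual billing logic):

```
def glacier_retrieval_cost(stored_gb, retrieved_gb, hours,
                           gb_hour_price=0.01, hours_in_month=720):
    # You can retrieve 5% of your stored data free each month,
    # prorated daily (e.g. 12TB stored -> 20.48GB free per day).
    daily_free_gb = stored_gb * 0.05 / 30
    peak_hourly_gb = retrieved_gb / hours

    # Spread the retrieval so that no single day exceeds the daily
    # allowance and the whole thing is free (the 28-hour row above).
    if peak_hourly_gb * min(hours, 24) <= daily_free_gb:
        return 0.0

    # Otherwise the free allowance is prorated over the hours of the
    # retrieval, and the billable peak hourly rate is charged as if it
    # were sustained for the entire month.
    free_hourly_gb = daily_free_gb / hours
    billable_hourly_gb = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable_hourly_gb * hours_in_month * gb_hour_price

# Reproduce the table: 75TB stored (128GB free per day), 140GB retrieved.
for hours in (4, 8, 28):
    print(f"{hours} hours: ${glacier_retrieval_cost(75 * 1024, 140, hours):.2f}")
# 4 hours: $21.60, 8 hours: $10.80, 28 hours: $0.00
```

It at least reproduces Amazon's published examples, but for anything serious you'd want to check it against the unofficial calculator mentioned above.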
Even if you don't like the idea of cold storage, or Amazon's pricing structure, I hope this blog post will at least have made you think about when you last took a backup of any data you wouldn't want to lose and, more importantly, how long those backups are going to last.