UPDATED 22:50 EST / AUGUST 26 2012

Breaking Analysis: Amazon’s Glacier is the Titanic of Deep Archives for Corporate CIOs

Beware the Pitfalls of the Glacial Mass

Amazon’s Glacier is a new service for infrequently accessed data. It is very inexpensive to load data into Glacier and park it there indefinitely, but CIOs should use extreme caution before placing their enterprise archives in the service. Getting data out of Glacier could be exceedingly expensive. More importantly, Amazon’s less-than-adequate response to previous instances of data loss should be a source of concern for CIOs. While Amazon promises eleven nines (99.999999999%) of annual “durability,” it is unclear how, or whether, that metric translates into data availability. Despite being infrequently accessed, corporate records and archives are critical assets that warrant care, and Glacier is too risky a place to store them.

Amazon Glacier: Inching into New Territory

Amazon Glacier is a new RESTful web service designed for data that is accessed infrequently. Glacier is cheap at $0.01 per GB per month and, according to Amazon, “provides secure and durable storage for data archiving and backup.”

But as the saying goes, “backup is one thing; recovery is everything.”

To wit: last year, in a fairly well-publicized data-loss incident, Amazon sent this email to one of its customers; it became the poster child for how not to handle a cloud outage:

“Hello, A few days ago we sent you an email letting you know that we were working on recovering an inconsistent data snapshot of one or more of your Amazon EBS volumes. We are very sorry, but ultimately our efforts to manually recover your volume were unsuccessful. The hardware failed in such a way that we could not forensically restore the data. What we were able to recover has been made available via a snapshot, although the data is in such a state that it may have little to no utility…If you have no need for this snapshot, please delete it to avoid incurring storage charges. We apologize for this volume loss and any impact to your business.

Sincerely,
Amazon Web Services, EBS Support.”

The email might as well have said: “We’re Amazon. We’re innovative. We’re cool. Our SLA? How about this: we’ll do our best, and if we fail, please email us.”

While the probability of this type of data loss happening to your organization may be low, CIOs must ask whether it is worth the risk. Unless Amazon puts real money behind its SLAs (something it has never done), the answer in most circumstances should be “no way.”

What’s Behind the Glacial Sheet?

It is unclear exactly how Glacier works. One speculation is that Amazon stores stale data on high-capacity disks and spins the devices down. Others have posited that Amazon uses very low-spin-speed, high-capacity disk drives custom-built by Seagate, placed in racks with custom logic that manages the devices so that only a small portion can spin at full speed at any one time. Retrieving data can take up to five hours, reportedly because the data must first be staged to conventional storage before clients can access it.
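Whatever the underlying media, the access model itself explains the delay: a retrieval is an asynchronous job that you initiate, wait on for hours, and then collect. A minimal sketch of that flow is below, assuming the 2012-era AWS SDK for Java’s low-level Glacier client; the credentials file, region endpoint, vault name and archive ID are placeholders, and this is our reading of the SDK rather than an Amazon-supplied recipe.

import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.model.DescribeJobRequest;
import com.amazonaws.services.glacier.model.DescribeJobResult;
import com.amazonaws.services.glacier.model.GetJobOutputRequest;
import com.amazonaws.services.glacier.model.GetJobOutputResult;
import com.amazonaws.services.glacier.model.InitiateJobRequest;
import com.amazonaws.services.glacier.model.JobParameters;

import java.io.File;
import java.io.InputStream;

public class GlacierRetrievalSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials file, region, vault name and archive ID.
        AmazonGlacierClient client = new AmazonGlacierClient(
                new PropertiesCredentials(new File("AwsCredentials.properties")));
        client.setEndpoint("https://glacier.us-east-1.amazonaws.com");

        // Step 1: ask Glacier to start staging the archive. Nothing is readable yet.
        String jobId = client.initiateJob(new InitiateJobRequest()
                .withVaultName("corporate-archive-vault")
                .withJobParameters(new JobParameters()
                        .withType("archive-retrieval")
                        .withArchiveId("EXAMPLE-ARCHIVE-ID")))
                .getJobId();

        // Step 2: poll until the job completes -- a matter of hours, not seconds.
        DescribeJobResult status;
        do {
            Thread.sleep(30 * 60 * 1000L); // check back every 30 minutes
            status = client.describeJob(new DescribeJobRequest()
                    .withVaultName("corporate-archive-vault")
                    .withJobId(jobId));
        } while (!Boolean.TRUE.equals(status.getCompleted()));

        // Step 3: only now can the bytes actually be downloaded.
        GetJobOutputResult output = client.getJobOutput(new GetJobOutputRequest()
                .withVaultName("corporate-archive-vault")
                .withJobId(jobId));
        InputStream archiveBytes = output.getBody();
        // ... stream archiveBytes to local storage and verify checksums ...
    }
}

Nothing in that loop is hard, but it is an operational commitment, and the multi-hour wait is inherent to the service, not to the code.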

If the devices are spun down, that is a cause for concern, as most enterprise drives are built for continuous operation. Remember MAID (Massive Arrays of Idle Disks)? The approach never caught on, primarily because it lacked a compelling use case, and it never saw enough market adoption to be truly field-tested. If you want to run a test of inconsequential data with Amazon Glacier, go for it, but as one Wikibon practitioner put it:

“You have to ask yourself – if it were free would I take it? In my case, because we’re so highly regulated, I’d say no, I do not trust giving my archives to Amazon or frankly any public cloud provider.”

CIO Considerations

There are certainly cases where Amazon Glacier makes sense. If you’re a company that doesn’t have an archiving capability and you need a cheap place to put data you don’t much care about, Glacier might be a fit. But ask yourself how you’ll feel if you never see some of that data again. If the answer is “I don’t care” or “I’m willing to take that chance,” then Glacier should be a consideration.

Otherwise, there are six considerations that Wikibon is putting out to its CIO community regarding Glacier.

1. Complexity

Amazon marketing says you can reduce complexity with Glacier, but be careful. If your organization already has a deep archive process in place, whether on tape or with a service provider, why fix what isn’t broken? Amazon claims Glacier can eliminate manual and cumbersome processes, which may be true, but the company’s marketing fails to give you the full picture. Reading the Glacier Developer Guide, however, provides further insight. For example, the guide states the following:

“Amazon Glacier provides a management console. You can use the console to create and delete vaults. However, all other interactions with Amazon Glacier require programming. For example, to upload data, such as photos, videos, and other documents, you must write code and make requests using either the REST API directly or the AWS SDK for Java and .NET wrapper libraries.”

Programming brings more complexity. If you’re a developer or an organization with spare development resources, that’s not a problem. But if you’re like most companies, this could be an issue, and asking organizations to write to yet another API is more than a nuisance, as the sketch below illustrates.
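As a concrete illustration, here is roughly what the “simple” upload path looks like using the SDK’s high-level ArchiveTransferManager helper, as we read the 2012 developer guide. The vault name, file path and credentials file are placeholders, and this is a sketch, not production code.

import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;
import com.amazonaws.services.glacier.transfer.UploadResult;

import java.io.File;

public class GlacierUploadSketch {
    public static void main(String[] args) throws Exception {
        PropertiesCredentials credentials =
                new PropertiesCredentials(new File("AwsCredentials.properties")); // placeholder
        AmazonGlacierClient client = new AmazonGlacierClient(credentials);
        client.setEndpoint("https://glacier.us-east-1.amazonaws.com"); // region is an assumption

        // The high-level helper hides multipart uploads, but you still own this code,
        // its error handling and its retry logic.
        ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
        UploadResult result = atm.upload(
                "corporate-archive-vault",             // placeholder vault
                "2011 email archive",                  // archive description
                new File("/archives/email-2011.tar")); // placeholder path

        // Glacier has no file names or folders; archives are addressed only by ID.
        // Lose this ID and you have effectively lost the archive, so it must be
        // recorded in a catalog that you build and maintain yourself.
        System.out.println("Record this archive ID: " + result.getArchiveId());
    }
}

Multiply that by retrieval jobs, inventory jobs and the catalog of archive IDs you must keep on your own, and “eliminating cumbersome processes” starts to look like trading one set of processes for another.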

2. Data Asset Value

As stated earlier, the corporate archives of publicly traded companies, government organizations, non-profits and the like are business-critical assets that have resided on premises on tape or disk for years, or in the hands of proven tape archival specialists like Iron Mountain. Handing hundreds of terabytes or even petabytes of archived data over to Amazon should not be a top priority for Fortune 1000 CIOs.

3. The Allure of the Cloud

Enterprises will shift to cloud-based archival, but private clouds are the most likely and advisable deployment scenario. Shifting petabytes of data from a customer archive to Amazon’s public data centers is a huge leap of faith. For this class of data, large enterprises are more likely to move from tape or disk storage to a private cloud. One reason is the bandwidth cost of trying to move petabytes of data; another is the additional security of an on-premises cloud. USC, Cerner and NYSE are examples of firms that went private for their petabyte-scale cloud storage archives.

4. Hidden Cost of Retrieval

Glacier pricing is unpredictable, and corporations that archive petabytes of data are used to predictable pricing. A recent article in Wired highlighted that “the way the peak hourly retrieval is calculated is a mystery. If the price is based on how long it takes you to download the archives, then the cost is limited by download speeds. But if the cost is based on how much you request in an hour and you request a large file that can’t be broken into chunks, the costs could skyrocket. For example, a 3 terabyte archive that can’t be split into smaller chunks could lead to a retrieval fee as high as $22,082 if the peak usage is determined to be 3 terabytes per hour. The cost of requests is separate from the cost of bandwidth to download the data, which has its own separate pricing table.” The article adds: “If you wrote an automated script to safely pull a full archive, a simple coding mistake, pulling all data at once, would lead you to be charged up to 720 times what you should be charged.”
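To make the quoted math concrete, here is a back-of-the-envelope sketch of the peak-hour billing model as the Wired piece describes it. The $0.01-per-GB rate, the 720-hour month and the free hourly allowance (chosen here so the numbers line up with the $22,082 figure above) are assumptions, not a statement of Amazon’s actual billing formula.

public class GlacierRetrievalCostSketch {

    // Rough estimate of a monthly retrieval fee under the peak-hourly-rate model
    // described above: you are billed as if your busiest retrieval hour ran for all
    // 720 hours of a 30-day month. The rate and allowance handling are assumptions.
    static double estimateRetrievalFee(double peakHourlyGb, double freeHourlyAllowanceGb,
                                       double ratePerGbPerHour) {
        double billableGbPerHour = Math.max(0.0, peakHourlyGb - freeHourlyAllowanceGb);
        return billableGbPerHour * ratePerGbPerHour * 720.0;
    }

    public static void main(String[] args) {
        // A 3 TB (3,072 GB) archive pulled back in a single hour, with an assumed
        // 5 GB/hour free allowance, reproduces the ~$22,082 figure quoted above.
        System.out.printf("Pulled in one hour:    $%,.2f%n",
                estimateRetrievalFee(3072.0, 5.0, 0.01));

        // The same 3 TB spread evenly over the month (about 4.3 GB/hour) never exceeds
        // the free allowance, so the fee drops to zero -- the "720 times" gap in a nutshell.
        System.out.printf("Spread over the month: $%,.2f%n",
                estimateRetrievalFee(3072.0 / 720.0, 5.0, 0.01));
    }
}

The point is not the exact dollar figure, which depends on how Amazon actually meters peak usage, but that a single careless retrieval request can dominate the bill.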

5. Security – The Evil Cousin of the Cloud

One point not many folks are talking about is that Amazon controls your encryption keys. Global corporations should not rely on Amazon to encrypt their data and hold their encryption keys hostage. Because Amazon declines to tell customers where its data centers are located and provides security on its own terms, not necessarily according to the edicts of an enterprise customer, Glacier will not make it past the security auditors of the Fortune 1000.

For web upstarts and relatively young companies who have never implemented an archive in the first place, Glacier is probably a good first step in that direction. An archive for Pinterest? Sure. An archive for the Fortune 1000? Not sensible.

6. Last Resort Disaster Recovery

Everyone keeps saying tape is dead. Tape is not dead in the enterprise, for this simple reason: in the catastrophic worst-case scenario, where your deep archive is the last resort, the absolute fastest and cheapest way to move data to a recovery site is to load a truck up with tapes. Mainframers call this “CTAM,” the Chevy Truck Access Method, and it provides cheaper bandwidth and more data movement capacity than any alternative (see the back-of-the-envelope comparison below). Because this “red button” scenario exists in most large enterprises, they are unlikely to turn to Glacier except in shadow IT circumstances.
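For a rough sense of the arithmetic behind that claim, consider the hypothetical run of the numbers below. The cartridge count, drive time and WAN speed are invented for illustration; 1.5 TB is the native capacity of LTO-5, the current-generation cartridge at the time of writing.

public class TruckVsWanSketch {
    public static void main(String[] args) {
        // Hypothetical scenario: a truck carrying 1,000 LTO-5 cartridges
        // (1.5 TB native each) driven four hours to a recovery site.
        double cartridges = 1000.0;
        double tbPerCartridge = 1.5;   // LTO-5 native capacity
        double driveHours = 4.0;

        double payloadGigabits = cartridges * tbPerCartridge * 8 * 1000; // TB -> gigabits
        double truckGbps = payloadGigabits / (driveHours * 3600);

        // The same payload pushed over a fast (for 2012) 10 Gbit/s WAN link.
        double wanGbps = 10.0;
        double wanHours = payloadGigabits / wanGbps / 3600;

        System.out.printf("Truck effective throughput: ~%.0f Gbit/s%n", truckGbps);
        System.out.printf("Same payload over 10 Gbit/s: ~%.0f hours (~%.1f days)%n",
                wanHours, wanHours / 24);
    }
}

Even allowing generous time to load and unload cartridges, the truck wins by a wide margin, which is why the “red button” plan at most large enterprises still involves tape.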

For these reasons, CIOs should not consider Glacier a core archiving capability. Rather, Glacier should be used as an experimental proving ground for data of little consequence. In all fairness to Amazon, it is changing the game and defining the very notion of cloud services. It offers compute and storage that are “good enough” for small businesses, developers and the rogue shadow IT organization that just wants to get stuff done. But for highly scrutinized organizations with CIOs accountable to shareholders, “good enough” just isn’t.

 

