UPDATED 12:00 EDT / JUNE 11 2014

Public clouds befuddled by SLAs? Nonsense!

Clouds over Puerto Rico (photo by Paul Gillin)

Bert Latamore wrote recently on SiliconANGLE about Qubole's Big Data as a Service (BDaaS) offering and echoed some points made by Jeff Kelly on Wikibon, who wrote that "[t]he value proposition for Qubole is clear [and] the public cloud is indeed an attractive environment for data scientists and analysts looking to perform large-scale exploratory analytics". Their positive analysis reflects our own perspective on the tremendous potential of public cloud Big Data services. However, they both mentioned the SLA (Service Level Agreement) as a challenge for the public cloud, and consequently for a service like Qubole. Let's shed some light on the issue.

Typical customer expectations of Big Data services revolve around availability, performance and durability. Customer expectations in these areas should not be defined by big companies like Yahoo, Facebook and Google but by what evolving startups and midsize businesses are capable of achieving by themselves. Their choices are usually between using a cloud service or diving into Hadoop deployments themselves. Let’s reflect on these options.

Private Big Data is risky

 

Availability is expensive and difficult to secure for small to mid-sized deployments. A truly available system needs to be geographically distributed with some basic redundancy. This involves cost and know-how that many smaller companies can’t afford.

Performance can be expensive to provision. You need sufficient compute and storage capacity for peak times, plus a buffer and room for growth. This requires significant capital investment and carries real risk: all sorts of factors can cause spikes and bursts that must be accounted for, and computation demands are notoriously hard to predict as a business grows. Over-provisioning is wasteful, while under-provisioning can disappoint customers and cost you business.
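To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. All figures (machine counts and hourly rates) are hypothetical, assumed purely for illustration; the point is the structure of the comparison, not the specific numbers:

```python
# Hypothetical illustration: fixed peak provisioning vs. pay-per-use
# for a spiky workload. All numbers below are assumed for the example.

peak_demand = 100      # machines needed at peak (assumed)
avg_demand = 20        # machines needed on average (assumed)
hours_per_year = 8760

onprem_cost_per_machine_hour = 0.10   # amortized hardware + ops (assumed)
cloud_cost_per_machine_hour = 0.25    # on-demand rate (assumed)

# On-premises: you must provision for the peak, and pay for it all year.
onprem_cost = peak_demand * hours_per_year * onprem_cost_per_machine_hour

# Cloud: you pay only for the machine-hours you actually consume.
cloud_cost = avg_demand * hours_per_year * cloud_cost_per_machine_hour

print(f"on-prem (peak-provisioned): ${onprem_cost:,.0f}")
print(f"cloud (pay-per-use):        ${cloud_cost:,.0f}")
```

Even though the assumed cloud hourly rate is higher than the amortized on-premises rate, paying only for average usage rather than peak capacity comes out cheaper in this sketch.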

Big Data durability is also a major challenge, since it's expensive to store and protect data with the kind of redundancy needed to ensure its availability over time.

BDaaS is scalable, cost-effective, and reliable

 

Availability of a BDaaS depends on the underlying cloud service. Qubole, for example, runs on Amazon Web Services and Google Compute Engine, which are geographically distributed to withstand the unlikely loss of an entire geographic region to something like a natural disaster. Qubole customers can take advantage of this design: it takes only a few clicks to resume operations with a new cluster in a new region. This makes Qubole services cost-effective, efficient and available. What's more, our engineers have worked on some of the biggest and most reliable Hadoop clusters in the world.

Public cloud services are sometimes criticized for being subject to performance variations caused by virtualization and “noisy neighbors.” Some people also consider them expensive relative to owning the hardware.

In reality, the costs of over-provisioning to guarantee availability and performance are usually much higher in an on-premises environment; in the cloud, by contrast, you pay only for what you need. Noisy neighbors are only a minor problem for Hadoop cloud clusters because of their distributed nature and their use of large virtual machine types.

Durability is unbeatable in public clouds. The durability of Amazon's S3 object store and low-cost long-term storage with Glacier are unmatched by anything private companies can achieve on their own. Even in the rare event of data loss, a backup strategy is already in place. S3 also provides virtually unlimited, horizontally scalable storage that is tightly integrated with Hadoop for persisting and exchanging data in ETL processing.
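To see what S3's advertised design durability of 99.999999999% ("eleven nines") per object per year implies in practice, a quick back-of-the-envelope calculation helps (the object count here is a hypothetical figure for illustration):

```python
# S3's advertised design durability is 99.999999999% ("eleven nines")
# per object per year. What does that mean for a sizable dataset?

annual_durability = 0.99999999999
annual_loss_rate = 1 - annual_durability   # ~1e-11 per object per year

objects_stored = 10_000_000                # hypothetical dataset size

# Expected number of objects lost in a year at that loss rate.
expected_losses_per_year = objects_stored * annual_loss_rate

print(f"expected losses per year: ~{expected_losses_per_year:.4f}")
```

At that rate, storing ten million objects you would expect to lose roughly one object every ten thousand years, a level of redundancy few private deployments can justify building themselves.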

BDaaS minimizes friction and need for support

 

BDaaS companies also have a vested interest in providing excellent support to their customers, because the service only generates income if customers keep using it. It's in the vendor's best interest to make the service easy to deploy and use. For Qubole customers, the difference between managing a 10-, 100-, or 1,000-node cluster is little more than changing one variable in the management screen. A customer can execute a 100-hour job on 10 machines or a 10-hour job on 100 machines without additional cost or complexity.
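The cost symmetry in that last point can be sketched in a few lines of Python. The hourly rate is an assumed figure, and the sketch assumes the job parallelizes linearly and is billed at a flat per-machine-hour rate:

```python
# In a pay-per-machine-hour model, total cost depends on machine-hours
# consumed, not on how the work is split across machines.

rate_per_machine_hour = 0.50  # assumed flat hourly rate


def job_cost(machines: int, hours: float) -> float:
    """Total cost of a job: machines x hours x hourly rate."""
    return machines * hours * rate_per_machine_hour


small_cluster = job_cost(machines=10, hours=100)   # slower, fewer machines
large_cluster = job_cost(machines=100, hours=10)   # faster, more machines

# Both consume 1,000 machine-hours, so both cost the same.
print(small_cluster, large_cluster)
```

Since both runs consume the same 1,000 machine-hours, the faster configuration costs no more than the slower one, which is why elastic sizing is nearly free to the customer.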

Supporting your own Big Data environment means either relying completely on your own resources or buying support from Hadoop distribution companies at a cost of several thousand dollars per cluster node per year.

In short, public cloud services and BDaaS deliver acceptable SLA attributes for most projects, with the added benefits of scalability and resilience. They are a desirable alternative to rolling your own Big Data projects, and even to paid-for Hadoop distributions.

About the Author

Gil Allouche, Vice President of Marketing, Qubole

Allouche is a former software engineer who began his marketing career as a product strategist at SAP while earning his MBA from Babson College.

