Public clouds befuddled by SLAs? Nonsense!
Bert Latamore wrote recently on SiliconANGLE about Qubole’s Big Data as a Service (BDaaS) offering and echoed points made by Jeff Kelly on Wikibon, who wrote that “[t]he value proposition for Qubole is clear [and] the public cloud is indeed an attractive environment for data scientists and analysts looking to perform large-scale exploratory analytics.” Their positive analysis reflects our own perspective on the tremendous potential of public cloud Big Data services. However, both mentioned the SLA (Service Level Agreement) as a challenge for the public cloud, and consequently for a service like Qubole. Let’s shed some light on the issue.
Typical customer expectations of Big Data services revolve around availability, performance and durability. Customer expectations in these areas should not be defined by big companies like Yahoo, Facebook and Google but by what evolving startups and midsize businesses are capable of achieving by themselves. Their choices are usually between using a cloud service or diving into Hadoop deployments themselves. Let’s reflect on these options.
Private Big Data is risky
Availability is expensive and difficult to secure for small to mid-sized deployments. A truly available system needs to be geographically distributed with some basic redundancy. This involves cost and know-how that many smaller companies can’t afford.
Performance can be expensive to provision. You need sufficient compute and storage capacity for peak times, plus a buffer and room for growth. That requires significant capital investment and carries real risk. All sorts of factors can produce spikes and bursts that must be accounted for, and it is notoriously hard to predict compute demands as a business grows. Over-provisioning is wasteful, while under-provisioning can disappoint customers and cost you business.
Big Data durability is also a major challenge, since it’s expensive to store and protect data with the kind of redundancy needed to ensure its availability over time.
BDaaS is scalable, cost-effective, and reliable
Availability of a BDaaS depends on the underlying cloud service. Qubole, for example, uses Amazon Web Services and Google Compute Engine, which are geographically distributed to account for the unlikely loss of a complete geographic region due to something like a natural disaster. Qubole customers can take advantage of this design: it takes only a few clicks to return to operations with a new cluster in a new region. This makes Qubole services cost-effective, efficient and available. What’s more, our engineers have worked on some of the biggest and most reliable Hadoop clusters in the world.
Public cloud services are sometimes criticized for being subject to performance variations caused by virtualization and “noisy neighbors.” Some people also consider them expensive relative to owning the hardware.
In reality, the costs of over-provisioning to guarantee availability and performance are usually much higher in an on-premise environment. In contrast, you pay only for what is needed in the cloud. Noisy neighbors are only a minor problem for Hadoop cloud clusters because of their distributed nature and use of large virtual machine types.
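To make this concrete, here is a back-of-the-envelope comparison. All prices and utilization figures below are hypothetical, chosen purely for illustration; actual numbers will vary by vendor and workload.

```python
# Back-of-the-envelope cost comparison (all numbers hypothetical).
peak_nodes = 100                    # capacity needed at peak demand
avg_utilization = 0.25              # average fraction of peak actually used
onprem_cost_per_node_year = 5000    # assumed amortized hardware + ops cost
cloud_cost_per_node_hour = 1.00     # assumed on-demand hourly rate
hours_per_year = 24 * 365

# On-premises: you buy for peak capacity, whether you use it or not.
onprem_annual = peak_nodes * onprem_cost_per_node_year

# Cloud: you pay only for the node-hours actually consumed.
node_hours_used = peak_nodes * avg_utilization * hours_per_year
cloud_annual = node_hours_used * cloud_cost_per_node_hour

print(onprem_annual)  # 500000
print(cloud_annual)   # 219000.0
```

With bursty workloads running at a quarter of peak on average, the pay-per-use model comes out well ahead in this sketch; the gap narrows as utilization approaches 100%, which is exactly why steady, fully loaded clusters are the one case where owning hardware can pencil out.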
Durability is unbeatable in public clouds. The durability of Amazon’s S3 distributed storage and the low-cost long-term storage of Glacier are unmatched by anything private companies can achieve on their own. Even in the event of data loss, a backup strategy is in place. S3 also provides virtually unlimited storage that is horizontally scalable and tightly integrated with Hadoop for persistence and exchange of data in ETL processing.
BDaaS minimizes friction and the need for support
BDaaS companies also have a vested interest in providing excellent support to their customers. That’s because the service only generates income if their customers continue using it. It’s in the vendor’s best interest to make their services easy to deploy and use. For Qubole customers, the difference between managing a 10-, 100-, or 1,000-cluster deployment is little more than a change of one variable in the management screen. A customer can execute a 100-hour job on 10 machines or a 10-hour job on 100 machines without additional cost or complexity.
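The trade-off above comes down to total machine-hours: under per-machine-hour billing, 10 machines for 100 hours cost the same as 100 machines for 10 hours. A minimal sketch, with the hourly rate assumed for illustration:

```python
price_per_machine_hour = 0.50   # hypothetical on-demand rate

def job_cost(machines, hours):
    """Cost of a job billed purely by machine-hours."""
    return machines * hours * price_per_machine_hour

small_cluster = job_cost(machines=10, hours=100)   # long-running, narrow
large_cluster = job_cost(machines=100, hours=10)   # fast, wide

print(small_cluster, large_cluster)  # 500.0 500.0
```

Since the machine-hour product is identical, the wider cluster delivers the result ten times sooner at no extra cost, which is the elasticity argument in a nutshell.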
Supporting your own Big Data environment means either relying completely on your own resources or buying support from Hadoop distribution companies at a cost of several thousand dollars per cluster node per year.
In short, public cloud services and BDaaS demonstrate acceptable SLA attributes for most projects, with the added benefits of scalability and resilience. They are a desirable alternative to rolling your own Big Data infrastructure, and even to paid-for Hadoop distributions.
About the Author
Gil Allouche, Vice President of Marketing, Qubole
Allouche is a former software engineer who began his marketing career as a product strategist at SAP while earning his MBA from Babson College.