No easy answers to Hadoop’s complexity, says Wikibon analyst
The bad news for companies trying to harness Big Data is that Hadoop is complex, and that complexity is not going away (see figure above). Wikibon research finds that it typically takes two years to move a Big Data project from proof-of-concept to the first production application, writes Wikibon Big Data and Analytics Analyst George Gilbert in “The Manageability Challenge Facing Hadoop.” The problem is that Hadoop is an ecosystem of 12 or more services rather than a single product, and tuning and management are fragmented among them. Even the most sophisticated users struggle.
The good news is that alternatives exist, and Gilbert looks at these in his latest alert. They include building on a managed services platform, using Spark instead of Hadoop, and employing integrated Big Data tool sets from providers such as Amazon Web Services (AWS), Microsoft Azure or Google Compute Engine.
The problem with managed services is that most aren’t designed specifically for Big Data, and companies have to get their data onto the service. AWS, for instance, has strong tools for migrating data, but its Elastic MapReduce is hampered by its separation of compute from storage. Specialized Hadoop-as-a-Service providers like Altiscale, Inc., on the other hand, are optimized for Big Data performance but lack comparable data onboarding options.
Spark greatly decreases the complexity of writing Big Data applications and runs in near-real-time rather than in batch. However, it is only a partial stack, Gilbert notes. The full stack requires at least four services, including a database such as Cassandra, a data ingestion service such as Kafka, and an orchestration service such as ZooKeeper. Each of these requires three servers, Gilbert writes, and the stack does not yet have a unified management solution.
The easiest option, and the one with the fastest time-to-value, is the increasingly unified set of Big Data tools offered by large cloud service providers. AWS, for instance, provides a set of integrated higher-level tools including Kinesis Firehose, DynamoDB, the Redshift MPP SQL data warehouse, and Data Pipeline for orchestration. The trade-off is a lack of openness and flexibility. These integrated stacks tend to break when a third-party service is introduced, and migrating from one to another is complex. Companies should therefore investigate the options and choose carefully to get the service that best meets their needs.
Image source: Hortonworks, Wikibon 2015