UPDATED 02:03 EDT / JULY 27 2017

BIG DATA

Learning to do distributed big data

In between meeting with customers, crowdchatting with our communities and hosting theCUBE, the research team at Wikibon, owned by the same company as SiliconANGLE, finds time to meet and discuss trends and topics regarding digital business transformation and technology markets. We look at things from the standpoints of business, the Internet of Things, big data, application, cloud and infrastructure modernization. We use the results of our research meetings to explore new research topics, further current research projects and share insights. This is the fourth summary of findings from these regular meetings, which we plan to publish every week.

For the last 10 years, businesses have tried to create value out of big data and advanced analytics using a data lake: a centralized store of enterprise data that data scientists and others can “play” with. Sometimes this approach successfully generated big data returns. Too often, it failed.

Nonetheless, the data lake was a reasonable approach because it economized on a variety of scarce assets: expertise, organizational attention, bespoke combinations of analytic tooling and specialized big data systems. By concentrating big data knowledge in a single, central location, a business could learn faster, make fewer mistakes and undertake advanced, complex analytics in a coherent way.

Today, enterprises with superior big data knowledge have developed a range of big data system functions serving a variety of analytic application domains, including fraud detection, security and recommendation engines. Although these systems generate significant returns on their own, leading enterprises are seeking to amplify big data’s value by integrating the systems directly into a range of operational, edge and mobile applications. That means more big data work will be performed closer to the events it supports, in both place and time (i.e., not in batch).

Here’s a resulting challenge, and it’s a big one: As we distribute the applications and work streams that depend on big data, we can no longer presume that a centralized data lake can serve all of a business’s big data needs. Big data must adopt a distributed architecture to evolve. But what will that architecture look like?

“Enterprises are going to have to distribute big data workloads across an expanding range of other applications. That requires an architected response.” George Gilbert, Wikibon analyst

Our research shows that distributed big data architectures will feature three core attributes:

Distributed big data will be based on hybrid cloud. Wikibon’s practitioner communities are clear: The cloud has a big role, but not the only role, to play in emerging big data architectures. Increasingly, intelligence will be pushed to edge and mobile systems, which will make true private cloud a crucial element of evolving big data systems.

“At various shows we’ve attended, we’ve seen that a lot of the practitioners building data lakes are now partnering heavily with the likes of Amazon and Google. So, expect lots more of this to live in that multicloud world.” Stu Miniman, Wikibon analyst
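
To make the edge role concrete, here’s a minimal sketch, assuming a classifier trained centrally with a scikit-learn-style library and serialized with joblib: the model artifact is shipped to the edge device and events are scored locally, with no round trip to a central data lake. The model path and feature names are hypothetical.

```python
# Minimal sketch of edge-side scoring (hypothetical names throughout):
# a model trained centrally is shipped to the edge device and applied
# locally, so events are scored without a round trip to a central lake.
import joblib

MODEL_PATH = "/opt/models/fraud_v3.joblib"  # hypothetical path to the deployed artifact

def score_event(event: dict, model) -> float:
    """Score a single event locally at the edge."""
    features = [[event["amount"], event["merchant_risk"], event["velocity"]]]
    return float(model.predict_proba(features)[0][1])  # probability of the positive class

if __name__ == "__main__":
    model = joblib.load(MODEL_PATH)  # load the centrally trained model once, at startup
    event = {"amount": 120.0, "merchant_risk": 0.7, "velocity": 3}
    print(f"fraud score: {score_event(event, model):.3f}")
```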

Three classes of core big data services will form the basis of the new architectures: 1) microservices and streaming; 2) distributed file systems and DBMSes; and 3) distributed machine learning. Together, these services will handle ingesting, moving, embedding and pipelining data, among other tasks.
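
As a rough illustration of the first service class, here’s a minimal streaming-microservice sketch, assuming Apache Kafka and the kafka-python client; the topic names, broker address and transform are hypothetical. The service consumes raw events, applies a simple enrichment, and republishes the results for downstream storage and machine learning services.

```python
# Minimal streaming microservice sketch, assuming Apache Kafka and the
# kafka-python client. Topic names and broker address are hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "raw-events",                                # hypothetical inbound topic
    bootstrap_servers="broker:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    event["normalized_amount"] = event.get("amount", 0.0) / 100.0  # example enrichment
    producer.send("enriched-events", value=event)  # hand off to storage and ML services
```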

These services will be organized into implementation patterns tied to administration. Although the models and engines of big data may be distributed across edge, mobile and other types of distributed applications, modeling processes will tend to be centralized. That doesn’t mean a single group of data scientists lording over all modeling, however. On the contrary, as expertise expands and tooling improves, a common set of administrative processes for modeling, training and testing will be established and implemented consistently across “centralized” administrative groups that, in fact, are likely to be aligned by business, application and technology domain expertise.
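
As a sketch of what one such shared administrative process might look like, assuming scikit-learn: each domain group supplies its own data, while the reproducible train/test split, the acceptance gate and the versioned artifact naming are the common conventions. All names and thresholds here are hypothetical.

```python
# Sketch of a standardized train/test/publish routine that "centralized"
# administrative groups might share; names and thresholds are hypothetical.
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def train_and_publish(X, y, version: str, min_auc: float = 0.80) -> float:
    # Shared, reproducible split policy applied across all domain groups.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    if auc < min_auc:  # common acceptance gate before a model ships
        raise ValueError(f"model rejected: AUC {auc:.3f} below {min_auc:.2f}")
    joblib.dump(model, f"models/fraud_{version}.joblib")  # versioned artifact
    return auc
```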

“If you look at any company and you look at its stable of operational systems, most systems aren’t just sitting there waiting for you to integrate big data analytics right into them to get better insight and better operations. You’re going to have to do surgery, or throw them out and start over again. It really is a big problem.” Neil Raden, Wikibon analyst

Action Item. Big data and analytics applications are going to be distributed. The practice of distributing big data will take as much as five years to mature, but a common architecture is being conceived, common practices to enact that architecture are starting to diffuse, and tooling to increase the productivity of those practices is starting to reach big data communities. This is one area where no business wants to be left behind: Learning to do distributed big data will require strong CIO leadership and clear architectural commitments.

Image: Nikin/Pixabay
