UPDATED 02:03 EDT / JULY 27 2017


Learning to do distributed big data

In between meeting with customers, crowdchatting with our communities and hosting theCUBE, the research team at Wikibon, owned by the same company as SiliconANGLE, finds time to meet and discuss trends and topics regarding digital business transformation and technology markets. We look at things from the standpoints of business, the Internet of Things, big data, application, cloud and infrastructure modernization. We use the results of our research meetings to explore new research topics, further current research projects and share insights. This is the fourth summary of findings from these regular meetings, which we plan to publish every week.

For the last 10 years, businesses have tried to create value out of big data and advanced analytics using a data lake approach: a centralized store of enterprise data that data scientists and others could “play” with. Sometimes this approach generated real returns on big data investments. Too often it failed.

Nonetheless, the data lake concept was a reasonable approach because it economized on a variety of scarce assets, namely expertise, organizational attention, bespoke combinations of analytic tools and specialized big data systems. By concentrating big data knowledge in a single, central location, a business could learn faster, make fewer mistakes and undertake advanced, complex analytics in a coherent way.

Today, enterprises with superior big data knowledge have developed a range of big data systems that serve a variety of analytic application domains, including fraud detection, security and recommendation engines. Although these systems generate significant returns on their own, leading enterprises are seeking to amplify big data value by integrating the systems directly into a range of operational, edge and mobile applications. That means more big data work will be performed closer to the events it supports, in both place and time (that is, in real time rather than in batch).

Here’s the resulting challenge, and it’s a big one: As we distribute the applications and work streams that depend on big data, we can no longer presume that a centralized data lake can feed all of a business’s big data needs. To evolve, big data needs to adopt a distributed architecture. But what will that architecture look like?

“Enterprises are going to have to distribute big data workloads across an expanding range of other applications. That requires an architected response.” George Gilbert, Wikibon

Our research shows that distributed big data architectures will feature three core attributes:

Distributed big data will be based on hybrid cloud. Wikibon’s practitioner communities are clear: The cloud has a big role, but not the only role, to play in emerging big data architectures. Increasingly, intelligence will be pushed to edge and mobile systems, which will make true private cloud a crucial element of evolving big data systems.

“At various shows we’ve attended, we’ve seen that a lot of the practitioners building data lakes are now partnering heavily with the likes of Amazon and Google. So, expect lots more of this to live in that multicloud world.” Stu Miniman, Wikibon analyst

Three classes of core big data services will be the basis for the new architectures. At the core of distributed big data architectures will be: 1) microservices and streaming; 2) distributed file systems and database management systems; and 3) distributed machine learning. These services will handle, move, ingest, embed and pipeline data, among other things.

These services will be organized into implementation patterns tied to administration. Although the models and engines of big data may be distributed for edge, mobile and other types of distributed applications, modeling processes will tend to be centralized. However, that doesn’t mean a single group of data scientists lording over all modeling. On the contrary, as expertise expands and tooling improves, a common set of administrative processes for modeling, training and testing will be established and implemented across “centralized” administrative groups that, in fact, are likely to be aligned by business, application and technology domain expertise.
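To make the split between centralized modeling and distributed execution more concrete, here is a minimal Python sketch under stated assumptions. A central administrative group trains a model on pooled historical data and publishes it as a versioned artifact; edge or operational applications load that artifact and score events locally, without a round trip to a central data lake. All names here (`ModelArtifact`, `train_central`, `EdgeNode`) are hypothetical, and the simple z-score anomaly model is a stand-in for whatever fraud, security or recommendation model a real deployment would use; it is not drawn from any Wikibon reference architecture.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical versioned model published by a central modeling group."""
    version: int
    mu: float           # mean of the training data
    sigma: float        # population standard deviation of the training data
    z_threshold: float  # events beyond this many standard deviations are flagged


def train_central(history, version, z_threshold=3.0):
    """Centralized step: fit a simple z-score anomaly model on pooled data."""
    return ModelArtifact(version, mean(history), pstdev(history), z_threshold)


class EdgeNode:
    """Distributed step: an edge, mobile or operational app that scores locally."""

    def __init__(self, name):
        self.name = name
        self.model = None

    def deploy(self, artifact):
        # Accept only newer versions so a stale push cannot overwrite a current model.
        if self.model is None or artifact.version > self.model.version:
            self.model = artifact

    def score(self, value):
        """Return True if the event looks anomalous under the deployed model."""
        if self.model is None:
            raise RuntimeError(f"{self.name}: no model deployed")
        if self.model.sigma == 0:
            # Degenerate training data: flag anything that differs from the mean.
            return value != self.model.mu
        z = abs(value - self.model.mu) / self.model.sigma
        return z > self.model.z_threshold


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99, 100, 103, 97]  # illustrative data only
    artifact = train_central(history, version=1)
    nodes = [EdgeNode("store-17"), EdgeNode("mobile-app")]  # hypothetical nodes
    for node in nodes:
        node.deploy(artifact)
    print(nodes[0].score(101))  # False
    print(nodes[1].score(150))  # True
```

The versioned artifact is the design choice that matters here: modeling, training and testing stay under common administration, while scoring happens wherever the events occur.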

“If you look at any company and you look at its stable of operational systems, most systems aren’t just sitting there waiting for you to integrate big data analytics right into them to get better insight and better operations. You’re going to have to do surgery, or throw them out and start over again. It really is a big problem.” Neil Raden, Wikibon analyst

Action Item. Big data and analytics applications are going to be distributed. Learning how best to distribute big data could take as long as five years to mature, but a common architecture is being conceived, common practices for enacting that architecture are starting to diffuse, and tooling to increase the productivity of those practices is starting to reach big data communities. Still, this is one area where no business wants to be left behind. Learning to do distributed big data will require strong CIO leadership and clear architectural commitments.

Image: Nikin/Pixabay
