UPDATED 02:03 EDT / JULY 27 2017

BIG DATA

Learning to do distributed big data

In between meeting with customers, crowdchatting with our communities and hosting theCUBE, the research team at Wikibon, owned by the same company as SiliconANGLE, finds time to meet and discuss trends and topics regarding digital business transformation and technology markets. We look at things from the standpoints of business, the Internet of Things, big data, application, cloud and infrastructure modernization. We use the results of our research meetings to explore new research topics, further current research projects and share insights. This is the fourth summary of findings from these regular meetings, which we plan to publish every week.

For the last 10 years, businesses have tried to create value out of big data and advanced analytics using a data lake: a centralized store of enterprise data that data scientists and others can “play” with. Sometimes this approach successfully generated big data returns. Too often, it failed.

Nonetheless, the data lake was a reasonable approach because it economized on a variety of scarce assets: expertise, organizational attention, bespoke combinations of analytic tooling and specialized big data systems. By concentrating big data knowledge in a single, central location, a business could learn faster, make fewer mistakes and undertake advanced, complex analytics in a coherent way.

Today, enterprises with superior big data knowledge have developed a range of big data system functions serving a variety of analytic application domains, including fraud detection, security and recommendation engines. Although these systems generate significant returns on their own, leading enterprises are seeking to amplify big data’s value by integrating the systems directly into a range of operational, edge and mobile applications. That means more big data work will be performed closer to the events it supports, in both place and time (i.e., not in batch).

Here’s a resulting challenge, and it’s a big one: As we distribute the applications and work streams that depend on big data, we can no longer presume that a centralized data lake can serve all of a business’s big data needs. Big data must adopt a distributed architecture to evolve. But what will that architecture look like?

“Enterprises are going to have to distribute big data workloads across an expanding range of other applications. That requires an architected response.” George Gilbert, Wikibon analyst

Our research shows that distributed big data architectures will feature three core attributes:

Distributed big data will be based on hybrid cloud. Wikibon’s practitioner communities are clear: The cloud has a big role, but not the only role, to play in emerging big data architectures. Increasingly, intelligence will be pushed to edge and mobile systems, which will make true private cloud a crucial element of evolving big data systems.

“At various shows we’ve attended, we’ve seen that a lot of the practitioners building data lakes are now partnering heavily with the likes of Amazon and Google. So, expect lots more of this to live in that multicloud world.” Stu Miniman, Wikibon analyst
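
To make the edge role concrete, here’s a minimal sketch, assuming a classifier trained centrally with a scikit-learn-style library and serialized with joblib: the model artifact is shipped to the edge device and events are scored locally, with no round trip to a central data lake. The model path and feature names are hypothetical.

```python
# Minimal sketch of edge-side scoring (hypothetical names throughout):
# a model trained centrally is shipped to the edge device and applied
# locally, so events are scored without a round trip to a central lake.
import joblib

MODEL_PATH = "/opt/models/fraud_v3.joblib"  # hypothetical path to the deployed artifact

def score_event(event: dict, model) -> float:
    """Score a single event locally at the edge."""
    features = [[event["amount"], event["merchant_risk"], event["velocity"]]]
    return float(model.predict_proba(features)[0][1])  # probability of the positive class

if __name__ == "__main__":
    model = joblib.load(MODEL_PATH)  # load the centrally trained model once, at startup
    event = {"amount": 120.0, "merchant_risk": 0.7, "velocity": 3}
    print(f"fraud score: {score_event(event, model):.3f}")
```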

Three classes of core big data services will form the basis of the new architectures: 1) microservices and streaming; 2) distributed file systems and DBMSes; and 3) distributed machine learning. Together, these services will handle ingesting, moving, embedding and pipelining data, among other tasks.
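
As a rough illustration of the first service class, here’s a minimal streaming-microservice sketch, assuming Apache Kafka and the kafka-python client; the topic names, broker address and transform are hypothetical. The service consumes raw events, applies a simple enrichment, and republishes the results for downstream storage and machine learning services.

```python
# Minimal streaming microservice sketch, assuming Apache Kafka and the
# kafka-python client. Topic names and broker address are hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "raw-events",                                # hypothetical inbound topic
    bootstrap_servers="broker:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    event["normalized_amount"] = event.get("amount", 0.0) / 100.0  # example enrichment
    producer.send("enriched-events", value=event)  # hand off to storage and ML services
```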

These services will be organized into implementation patterns tied to administration. Although the models and engines of big data may be distributed across edge, mobile and other types of distributed applications, modeling processes will tend to be centralized. That doesn’t mean a single group of data scientists lording over all modeling, however. On the contrary, as expertise expands and tooling improves, a common set of administrative processes for modeling, training and testing will be established and implemented consistently across “centralized” administrative groups that, in fact, are likely to be aligned by business, application and technology domain expertise.
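
As a sketch of what one such shared administrative process might look like, assuming scikit-learn: each domain group supplies its own data, while the reproducible train/test split, the acceptance gate and the versioned artifact naming are the common conventions. All names and thresholds here are hypothetical.

```python
# Sketch of a standardized train/test/publish routine that "centralized"
# administrative groups might share; names and thresholds are hypothetical.
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def train_and_publish(X, y, version: str, min_auc: float = 0.80) -> float:
    # Shared, reproducible split policy applied across all domain groups.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    if auc < min_auc:  # common acceptance gate before a model ships
        raise ValueError(f"model rejected: AUC {auc:.3f} below {min_auc:.2f}")
    joblib.dump(model, f"models/fraud_{version}.joblib")  # versioned artifact
    return auc
```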

“If you look at any company and you look at its stable of operational systems, most systems aren’t just sitting there waiting for you to integrate big data analytics right into them to get better insight and better operations. You’re going to have to do surgery, or throw them out and start over again. It really is a big problem.” Neil Raden, Wikibon analyst

Action Item. Big data and analytics applications are going to be distributed. The practice of distributing big data will take as much as five years to mature, but a common architecture is being conceived, common practices to enact that architecture are starting to diffuse, and tooling to increase the productivity of those practices is starting to reach big data communities. This is one area where no business wants to be left behind: Learning to do distributed big data will require strong CIO leadership and clear architectural commitments.

Image: Nikin/Pixabay
