UPDATED 12:51 EST / APRIL 02 2018

BIG DATA

Slay the big data ‘Swamp Thing’ with these governance protips

Now that many companies find themselves with expansive data lakes in this era of big data, what should they do to keep these information reservoirs from coagulating into sticky swamps? Scratch that — what if the ship has sailed, and they’re already up a messy, confusing data creek without a paddle? Without further ado (and without further belaboring the metaphor), here are protips from The ING Group on how to govern data lakes for compliance and analytics.

The Dutch multinational banking and financial services corporation headquartered in Amsterdam began building out its data lake and governance strategy about six years ago. It selected IBM Corp. to godfather it — the company supplied the data aggregation and labeling technologies. ING did not rake in gains from the project overnight; it took several years, and the company still has holes to patch, according to Ferd Scheepers (pictured), chief information architect at ING.

“If you believe you can do this journey and have value after a year and then you’re done, it doesn’t work that way,” Scheepers said. That is not to say it isn’t worth the effort: ING has improved the efficiency of data governance and analytics across all departments. (At any rate, businesses can’t afford to slack off with the General Data Protection Regulation set to descend on them in May.) The recipe calls for a top-down executive decision and a clean and sober selection of appropriate technologies, according to Scheepers.

Scheepers spoke with Dave Vellante (@dvellante) and Lisa Martin (@LuccaZara), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, at the IBM Think event in Las Vegas. They discussed how to govern big data and turn compliance fright to innovative might. (* Disclosure below.)

This week, theCUBE spotlights Ferd Scheepers as our Guest of the Week.

The labeling game gets all on the same page

Getting everyone within the ING empire on board with the data governance architecture required tweaking the pitch for different regions, departments, etc.

“Selling the architecture actually means that you need to go to the different stakeholders with very different stories. So what’s in it for them?” Scheepers said. For example, chief information officers gain a more navigable landscape, with automation replacing a lot of manual drudgery; the increased control means all their risk items go down, he explained. The business side gets well-articulated context around data, and it actually gets to own the data and say who gets access to it and what they can do with it.

A crucial step to governing data for use across an organization is getting everyone on the same page semantically. In other words, the business needs to bring all data sources together and qualify them with business terms so that people can understand what they are. That sounds simple enough, but the reality for large, branched-out corporations like ING is a bit complicated. Infusing a common language across all lines of business and across all countries was tricky, Scheepers pointed out. Even a simple term like “customer” can be subject to different interpretations.

“I mean that sounds very natural for a bank to understand what a customer is,” he said. “But you might have very different definitions based on where you come from and which country.”
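
For readers who want to picture what a shared vocabulary looks like in practice, here is a minimal sketch of a business glossary in Python. It is not ING's or IBM's implementation; the class, the column names and both definitions of "customer" are hypothetical, invented only to show one term carrying a shared default meaning plus a country-specific override.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    """A glossary entry: one shared name, with optional regional definitions."""
    name: str
    default_definition: str
    regional_definitions: dict[str, str] = field(default_factory=dict)

    def definition_for(self, country: str) -> str:
        # Fall back to the shared definition when a country has no override.
        return self.regional_definitions.get(country, self.default_definition)


# Hypothetical definitions, invented for illustration only.
customer = BusinessTerm(
    name="customer",
    default_definition="A party holding at least one active product",
    regional_definitions={"NL": "A party with a signed account agreement"},
)

# Hypothetical mapping from physical columns in source systems to glossary terms.
column_to_term = {
    "crm_nl.clients.client_id": "customer",
    "core_be.party.party_no": "customer",
}


def columns_for(term: str) -> list[str]:
    """Return every physical column labeled with the given business term."""
    return sorted(col for col, t in column_to_term.items() if t == term)
```

Calling `customer.definition_for("NL")` returns the regional wording, while any country without an override gets the shared default. That is the kind of explicit disagreement a glossary is meant to surface rather than hide.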

ING is increasingly experimenting with data discovery tools that automatically classify data and tie it back to business terms. It still relies heavily on manual labeling, however, due to the massive quantity and diversity of data at ING. “As a bank, you probably have thousands of things that you could describe on a business term level,” Scheepers stated.
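
The split between automatic classification and manual labeling can be sketched roughly as follows. This is an assumption-laden toy, not the discovery tooling ING uses: the name patterns and column names are hypothetical, and a real tool would likely inspect the data itself rather than only column names.

```python
import re

# Hypothetical name patterns; a real discovery tool would also inspect the data.
RULES = [
    (re.compile(r"(cust|client|party)", re.IGNORECASE), "customer"),
    (re.compile(r"(salary|wage)", re.IGNORECASE), "salary"),
    (re.compile(r"(acct|account)", re.IGNORECASE), "account"),
]


def classify(columns):
    """Split columns into automatically labeled ones and a manual review queue."""
    labeled, manual_queue = {}, []
    for column in columns:
        for pattern, term in RULES:
            if pattern.search(column):
                labeled[column] = term
                break
        else:
            # No rule matched, so a person has to label this column.
            manual_queue.append(column)
    return labeled, manual_queue


labeled, manual_queue = classify(["CLIENT_ID", "monthly_salary", "acct_no", "x17_flag"])
```

Here three columns are tied to business terms automatically and `x17_flag` lands in the manual queue, which mirrors on a small scale why a large and diverse estate still needs human labeling.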

A hierarchy of priority for depth of description is a sensible way to keep from biting off a mouthful of data that governance can’t chew. “When you talk about customer data, you want to know all the different details about … what is a salary? Does an account include accrued interest?” Scheepers said. On the other hand, log data, for instance, can slide with a less detailed description, he added.
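
One way to express such a hierarchy is a table of required metadata fields per data category, checked whenever a dataset is registered. The sketch below assumes two hypothetical tiers and field names; the article does not say how ING encodes its own priorities.

```python
# Hypothetical tiers: which metadata fields each data category must carry.
REQUIRED_FIELDS = {
    "customer": {"business_term", "definition", "owner", "quality_rating"},
    "log": {"business_term"},
}


def missing_metadata(category, metadata):
    """Return the required fields that a dataset's description still lacks."""
    # Unknown categories fall back to the lightest tier.
    required = REQUIRED_FIELDS.get(category, {"business_term"})
    return sorted(required - set(metadata))
```

A customer dataset described only by `business_term` and `owner` would be flagged as missing `definition` and `quality_rating`, while a log dataset with just a `business_term` passes.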

An even keel steers her clear of the swamp

ING applies some type of description to all of the data in its lake. That is what makes the difference between a well-ordered data lake that is ready for analytics use and the dreaded data swamp, according to Scheepers. Many businesses wound up with swamps by blindly throwing bits and bytes into the Apache Hadoop big data framework and calling it a day. That approach probably contributed to the staggering failure rate of big-data projects, which now stands at 85 percent, according to Gartner analyst Nick Heudecker.

“Organizations … need a plan to get to production. Most don’t plan and treat big data as technology retail therapy,” Heudecker said. 

It’s worth the upfront effort to keep analysts out of a tangle when they need to derive insights and drive business value. “If it’s well qualified, if it’s known, you know the quality of it; you know where it is. It actually makes it way easier to use a lot of innovative technologies to work with that data, because you don’t have that problem of trying to find where everything is,” Scheepers said.

ING is looking to improve its data architecture with technologies other than IBM’s. Various open-source tools interest the company greatly (it is also pushing the technology it built itself out to open-source communities). However, an everything-but-the-kitchen-sink-plus-APIs approach to IT is definitely not what Scheepers recommends.

“The only way to be in control of your entire data landscape is to limit yourself in the technologies that you use,” he concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the IBM Think event. (* Disclosure: TheCUBE is a paid media partner for IBM Think. Neither IBM, the event sponsor, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
