UPDATED 11:26 EDT / NOVEMBER 17 2017

BIG DATA

Seldom-used metadata could be next gold mine, says NetApp CTO

In 2006, an online retailer came up with an idea to let users rent unused capacity on the company’s virtual computers. Eleven years later, Amazon Web Services Inc. is generating $16 billion in annual revenue.

In 2008, three starving students thought that the notion of renting out unused space in their apartment might form the basis for a mildly successful business. Nine years later, Airbnb Inc. now generates nearly $3 billion in revenue from rentals around the world.

Not every underused property has this kind of value, but there is an idle byproduct of digital exhaust that some information technology industry observers are beginning to scrutinize more closely than ever before: metadata.

It’s data that describes data, information created alongside every file that tells the user when it was last accessed, who opened it, when it was modified, and a host of other details. It’s information rich, eminently searchable, and potentially very valuable.
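As a rough illustration of what that looks like in practice, the sketch below reads the metadata an operating system already keeps for an ordinary file, using Python's standard `os.stat` call. The helper name `describe_file` and the choice of fields are assumptions made for this example. Plain filesystem metadata records the owner and timestamps, but not a history of who opened the file, so that detail would have to come from an audit log or a storage system.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of filesystem metadata for `path` (hypothetical helper)."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        # Timestamps are stored as seconds since the epoch; convert to UTC datetimes.
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Numeric owner ID; always 0 on Windows.
        "owner_uid": info.st_uid,
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    sample_path = handle.name

meta = describe_file(sample_path)
os.remove(sample_path)
print(meta)
```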

“Metadata becomes almost more important than the data in many cases. We can anticipate architectures where the data drives the processing,” said Mark Bregman (pictured), senior vice president and chief technology officer of NetApp Inc.

Bregman stopped by theCUBE, SiliconANGLE’s mobile livestreaming studio, and spoke with co-hosts Rebecca Knight (@knightrm) and Peter Burris (@plburris) during the NetApp Insight event in Berlin, Germany. They discussed potential uses for metadata, similarities between new cloud models and the ride-sharing revolution, new tools for processing data at the edge, a future role for blockchain, and how a recent NetApp acquisition highlights the need for in-memory computing. (* Disclosure below.)

This week theCUBE features Mark Bregman as our Guest of the Week.

Metadata can become self-aware

At the core of metadata’s future value is the prospect of data that is self-aware: infused with enough intelligence to know which processes it must execute at any given time, and which ones it should not. In this scenario, data carries rules about who can see it. One example lies in the not-too-distant future, when autonomous cars are plying the roads in greater numbers and accidents happen.

With literally hundreds of sensors and cameras capturing and storing every second of the autonomous car in motion, driverless vehicles will have a robust trove of available data in the event of a mishap on the road. “The insurance company needs to know who the car owner is, but maybe they don’t need to know something else, like where I came from,” Bregman explained. “The authorities might need both.”
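One way to picture data that carries its own visibility rules is a record whose metadata lists which roles may read each field. The sketch below is a toy model only, not anything NetApp has described: the class `PolicyTaggedRecord`, the role names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyTaggedRecord:
    """A record bundled with per-field visibility rules (hypothetical design)."""
    values: dict
    # Maps a field name to the set of roles allowed to read it.
    visible_to: dict = field(default_factory=dict)

    def view_for(self, role):
        """Return only the fields that `role` is allowed to see."""
        return {
            name: value
            for name, value in self.values.items()
            if role in self.visible_to.get(name, set())
        }


# Illustrative accident record: the insurer sees the owner, the authorities see both.
record = PolicyTaggedRecord(
    values={"owner": "A. Driver", "trip_origin": "Berlin"},
    visible_to={
        "owner": {"insurer", "authorities"},
        "trip_origin": {"authorities"},
    },
)

insurer_view = record.view_for("insurer")
authority_view = record.view_for("authorities")
print(insurer_view)
print(authority_view)
```

Fields with no rule are hidden from everyone in this sketch, a deny-by-default choice that a real design might or might not share.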

The IT industry has been built on a model where administrators who needed a particular service went out and bought a server, installed the machine, and owned it. But the advent of enterprise cloud computing has flipped the script. Much as rental car agencies eliminated the need to buy a car every time one traveled to Europe, the public cloud has curtailed the need to purchase in-house computing capacity.

Cloud as a rideshare model

The rise of serverless computing, a trend that Bregman finds analogous to the ride-sharing revolution, has underscored this point. Similar to dedicated Uber riders who don’t want the expense of owning a car, IT departments can procure cloud computing services as needed and pay only for the time used.

“That’s very similar to the way the cloud works today,” Bregman said. “I pick what instances I want, and they meet my needs.”

Whether stored in the public cloud or on-premises, data is growing so rapidly that physical limits on bandwidth and transfer time are making it impractical to move between platforms. This has created alarm in some quarters as enterprises watch datasets expand into exabyte and zettabyte ranges. Bregman’s reaction? “That’s OK,” the NetApp executive said.

The saving grace in the face of this information tsunami will be processing data at the edge. The autonomous car again provides a useful example for how this will work. A certain amount of critical summary data will be sent back to the host network, but a majority of it will remain stored in the vehicle.

The continued expansion of connected internet of things devices will likely fuel a similar solution. “There’s going to be more data generated and stored at the edge,” Bregman said. “All of that will not be able to be shipped back to the core. It means that we need to figure out different ways to do overall data lifecycle management all the way from the edge.”
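The edge pattern described above, in which raw data stays on the device and only a compact summary travels back to the core, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function `summarize_at_edge`, the summary fields, the threshold, and the sample readings are all made up for the example.

```python
from statistics import mean


def summarize_at_edge(readings, alert_threshold):
    """Reduce raw sensor readings to a small summary; the raw list stays local."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        # Number of readings above the (assumed) alert threshold.
        "alerts": sum(1 for value in readings if value > alert_threshold),
    }


# Raw readings kept in the vehicle; values are illustrative only.
local_store = [61.0, 63.5, 64.5, 92.0, 59.0]

# Only this small dictionary would be shipped back to the core.
summary = summarize_at_edge(local_store, alert_threshold=90.0)
print(summary)
```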

Blockchain could play key role

This need for lifecycle management raises the prospect that another area of the technology world could soon benefit from the need to closely track data movement. The kind of process needed to create a clearly defined data chain of custody is tailor-made for the blockchain, which uses a distributed, decentralized digital ledger.

A number of businesses, including several in the trucking industry, have already embraced the blockchain. Wikibon Inc. analysts recently predicted that this area will play a key role in how networks are created moving forward. “It will be a distributed and immutable ledger that will give us new ways to access and understand our data,” Bregman said.
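The chain-of-custody idea rests on a simple mechanism: each ledger entry stores a hash of the entry before it, so altering any earlier record breaks every link that follows. The sketch below shows only that hash-chaining step in a single process. It is not a distributed ledger and has no consensus protocol, and the function names and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def entry_hash(body):
    """Hash an entry body deterministically with SHA-256."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_event(chain, holder, action):
    """Append a custody event linked to the hash of the previous entry."""
    previous = chain[-1]["hash"] if chain else GENESIS
    body = {"holder": holder, "action": action, "previous": previous}
    chain.append({**body, "hash": entry_hash(body)})


def verify(chain):
    """Return True only if every entry's link and hash are intact."""
    previous = GENESIS
    for entry in chain:
        body = {key: entry[key] for key in ("holder", "action", "previous")}
        if entry["previous"] != previous or entry["hash"] != entry_hash(body):
            return False
        previous = entry["hash"]
    return True


custody = []
append_event(custody, "vehicle", "recorded")
append_event(custody, "insurer", "read")
chain_ok_before = verify(custody)

# Tampering with an earlier entry is detected on the next verification.
custody[0]["holder"] = "someone-else"
chain_ok_after = verify(custody)
print(chain_ok_before, chain_ok_after)
```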

The evolution from big data to huge data at the edge will put pressure on computer systems to analyze information in near real time. This means that storage systems, previously viewed as just a holding pen for data, will need to play a greater role in analytics and database processing.

NetApp’s low-key acquisition of startup Plexistor Ltd. six months ago helps show why this will become increasingly important. Plexistor gives NetApp expertise in ultra-low-latency persistent memory, which points toward a need for stronger in-memory analytics.

Further confirmation of where this technology may be headed can be found in last week’s news from Hewlett Packard Enterprise Co. of an upgrade for its in-memory computing platform, the first such move in over three years. “There will be very large amounts of data being analyzed in near real time to meet needs for business,” Bregman said.

The potential for data to become more “self-aware” through leveraging the power of metadata is real, but it’s not going to happen overnight. “This is not a near-term prediction; this is not one for next year,” Bregman concluded. “It requires rethinking how we think about data and processing.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of NetApp Insight Berlin. (* Disclosure: TheCUBE is a paid media partner for the NetApp Insight Berlin event. Neither NetApp Inc., the event sponsor, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
