

Last week, theCUBE, SiliconANGLE Media’s mobile livestreaming studio, held its premier annual event, BigData NYC, in conjunction with the Strata Data Conference. Our analysts dug into the latest big data developments in conversations with innovators, practitioners and vendors. At the same time, we hit the ground in Washington, D.C., for Splunk .conf2017.
Altogether, it was a bustling week for big data. The overarching theme might be summed up in the words of John Furrier (@furrier), co-host of theCUBE: “You have to go out and do something, but you can’t do it alone,” he said, referring to the numerous partnerships cropping up in big data solutions and services.
In a sense, it could also describe the entire big data realm. “Internet of things” devices and the cloud depend on each other for data analytics and artificial intelligence at the edge. And in data DevOps, stitching disparate parts into a working pipeline is imperative. For any one link in the big data ecosystem to expand, it will need a little help from its friends.
It is no secret that big data is choking on the complexity of its sundry component parts. Gartner Inc. researchers have predicted that 60 percent of big data projects attempted in 2017 will end in failure. This is certainly not for lack of available software tools. Companies in a rush to turn a profit from their data might go out and buy a whole armful of software tools, each of which solves only a small piece of the puzzle.
“But a successful advanced analytics strategy is about more than simply acquiring the right tools. It’s also important to change mindsets and culture and to be creative in search of success,” said Lisa Kart, research director at Gartner. Unfortunately, this cultural change often calls for silo-breaking, while many big data software tools are not cut out for full-spectrum, cross-department use.
Watch the following video where theCUBE analysts discuss insights from the BigData NYC event:
The market is apparently fed up with fragmented tools and is craving a cover-all solution. “Many of the companies that are in the big data space today that are the most successful are companies that are positioning themselves as a service,” said Peter Burris (@plburris), co-host of theCUBE and head of research at Wikibon.
Admittedly, it is not easy to pull off data-as-a-service. It involves a large number of factors, and many people active in the data pipeline have different skills and play different roles. “There’s multiple personas in a company now,” Furrier said. “You have an analytics person, a chief data officer. You might have an IT person; you might have a cloud person.”
Any big data as-a-service offering will likely have to weave multiple technologies together and put a simple user interface on top. Even simplifying one key big data process might require collaboration from a number of companies. This is where partnerships can be priceless. At the Strata Data Conference during Big Data Week, graphics processing unit manufacturer Nvidia Corp. announced the GPU Data Frame, or GDF. Developed in partnership with MapD, H2O.ai and Anaconda (formerly Continuum), GDF is an application programming interface enabling interchange of data between processes running on the GPU.
Data scientists often spend long hours on feature engineering, the painstaking work of preparing and labeling data so that machine learning algorithms can use it. GDF dramatically speeds up the creation and training of machine learning models by making this process much more efficient, according to Nvidia.
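To make the concept concrete, here is a minimal sketch of the GDF idea using cuDF, the open-source library that grew out of the GDF effort; it assumes a CUDA-capable GPU and the cudf package, neither of which the article specifies:

```python
# Minimal sketch: keeping dataframe work on the GPU (cuDF grew out of the GDF effort).
# Assumes a CUDA-capable GPU with the cudf and pandas packages installed.
import pandas as pd
import cudf

pdf = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]})
gdf = cudf.DataFrame.from_pandas(pdf)    # data now resides in GPU memory
gdf["feature_sq"] = gdf["feature"] ** 2  # feature engineering runs on the GPU
result = gdf.to_pandas()                 # copy back to host memory only when needed
```

Because intermediate results stay in GPU memory, one process (say, a SQL engine) can hand columns to another (say, a machine learning library) without a costly round trip through the CPU, which is the interchange GDF was designed to standardize.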
The difficulty of big data is impelling lots of other companies to put their heads together for mutual benefit, according to James Kobielus (@jameskobielus), co-host of theCUBE and Wikibon analyst. “The impact for developers is that there’s convergence among companies that might have competed to the death in particular hot new areas,” he said.
Developers are all too happy to see companies simplify big data and machine learning. The open-source community is increasingly pushing for big data technology that abstracts away the data lake muck and cuts to the chase of application building. Google’s TensorFlow, as well as Caffe and Theano, are open-source deep learning technologies that are hot with developers and data scientists.
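The appeal of these frameworks is precisely that abstraction. As a rough illustration, here is a minimal sketch of defining and training a tiny model with TensorFlow’s high-level Keras API (the 2017-era graph API was lower-level, and the data here is synthetic):

```python
# Minimal sketch: a tiny neural network via TensorFlow's Keras API.
# Synthetic data stands in for real features and labels.
import numpy as np
import tensorflow as tf

features = np.random.rand(100, 4).astype("float32")
labels = np.random.randint(0, 2, size=(100, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=3, verbose=0)  # training is a single call
```

A few lines replace the hand-rolled plumbing of a bespoke training loop, which is the kind of cut-to-the-chase experience developers are pushing for.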
Outside of open-source, “Splunk could become a big data development platform,” said Dave Vellante (@dvellante), co-host of theCUBE. Splunk Inc. provides software that searches, monitors and analyzes machine-generated big data. The company has typically undersold its platform to developers, according to Vellante.
“It’s Hadoop-like, and it’s a big data pipeline, but it’s integrated. And it’s a lot simpler,” he said.
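For a sense of what that developer experience looks like, here is a minimal sketch of querying Splunk from Python with the splunk-sdk package; the host, credentials and index are placeholders, not details from the article:

```python
# Minimal sketch: running a Splunk search from Python (splunk-sdk package).
# Host, credentials and the "web" index are placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# SPL does the pipeline work: count HTTP 5xx errors per host over the last hour.
stream = service.jobs.oneshot(
    "search index=web status>=500 earliest=-1h | stats count by host")
for row in results.ResultsReader(stream):
    print(row)
```

The pipeline lives in a single Search Processing Language string rather than across a cluster of separately managed components, which is what Vellante means by “integrated” and “a lot simpler.”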
The Apache Hadoop software library has been synonymous with big data for years, but it may be losing currency with developers seeking instant gratification. “I think 2017 will be the year people start walking away from Hadoop,” Sameet Agarwal, vice president of engineering for Snowflake Computing, wrote on Networkworld.com last February. “The projects are too focused on making the technology work. Also, adequate skillsets are hard to find, and too much time and effort is required to build the infrastructure. Lastly, the initial investment is too high, and the turnaround time too long, making it very difficult to quickly experiment and iterate for success,” Agarwal wrote.
If Splunk does not expand its total addressable market by knocking Hadoop from its perch, it may do so with its new internet of things edge mission. While the company’s edge strategy is still under construction, it is earnest about bringing it to market eventually, “because in a world of industrial assets and of consumer devices, you’re producing so many more devices,” said George Gilbert (@ggilbert41), co-host of theCUBE. “You cannot, for latency and bandwidth reasons, send that all to the cloud to get an answer and then send it all back,” Gilbert said.
Watch the following video where theCUBE analysts discuss updates from the Splunk .conf event:
AI in internet of things devices was the subject of a lively presentation and panel during BigData NYC. Here, again, multiple technologies must interconnect.
“Machine learning algorithm development is actually slow and painful. So you really want people who know how to do this working with gobs of data, creating models and testing them offline,” said Neil Raden (@NeilRaden), contributing research analyst at Wikibon. This process must take place in the expansive, highly scalable cloud environment, he said.
Watch the complete Wikibon presentation and panel video below:
Internet of things edge devices themselves can shoulder some of the burden of big data analytics with improved processing chips.
“Inference is what’s going on inside the chips at the edge device,” Kobielus said. New central processing units, GPUs and other advanced chips are “playing in various combinations that are automating more and more very complex inference scenarios at the edge,” he explained.
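One way to read the cloud-plus-edge split Raden and Kobielus describe: train a model offline where compute is plentiful, then ship a compact version of it to the device. As a hedged illustration, here is a minimal sketch using TensorFlow Lite, one such toolchain, though not one the analysts name:

```python
# Minimal sketch: train in the cloud, run inference at the edge.
# TensorFlow Lite is one illustrative toolchain; the analysts do not name one.
import tensorflow as tf

# Stand-in model; in practice it would be trained offline on gobs of data.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])

# Convert to a compact format that a constrained edge device can execute locally,
# avoiding the latency and bandwidth cost of a round trip to the cloud.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```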
Ultimately, big data analytics, artificial intelligence and machine learning must be able to blend into handy, human-friendly products or applications for end-users or customers.
“The most successful companies that are working with AI are actually incorporating it into solutions. So the best AI solutions are actually the products that you don’t know there’s AI underneath,” said Stephanie McReynolds, vice president of marketing at Alation Inc.
Watch more of SiliconANGLE’s and theCUBE’s coverage of BigData NYC 2017 and Splunk .conf2017.