Ready-made apps, operations mindset enable faster data science applications
Simply developing accurate data science models takes so much effort that many companies overlook the challenges of bringing those models to production. To help with this process, DataTorrent Inc. is using its open-source engine Apache Apex to help businesses make better use of real-time big data analytics. The company’s co-founder and chief strategy officer, Phu Hoang, draws on his years of engineering experience from Yahoo’s early days, when he worked on bringing complex infrastructure stacks to a production-worthy state.
“Very quickly we learned that at the pace of scale of data that we were generating that we couldn’t use [current enterprise] software, and we were kind of on our own,” Hoang said. “So we had to invent approaches to do that. The thing we knew a lot was commodity servers on racks. So, we ended up saying, ‘How do I solve this big data processing problem using that hardware?’ … We started to iterate around how to do distributed processing across many hundreds of servers.”
Hoang spoke with John Furrier (@furrier), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, at theCUBE’s studios in Palo Alto, California. They discussed the mindset and strategy required to bring data science applications to production quickly.
From dev to prod
DataTorrent brings the same operations-driven mentality from Hoang’s Yahoo days to helping companies move big data applications into production. All of its engineers are trained to live and breathe optimization for stability and robust operation at scale.
“Our DNA is all about ops. We think that, especially with big data, there are lots of ways to do prototypes and get some proof of concept going. But getting that to production to run it 24×7 and never lose data, that really has been hard,” Hoang said.
A key to a smooth path to production for data science applications has been using large building blocks that address the majority of customer use cases. These building blocks can take the form of ready-made apps that need only minor tweaking to fit a customer’s needs.
“As we continue to learn in working with our customers and starting to see the patterns … putting kind of a bigger functional block together so that it’s easier to build a big data application at this next layer — machine learning, rule engines, whatever. But how do you piece that together in a way that is 80 percent done so that the customer only has the last mile?” Hoang asked.
(* Disclosure: DataTorrent Inc. sponsored this segment on SiliconANGLE Media’s theCUBE. Neither DataTorrent Inc. nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)