UPDATED 19:15 EST / NOVEMBER 21 2019

BIG DATA

Think app lift-and-shift was bad? Data migration can ruin machine learning

Lift and shift has not been the greatest friend to those migrating to cloud. Many hustled legacy applications to cloud to little, if any, positive effect; some wound up “repatriating” apps once the bill arrived. Likewise, valuable data from on-premises systems can’t be dumped cold into the cloud; after all, next-gen machine-learning applications in cloud stand or fall on well-governed, quality data.

Older systems hold operational-processing data that helps machine-learning algorithms — in cloud or on-prem — make predictions. However, there is a generational gap to bridge between old data and advanced analytics technologies.

“Making those applications and data available for next-generation, next-wave platforms is becoming a challenge for a couple of different reasons,” according to Tendü Yoğurtçu (pictured), chief technology officer of Syncsort Inc.

Yoğurtçu sat down with Peter Burris (@plburris), host of theCUBE, SiliconANGLE Media’s livestreaming studio, for a CUBEConversation at theCUBE’s studio in Boston, Massachusetts. They discussed the challenge of marrying fruitful legacy data with advanced new analytics systems. (* Disclosure below.)

Data goes from bad to worse in machine learning

There are two major pain points in the onboarding of on-prem operational data for use in advanced analytics, according to Yoğurtçu. One is simply accessing the data in a timely and efficient manner; the other is in preserving the policies, privacy and security measures connected to that data. 

We saw the pitfalls of wholesale data dumping with data lakes. Once companies needed to extract data for analytics, they had quite a mess to wade through to acquire it. Today, “there’s more focus … on data quality from day one: How am I going to ensure that I’m delivering trusted data and populating the cloud data stores or delivering trusted data to microservices in the cloud?” Yoğurtçu asked.

The core asset is always the data, Yoğurtçu pointed out. “Make the data available instead of going after the applications. Make the data from these existing on-premise and different platforms available for cloud,” she said.

The industry is seeing companies pilot projects involving cloud data warehousing to prep data for hybrid analytics, Yoğurtçu added.

As machine learning and artificial intelligence become key features of modern applications, we now understand that data governance and quality cannot be an afterthought.

“In machine learning, the effect of bad data is really multiplied because of the training of the model, as well as the insights,” Yoğurtçu concluded. 

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s CUBE Conversations. (* Disclosure: Syncsort Inc. sponsored this segment of theCUBE. Neither Syncsort nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
