UPDATED 15:17 EDT / MARCH 09 2023

BIG DATA

Astronomer reveals how it intertwines data orchestration and MLOps for faster decisions

Access to real-time data is no longer a nice-to-have for organizations; it’s an imperative. And delivering it effectively depends on a reliable, scalable and easy way to develop and run data workflows.

Apache Airflow has emerged as the de facto standard for orchestrating data pipelines, and Astronomer Inc. builds on it to remove friction when operationalizing machine learning and data workflows, according to Steven Hillion (pictured, left), chief data officer of Astronomer.

“We went from, at the beginning of last year, about 500 data tasks that we were running on a daily basis to about 15,000 every day,” he said. “We run something like a million data operations every month within my team … the ability to spin up new production workflows essentially in a single day, you go from an idea in the morning to a new dashboard or a new model in the afternoon. That’s really the business outcome.”

Hillion and Jeff Fletcher (right), director of field engineering and machine learning at Astronomer, spoke with theCUBE industry analyst Lisa Martin at the AWS Startup Showcase: “Top Startups Building Generative AI on AWS” event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Astronomer uses Apache Airflow to enhance data orchestration and MLOps. (* Disclosure below.)

Joining the data orchestration and MLOps dots

Astronomer extends data orchestration capabilities into machine learning operations using the same data pipelines, and Apache Airflow is central to that effort, according to Fletcher.

“I come from a machine learning background, and for me the interesting part is that machine learning requires the expansion into orchestration,” he stated. “A lot of the same things that you’re using to go and develop and build pipelines in a standard data orchestration space applies equally well in a machine learning orchestration space … my focus at Astronomer is really to explain how Airflow can be used well in a machine learning context.”
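To make that concrete, here is a minimal sketch of what a machine learning workflow can look like as an Airflow DAG, written with Airflow’s TaskFlow API (Airflow 2.4 or later is assumed). The DAG name, tasks and logic are hypothetical placeholders for illustration, not Astronomer’s actual pipelines:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def daily_model_refresh():
    """Hypothetical pipeline: extract fresh data, then retrain a model on it."""

    @task
    def extract() -> dict:
        # Placeholder: pull new rows from a source system.
        return {"rows": 10_000}

    @task
    def train(dataset: dict) -> None:
        # Placeholder: fit or refresh a model on the extracted data.
        print(f"Training on {dataset['rows']} rows")

    # Passing extract()'s output into train() tells Airflow the dependency order.
    train(extract())


daily_model_refresh()
```

The point Fletcher makes holds in this sketch: the same pattern used for ordinary extract-and-load pipelines applies to model training, with each step becoming a task while Airflow handles scheduling, retries and dependency ordering.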

A data orchestration strategy enables businesses to schedule and manage data pipelines so that information flows seamlessly throughout an organization, Hillion pointed out.

“Airflow was created by Airbnb some years ago to manage all of their data pipelines and manage all of their workflows, and now it powers the data ecosystem for organizations as diverse as Electronic Arts,” he said. “Conde Nast is one of our big customers, a big user of Airflow … the biggest banks on Wall Street use Airflow and Astronomer to power the flow of data throughout their organizations.”

Increased usage and the need to standardize data pipelines are driving demand for a managed Airflow service, according to Hillion.

“If you look historically at the way that Airflow has been used, it’s often from the ground up,” he pointed out. “But then, increasingly, as you turn from pure workflow management and job scheduling to the larger topic of orchestration, you realize it gets pretty complicated, you want to have coordination across teams, and you want to have standardization for the way that you manage your data pipelines.”

Astronomer’s flexible business model

Astronomer’s flexible business model is designed to help customers get the most out of Apache Airflow, Fletcher explained. Its managed Airflow service layers on capabilities such as OpenLineage support and a cloud developer environment.

“We have a managed cloud service, and we have two modes of operation,” he noted. “One, you can bring your own cloud infrastructure or alternatively we can host everything for you. So it becomes a full SaaS offering. And from there Airflow does what Airflow does, which is its ability to then reach to different data systems and data platforms and to then run the orchestration.”

Astronomer’s key differentiators include support for multiple cloud providers and innovations such as OpenLineage, which enables end-to-end traceability of every dataset, according to Fletcher.

“A lot of it is that we are not specific to one cloud provider,” he stated. “We have the ability to operate across all of the big cloud providers. One thing we’ve done is to augment core Airflow with Lineage services, so using the OpenLineage framework, another open-source framework for tracking datasets as they move from one workflow to another one.”
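For readers curious what that tracking looks like in practice, here is a minimal sketch using the open-source OpenLineage Python client to emit a lineage event that links an input dataset to an output dataset. The namespace, job name, producer and backend URL are placeholder assumptions, and exact module paths can vary between client versions:

```python
from datetime import datetime, timezone
from uuid import uuid4

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Placeholder endpoint: an OpenLineage-compatible backend such as Marquez.
client = OpenLineageClient(url="http://localhost:5000")

event = RunEvent(
    eventType=RunState.COMPLETE,
    eventTime=datetime.now(timezone.utc).isoformat(),
    run=Run(runId=str(uuid4())),
    job=Job(namespace="example-namespace", name="daily_model_refresh"),
    producer="https://example.com/hypothetical-producer",
    # Declare which datasets this run read and wrote, so lineage can be stitched together.
    inputs=[Dataset(namespace="example-namespace", name="raw_events")],
    outputs=[Dataset(namespace="example-namespace", name="training_features")],
)

client.emit(event)
```

When the OpenLineage integration for Airflow is installed, events like this are emitted automatically for each task run, which is how the end-to-end dataset traceability Fletcher describes can be assembled without changing pipeline code.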

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: “Top Startups Building Generative AI on AWS” event:

 (* Disclosure: Astronomer Inc. sponsored this segment of theCUBE. Neither Astronomer nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
