

Open-source data integration startup Airbyte Inc. has come up with an easy way for Amazon Web Services Inc. customers to replicate data from dozens of popular sources to their Amazon Simple Storage Service accounts.
The new capability announced today is said to be the industry’s first-ever open-source integration for a data lake. Companies can now use one of Airbyte’s more than 75 pre-built connectors to transfer data from a number of widely used databases to their Amazon S3 accounts.
Airbyte is a newly emerged data integration startup that’s trying to solve the tricky problem of moving data from applications, databases and application programming interfaces to data warehouses and data lakes, which are used to analyze that information. It rivals firms such as Fivetran Inc. and Talend SA, and does away with the need for companies to build their own data connector for each individual data source.
Airbyte’s open-source connectors automatically adjust to any schema and API changes, ensuring continuity with data projects, the company says. They run in Docker containers too, allowing them to be deployed on any kind of infrastructure.
Whereas rivals such as Fivetran and Talend build their own proprietary connectors, Airbyte’s are all built and maintained by the open-source community. When the company raised $26 million in a Series A funding round in May, Airbyte Chief Executive Michel Tricot told SiliconANGLE that the open-source approach has significant advantages, such as making connectors easier to edit.
Moreover, by building an open-source community, Airbyte believes it will eventually be able to create and maintain many more connectors than its rivals, supporting numerous smaller services, Tricot said. The company’s vision is to foster a community that will eventually build and maintain thousands of connectors.
Airbyte already offers connectors for sources such as PostgreSQL, MySQL, Facebook Ads, Salesforce and Stripe. Users can connect those to data warehouses, including Redshift, Snowflake and BigQuery.
Airbyte said that Amazon S3 is the first data lake destination it offers. That will provide a big advantage to some users, because data lakes and data warehouses are not the same.
Whereas data warehouses are filled with structured data that has already been processed and filtered for a specific purpose, data lakes are vast pools of raw data that has no predefined purpose. Data lakes are therefore more difficult to work with, but the information within them potentially holds much greater value.
Asked why Airbyte built its first data lake connector for Amazon S3, Tricot told SiliconANGLE that it was the “most popular and also the most requested” by the company’s users. Other data lakes are not being neglected, though: The company plans to add more destinations for its connectors in the near future. Its targets include “the data lakes of other cloud providers” and the open-source Delta Lake project started by Databricks Inc., it said.
“Airbyte is moving forward with its mission to commoditize all data integration and will start supporting all the other data lakes,” Tricot said in prepared remarks today. Referring to processes known as extract/transform/load or extract/load/transform, he added, “Airbyte is becoming the new de facto standard for open-source ETL/ELT.”