APPS
Columnar, a startup founded by core Apache Arrow developers, launched today with $4 million to accelerate data connectivity using Arrow-based drivers.
Bessemer Venture Partners led the company’s seed funding round with participation from Breakers, K5 Tokyo Black, Next Play Ventures and Composed Ventures, alongside a number of angel investors.
True to its name, Columnar wants to shake up today’s row-oriented data communication between databases and services by providing tools for column-oriented communication.
Column-oriented databases store all the values of a single column together, on disk or in memory, rather than storing entire rows together. This greatly improves efficiency for analytical workloads that aggregate data across many rows but only a few columns. For instance, it allows much faster analysis of how a single field changes over time or across rows.
Columnar storage can minimize the amount of data that needs to be read, while also simplifying storage. Because all data in a single column is the same type, it can be compressed more effectively, which can also improve retrieval speed.
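As a rough illustration of the difference, the following Python sketch holds the same small table in a row-oriented and a column-oriented layout. The sample data, the column names and the helper functions are hypothetical, and the run-length encoding shown is only one simple compression technique chosen for the example, not a description of how any particular database compresses its columns.

```python
from array import array
from itertools import groupby

# Hypothetical sample table: timestamp, sensor id, temperature.
# Row-oriented layout: each record is stored together.
rows = [
    (1, "a", 20.5),
    (2, "b", 21.0),
    (3, "a", 22.5),
]

# Column-oriented layout: one homogeneous sequence per column.
columns = {
    "ts": array("q", [1, 2, 3]),
    "sensor": ["a", "b", "a"],
    "temp": array("d", [20.5, 21.0, 22.5]),
}


def mean_temp_from_rows(table):
    # Every row has to be visited even though only one field is needed.
    return sum(row[2] for row in table) / len(table)


def mean_temp_from_columns(table):
    # Only the "temp" column is read; the other columns are never touched.
    temps = table["temp"]
    return sum(temps) / len(temps)


def run_length_encode(values):
    # Collapse consecutive repeats into (value, count) pairs.
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    print(mean_temp_from_rows(rows) == mean_temp_from_columns(columns))
    print(run_length_encode(["a", "a", "a", "b", "b"]))
```

Both functions compute the same average, but the columnar version reads a single typed array. The encoding helper shows why a column of same-typed, repetitive values tends to shrink well.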
Modern-day columnar databases include Amazon.com Inc.’s Redshift, Snowflake Inc.’s data platform and Google LLC’s BigQuery. The problem enterprises face today is that established connectivity standards, such as Open Database Connectivity and Java Database Connectivity, still operate in row-oriented formats. That means that even if both the source and the destination store data in columns, the data has to be converted to rows before it is sent and back to columns after it arrives. Those extra steps slow down communication.
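The double conversion can be sketched in a few lines of Python. This is a simplified model rather than how ODBC or JDBC drivers are actually implemented, and the function names and sample data are hypothetical. It shows only that columnar data crossing a row-oriented interface is pivoted twice, with every value touched each time.

```python
def columns_to_rows(columns):
    # Sender side: pivot each column into per-record tuples for a row-oriented API.
    names = list(columns)
    rows = [tuple(record) for record in zip(*(columns[name] for name in names))]
    return names, rows


def rows_to_columns(names, rows):
    # Receiver side: pivot the records back into one list per column.
    if not rows:
        return {name: [] for name in names}
    return {name: list(values) for name, values in zip(names, zip(*rows))}


if __name__ == "__main__":
    source = {"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]}
    names, wire_rows = columns_to_rows(source)        # first conversion
    destination = rows_to_columns(names, wire_rows)   # second conversion
    print(destination == source)
```

The data arrives unchanged, so both pivots are pure overhead. A transport that stays columnar from end to end avoids them.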
“The inefficiency of these connectivity protocols has just become a real problem today…especially for AI applications that are using structured data,” Ian Cook, co-founder and chief executive of Columnar, told SiliconANGLE in an interview. “If you have a fleet of [graphics processing units] doing inference on tabular data, you really don’t want those GPUs idle because the connectivity protocol can’t keep up.”
The rise of big data and analytics, especially combined with artificial intelligence, has turned older connectivity libraries into a clear bottleneck. When someone asks an AI chatbot or agent for an answer, it often must reach out to multiple data sources to gather information. Each leg of that process adds to the time it takes to retrieve and move data, and Columnar’s work aims to speed that up.
“That’s why we need Arrow — it keeps the data in columnar format all the way through so we can zero out the serialization, deserialization overheads,” Cook said.
Apache Arrow, an open-source project launched almost 10 years ago, offers fast columnar interchange among code written in different languages. Because of that capability, it quickly became a fixture of low-level data infrastructure. Its Python library alone is on track to be downloaded more than 2.5 billion times this year. In 2022, the project gave rise to ADBC, or Arrow Database Connectivity, a set of application programming interfaces and libraries for Arrow-native database access.
Today, alongside the launch, Columnar is offering specialized tools for developers to accelerate the adoption of ADBC. The first step is releasing “dbc,” an open-source command-line tool for installing and managing ADBC drivers, which implement a standard for high-performance database access using column-based data transfer.
“A lot of vendor products need fast and efficient connectivity, but that is an undifferentiated piece of heavy lifting that they would rather not do,” Cook said.
The company is launching with 10 ADBC drivers, four of which are new: drivers for Amazon Redshift, MySQL, Microsoft Corp.’s SQL Server and Trino. Columnar intends to introduce more drivers in cooperation with other vendors and open-source projects.
To achieve this, Columnar has launched the Driver Foundry, a collaboration with Databricks Inc., dbt Labs Inc., Microsoft, Snowflake and the Apache Arrow developer community.
Cook explained that the strategic imperative for Columnar is to be a layer, something that provides the tooling needed to get the job done, and not a stack. He said he wants developers and businesses to be able to implement better data source connectivity easily, without worrying that it will dictate how they operate.
“We’re trying to create broadly usable tech that can make its way into all the tools you’re using and accelerate them all,” Cook said.