Analytical database developer Kinetica DB Inc. today announced that it has integrated a native large language model-based query engine into its platform that lets users perform ad-hoc data analysis on real-time, structured data using natural language.
The company said it built its own LLM in part because of privacy and security concerns that have been raised about public models such as OpenAI LP’s ChatGPT, which Kinetica announced it would support four months ago.
“This is an add-on feature for folks who want the best accuracy and a full guarantee that no metadata leaves their perimeter,” said Nima Negahban, Kinetica’s co-founder and chief executive officer. Customers need to provide their own infrastructure to support the LLM, including servers with graphics processing units.
The native LLM is tailored to syntax and data definitions for vertical industries such as telecommunications, financial services, automotive and logistics as well as for Kinetica’s own query syntax. It also works with context objects, which are sets of instructions, tables and rules that add context to a query and that can be used to help the model understand unique processes and terms.
“You can pass a question and say you want it to run within a specific context,” Negahban said. “It can generate the SQL using that context as a parameter. What you get is a programmatic experience that understands all of your data and your data ecosystem and leverages the LLM to generate SQL for you.”
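The flow Negahban describes can be sketched in a few lines of Python. Everything here is illustrative: the function name, the context-object structure, and the telecom example are assumptions for the sketch, not Kinetica's actual API. The idea is simply that a context object bundles table names, domain rules, and terminology alongside the user's question before the LLM is asked to generate SQL.

```python
# Hypothetical sketch of a context-object flow. Names, structure, and the
# telecom example are illustrative assumptions, not Kinetica's real API.

def build_prompt(question, context):
    """Combine a natural-language question with a context object
    (tables plus domain rules) into a single prompt for an LLM
    that will generate SQL."""
    lines = ["Generate SQL for the question below."]
    lines.append("Tables: " + ", ".join(context["tables"]))
    for rule in context["rules"]:
        # Domain rules teach the model company-specific terms.
        lines.append("Rule: " + rule)
    lines.append("Question: " + question)
    return "\n".join(lines)

# A made-up telecom context: the tables and the meaning of
# "dropped call" are unique to this hypothetical deployment.
telecom_context = {
    "tables": ["cell_towers", "call_records"],
    "rules": ["A 'dropped call' means disposition = 'DROP'."],
}

prompt = build_prompt("How many dropped calls per tower today?",
                      telecom_context)
```

In a real deployment the prompt would go to the native LLM, which returns SQL that references only the tables and terms the context permits; the sketch stops at prompt assembly.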
The company will continue to support OpenAI’s LLM as a lower-cost alternative and said more models will be supported in the future, with support for Nvidia Corp.’s NeMo LLM framework targeted for later this year.
“There are challenges to using LLMs to run SQL queries,” said Philip Darringer, vice president of product management at Kinetica. “You often run into syntax errors and problems with making sure you’re getting the same results every time. Standard hallucinations that you see in other domains raise their heads as well in SQL such as generating new column and function names that might not be applicable in specific scenarios.”
Kinetica says its database can return answers to queries within seconds, even for complex and unknown questions that span time series, spatial, graph and machine learning models. The company says it can achieve that performance by using vectorization, which stores data in fixed-size blocks called vectors and runs queries on multiple vectors in parallel.
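The difference between row-at-a-time and block-at-a-time (vectorized) execution can be illustrated with a toy aggregation. This is a conceptual sketch only, with a made-up block size; a real vectorized engine would hand each fixed-size block to SIMD units or GPU threads so blocks are processed in parallel, whereas this pure-Python version processes them sequentially.

```python
# Toy contrast between row-at-a-time and block-at-a-time ("vectorized")
# summation. Block size and data are made up for illustration.

BLOCK_SIZE = 4  # a fixed-size "vector" of values processed as one unit

def row_at_a_time_sum(values):
    """Classic tuple-at-a-time execution: one value per step."""
    total = 0
    for v in values:
        total += v
    return total

def block_at_a_time_sum(values):
    """Vectorized-style execution: operate on fixed-size blocks.
    A real engine would dispatch each block to SIMD lanes or GPU
    threads in parallel; here the blocks run sequentially."""
    total = 0
    for i in range(0, len(values), BLOCK_SIZE):
        block = values[i:i + BLOCK_SIZE]
        total += sum(block)  # one operation over a whole block
    return total

data = list(range(10))
assert row_at_a_time_sum(data) == block_at_a_time_sum(data) == 45
```

Both paths compute the same answer; the performance claim rests on the block-wise path mapping naturally onto parallel hardware.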
In addition to its internal database, the platform can access data in a variety of third-party data stores such as Snowflake Inc.’s Data Cloud, Google LLC’s BigQuery and Apache Kafka streams. “Customers can leave the data in the source database and reference it as an external query or, if they want to get additional benefits of Kinetica, they can do a one-time synchronization using change data capture,” Darringer said.
The native LLM is immediately available in a containerized, secure environment for either on-premises or cloud deployment.