Kinetica rolls machine learning libraries into its GPU-powered database

Kinetica Reveal screen shot

In-memory database maker Kinetica Db Inc. today is adding in-database analytics via user-defined functions to its namesake product, enabling machine learning and artificial intelligence libraries such as TensorFlow, BIDMach, Caffe and Torch to run directly inside the database, rather than requiring a time-consuming process of extracting, transforming and loading the data.

The company, whose distinction is its use of hardware graphics processing units to accelerate performance, is also introducing a new visualization framework and adding a feature that lets users assign tasks to specific GPUs to boost performance up to tenfold.

The company, which has raised $13 million from a venture fund headed by former Oracle Corp. President Ray Lane, aims to address some of the scalability problems of key-value stores such as HBase and Cassandra by using GPUs to accelerate parallel processing in a column-store engine. “The era of the Internet of Things is starting,” said Eric Mizell, vice president of global solution engineering. “We solve the real-time gap that Hadoop left behind.”

Originally developed for applications such as visualization and gaming, GPUs are sparking new interest as parallel processing engines for highly scalable database and analytics tasks. The GPU typically attaches via a Peripheral Component Interconnect bus and has its own memory, enabling high levels of parallelization and offloading processing from the central processing unit.

Registering machine learning libraries as user-defined functions, or UDFs, enables them to be accessed via an application programming interface call from within the database, greatly improving performance, Mizell said. This is especially useful for applications like trading, in which split-second decisions can have big financial consequences. “Traders can now bring in social and news feeds to look at trends and make better and faster decisions,” Mizell said.

Developers can use application programming interfaces to leverage third-party code such as Nvidia Corp.’s CUDA parallel computing platform, deploying orchestration hooks via RESTful APIs for registration and de-registration. They can also call endpoints via the exposed API to create input and output tables, and use arbitrary binaries to receive table data, perform arbitrary computations and save output to a global table in a distributed manner.
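The workflow described above (register a function, create input and output tables, run the function over table data in pieces, save the results, then de-register) can be illustrated with a toy in-memory model. This is a minimal sketch and not Kinetica's actual API: the `ToyDatabase` class, its method names, the `quotes` and `spreads` tables and the `add_spread` function are all hypothetical, and shards are processed sequentially here purely for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Table = List[Row]


class ToyDatabase:
    """Hypothetical stand-in for a database that runs UDFs next to the data."""

    def __init__(self, num_shards: int = 2) -> None:
        self.num_shards = num_shards
        self.tables: Dict[str, Table] = {}
        self.udfs: Dict[str, Callable[[Table], Table]] = {}

    def create_table(self, name: str, rows: Table = None) -> None:
        # Create an input table (with rows) or an empty output table.
        self.tables[name] = list(rows or [])

    def register_udf(self, name: str, func: Callable[[Table], Table]) -> None:
        self.udfs[name] = func

    def deregister_udf(self, name: str) -> None:
        del self.udfs[name]

    def execute_udf(self, name: str, input_table: str, output_table: str) -> None:
        func = self.udfs[name]
        rows = self.tables[input_table]
        # Split the input round-robin into shards; each shard is handed to
        # the UDF separately and its results land in one shared output table.
        shards = [rows[i::self.num_shards] for i in range(self.num_shards)]
        for shard in shards:
            self.tables[output_table].extend(func(shard))


def add_spread(shard: Table) -> Table:
    # Hypothetical computation: bid/ask spread for each quote in the shard.
    return [
        {"symbol_id": row["symbol_id"], "spread": row["ask"] - row["bid"]}
        for row in shard
    ]


db = ToyDatabase(num_shards=2)
db.create_table("quotes", [
    {"symbol_id": 1, "bid": 10.0, "ask": 10.5},
    {"symbol_id": 2, "bid": 20.0, "ask": 20.25},
    {"symbol_id": 3, "bid": 30.0, "ask": 31.0},
])
db.create_table("spreads")
db.register_udf("add_spread", add_spread)
db.execute_udf("add_spread", "quotes", "spreads")
db.deregister_udf("add_spread")
```

Because each shard is processed independently, rows in the output table are not guaranteed to keep the input order; a caller that needs ordering would sort on a key such as `symbol_id`.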

There are many streaming and near-real-time engines out there, including Apache Spark, Apache Storm and Apache Flink. Kinetica doesn’t compete with those engines but works with them. “Those are execution engines; we’re a database that lets you run real-time calculations,” Mizell said.

The new visualization engine, called “Reveal” (pictured), can be used for data exploration by enabling business analysts to visualize and interact with billions of data elements instantly. Users can drag and drop data tables to create custom views and access more than a dozen analytical widgets for creating interactive real-time dashboards.

Reveal also integrates with major mapping engines, including those from Google, Environmental Systems Research Institute Inc., Mapbox Inc. and Microsoft Bing, to conduct interactive location-based analytics on massive datasets. The engine can render a billion points on a map in 200 milliseconds, or a fifth of a second, Mizell said.
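To put that rendering figure in perspective, the implied throughput can be worked out directly from the two numbers Mizell cited. The variable names below are illustrative only, and the calculation assumes the rate is sustained evenly across the 200 milliseconds.

```python
# Figures quoted in the article
points_rendered = 1_000_000_000   # one billion points
render_time_ms = 200              # 200 milliseconds

# Integer arithmetic keeps the results exact
points_per_ms = points_rendered // render_time_ms
points_per_second = points_rendered * 1000 // render_time_ms

print(f"{points_per_ms:,} points per millisecond")
print(f"{points_per_second:,} points per second")
```

That works out to 5 million points per millisecond, or 5 billion points per second.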

Kinetica is also adding a VRAM Boost Mode that enables users to “pin” data to video memory instead of main memory, improving both performance and scalability.

The new features are immediately available. Pricing is based upon memory capacity, but Kinetica wouldn’t specify details.