UPDATED 17:36 EDT / SEPTEMBER 21 2017

BIG DATA

Actian adds Apache Spark support to vector database on Hadoop

Actian Corp. has added support for Apache Spark to its hybrid data management, analytics and integration platform.

Actian Vector in Hadoop, known as VectorH, uses vector processing and multi-level in-memory acceleration to speed up performance of Hadoop data stores. The added support was announced Tuesday.

Vector processing is based on a design that is both vectorized and columnar. Instead of handling a single value at a time, processors perform the same instruction on multiple values at once. The columnar format lets queries scan columns rather than rows, which reduces data transfer and makes column-wise operations such as MIN, MAX, SUM, COUNT and AVG more efficient. This delivers superior performance for certain tasks, such as matrix manipulation and numerical simulation.
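To make the columnar idea concrete, here is a minimal sketch in plain Python. It is not Actian's implementation, and the sample rows, column names and the `to_columns` and `aggregate_column` helpers are hypothetical. It only illustrates the data layout: each numeric column is stored as its own contiguous typed array, and aggregates are computed over one column a batch at a time, without touching the other columns. A real vectorized engine would run each batch through SIMD CPU instructions, which plain Python does not do.

```python
from array import array

# Hypothetical sample data: (order_id, region, amount) rows.
ROWS = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 310.25),
    (4, "north", 42.0),
]


def to_columns(rows):
    """Pivot row tuples into one container per column (typed arrays for numbers)."""
    return {
        "order_id": array("q", (r[0] for r in rows)),  # 64-bit signed integers
        "region": [r[1] for r in rows],                # strings stay in a list
        "amount": array("d", (r[2] for r in rows)),    # 64-bit floats
    }


def aggregate_column(column, batch_size=1024):
    """Compute MIN, MAX, SUM, COUNT and AVG over one column, batch by batch."""
    total = 0.0
    count = 0
    lowest = None
    highest = None
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]
        # Each built-in call applies the same operation to the whole batch.
        total += sum(batch)
        count += len(batch)
        batch_min, batch_max = min(batch), max(batch)
        lowest = batch_min if lowest is None else min(lowest, batch_min)
        highest = batch_max if highest is None else max(highest, batch_max)
    average = total / count if count else None
    return {"min": lowest, "max": highest, "sum": total, "count": count, "avg": average}


if __name__ == "__main__":
    columns = to_columns(ROWS)
    # Only the "amount" column is scanned; the other columns are never read.
    print(aggregate_column(columns["amount"], batch_size=2))
```

The batch size here is arbitrary. In a real engine it would be tuned so that a batch fits in the CPU cache.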

The company will use Spark with VectorH to support a diverse range of file formats and workloads, including machine learning and a mix of transactional and analytical workloads in the same environment.

Actian grew out of Ingres Corp.’s namesake relational database management system, which briefly challenged Oracle Corp. for supremacy in the late 1980s. At the time, Oracle supported the SQL query language while Ingres supported an alternative called Quel. When IBM chose to support SQL on its DB2 DBMS, Oracle’s fortunes took off while Ingres languished. After a series of acquisitions and different owners, including a stint as an open-source product, Ingres changed its name to Actian. Ingres is the 29th most popular relational DBMS, according to DB-Engines.

“Spark connectivity enables native support for native Hadoop columnar data stores Parquet and ORC and can deliver better query performance than Hive, Impala and Spark SQL,” said John Bard, senior director of marketing at Actian.

The capability enables Ingres users, of whom there are still thousands, to add in-memory vector analytics to their transaction processing environments. “It’s like the advantages of a graphical processing unit but without special hardware,” said Jeff Veis, Actian’s chief marketing officer. He said Intel estimates that most uses tap into no more than 10 percent of their processing capacity. “We can get that to 90 to 95 percent,” he said.

Actian said its platform also provides real-time data refreshes with no performance penalty, resource management in the Hadoop cluster, query optimization and industry-standard SQL support. Standard benchmark query workloads that typically take more than two hours with traditional SQL on Hadoop finish in less than a minute on VectorH, the company claimed.

“People are getting impatient waiting to see analytics delivered at enterprise grade and performance,” Veis said. “They want to move to open source tools for analytics, but they’re seeing mediocre performance using Hive or Drill or Impala. We’re able to deliver hundreds of times the performance on scale-out queries.”

Spark support is available immediately. A single-node version of Vector Hadoop that runs on Windows and Linux is available free as a community edition.

