Teradata sucks up IoT data, tightens Hadoop embrace

Teradata Corp. kicked off its annual user conference this week with a series of announcements that move its big data platform into the emerging world of the Internet of Things (IoT) and tighten its embrace of the Hadoop platform.

The company’s first IoT foray is Teradata Listener, a software gateway that the company described as having “real-time listening capabilities to follow multiple streams of sensor and IoT data and propagate it into multiple platforms.” Built on an assortment of open source tools that includes the Apache Kafka message broker, Apache Mesos cluster manager, Elasticsearch search engine, OpenStack cloud platform, Docker containers and microservices, Listener is designed to ingest large volumes of real-time data and dispatch it to back-end databases, file systems and analytics engines for processing.

“It’s frictionless; you can load analytics in real time, dump it into the database and then run analytics at scale,” said Chris Twogood, VP of products and services marketing at Teradata. Users can invoke data streams themselves using API keys, and track and analyze metadata about streaming information to know things like source, type and volume of data.

In conjunction with Listener, Teradata also said it will make its Aster Analytics suite available on Hadoop, aiming to simplify that big data platform, which has been criticized for complexity and constant change. Aster is an analytics platform that Teradata acquired in 2011 and that was previously delivered via a dedicated appliance. Aster will now run on Hadoop as well as on Teradata hardware and in the Teradata cloud.

Aster is designed to analyze multi-structured data using an underlying SQL engine and a set of more than 100 prepackaged analytics techniques along with seven vertical industry applications. Of particular note is its ability to convert SQL statements to the appropriate MapReduce or graph engine routines.
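To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a simple aggregate query could be expressed as map, shuffle and reduce steps. It is an illustration only, not Aster's implementation: the query, the sample readings and every function name are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sensor readings standing in for a table named "readings".
ROWS = [
    {"device": "a", "temp": 20.0},
    {"device": "b", "temp": 30.0},
    {"device": "a", "temp": 22.0},
]

# The SQL being mimicked (illustrative only):
#   SELECT device, AVG(temp) FROM readings GROUP BY device

def map_phase(row):
    """Emit one (key, value) pair per row: the GROUP BY column and the measure."""
    yield row["device"], row["temp"]

def shuffle(pairs):
    """Group all emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse each group into a single result: here, the average."""
    return key, sum(values) / len(values)

def run_query(rows):
    """Run the map, shuffle and reduce steps in sequence on one machine."""
    pairs = chain.from_iterable(map_phase(row) for row in rows)
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())

RESULT = run_query(ROWS)
print(RESULT)
```

In a real cluster the map and reduce steps would run in parallel across many nodes; the sketch runs them in one process to show only the shape of the translation.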

Aster runs on top of the Apache YARN resource manager, which in turn sits on top of HDFS, Twogood said. “We store all data in HDFS. You can run the analytics against all the detail data without taking it out of the Hadoop cluster,” he said. “This permits you to run algorithms against Internet of Things data at scale. Listener ingests. Aster does analysis.”

Listener is well-suited to feed an Aster engine by gathering data from many real-time streams – such as sensor readings, Twitter data, real-time stock feeds and Spark analytics – and feeding it into multiple back ends. Listener doesn’t process or transform data but rather delivers it to back-end systems like Apache Spark, Tibco and IBM DB2 for processing. Spark can effectively serve as a pre-processor, performing top-level analysis before streaming data back into other data stores. A single stream can be sent to multiple repositories simultaneously.
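The fan-out pattern described here, in which one incoming stream is delivered untransformed to several back ends while volume is tracked per stream, can be sketched in a few lines of Python. This is a simplified, in-memory illustration under stated assumptions, not Listener's actual design or API: the `FanOutGateway` class, its methods and the stream name are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# A sink is any callable that accepts one record (hypothetical interface).
Sink = Callable[[dict], None]

class FanOutGateway:
    """Toy gateway: forwards each record, unchanged, to every subscribed sink."""

    def __init__(self) -> None:
        self._sinks: Dict[str, List[Sink]] = {}
        self.volume: Counter = Counter()  # per-stream record counts (metadata)

    def subscribe(self, stream: str, sink: Sink) -> None:
        """Register a back end to receive every record from the named stream."""
        self._sinks.setdefault(stream, []).append(sink)

    def ingest(self, stream: str, record: dict) -> int:
        """Count the record, deliver it to all sinks, return the delivery count."""
        self.volume[stream] += 1
        sinks = self._sinks.get(stream, [])
        for sink in sinks:
            sink(record)  # no processing or transformation happens here
        return len(sinks)

# Usage: two in-memory lists stand in for a database and a file system.
warehouse: List[dict] = []
archive: List[dict] = []

gateway = FanOutGateway()
gateway.subscribe("sensors", warehouse.append)
gateway.subscribe("sensors", archive.append)

delivered = gateway.ingest("sensors", {"device": "a", "temp": 20.0})
print(delivered, gateway.volume["sensors"], warehouse == archive)
```

A production gateway would add durable queuing, authentication and delivery retries; the sketch omits those to keep the fan-out step visible.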

Aster supports Hortonworks, Inc.’s HDP 2.3 and Cloudera, Inc.’s Enterprise 5.4 natively. MapR Technologies, Inc.’s MapR Hadoop distribution is supported on commodity clusters only.

In keeping with its recent efforts to make Hadoop integration seamless, Teradata also introduced what it is calling a Unified Data Architecture Appliance, which combines Teradata, Aster and Hadoop in a single cabinet and offers processing performance at $1,000 per terabyte. Although intended primarily for space-constrained businesses, the appliance is also notable for its inclusion of Presto, an open source distributed SQL query engine.

Incubated inside Facebook in 2012, Presto is notable for its ability to operate against very large data stores and to query different types of data in relational, NoSQL or proprietary format, stored on Cassandra, Hive or elsewhere. Teradata said in June that it is making a significant commitment to Presto, and this announcement underscores that intention.

Finally, Teradata introduced a formal managed services division for Hadoop, providing platform, application and staffing support using a combination of local and offshore resources.