UPDATED 14:22 EDT / NOVEMBER 21 2011

NEWS

Heroku Adds Automated Support for Hadoop with Help of Stealth Startup

Heroku has added automated support for Hadoop through Treasure-Data, a startup that is still in “stealth” mode, according to its Web site.

Heroku currently offers Hadoop support through Amazon Elastic MapReduce, but that requires manual setup. Treasure-Data goes further: it helps remove the complexity that comes with managing Hadoop clusters.

According to the Heroku Web site:

Through Treasure Data Hadoop add-on, Heroku users can set up and run your Hadoop cluster within 3 MINUTES on the cloud: collect structured logs from your applications, and analyze through SQL-like query language.

You may also add some fine tunings – configure tables, set schemas and schedule your queries. Currently, up to 1T BYTES of raw-data volume is provided as FREE. Please store all events in your app!

The Hadoop support is offered through Heroku’s Add-On service, which is designed to extend the capability of apps that use the developer platform. Developers may customize an app’s architecture, then choose features to match their needs. This may mean adding cron, deploy hooks or backups. Developers may also use the service to integrate third-party services. For example, Apigee is an add-on partner, offering its Twitter gateway to Heroku developers.

Treasure-Data is in stealth mode, but its blog does give a glimpse into what the company plans to do. At its core, Treasure-Data offers MessagePack, “a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small.” The company calls it “JSON on Steroids.”
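To make the “fast and small” claim concrete, here is a minimal sketch that hand-encodes one small object in the MessagePack wire format and compares its size with compact JSON. The `pack` function is a hypothetical helper written for this illustration, not the MessagePack library's API, and it covers only a tiny subset of the format (nil, booleans, integers from 0 to 127, strings up to 31 bytes and maps with up to 15 entries).

```python
import json


def pack(obj):
    """Encode a tiny subset of MessagePack (illustrative helper, not the real library)."""
    if obj is None:
        return b"\xc0"                               # nil
    if obj is True:
        return b"\xc3"                               # true
    if obj is False:
        return b"\xc2"                               # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                          # positive fixint: the value is the byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw    # fixstr: length lives in the type byte
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytes([0x80 | len(obj)])               # fixmap: entry count lives in the type byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise ValueError("outside the subset handled by this sketch")


event = {"compact": True, "schema": 0}
as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(event)

print(len(as_json), "bytes as JSON")            # 27 bytes as JSON
print(len(as_msgpack), "bytes as MessagePack")  # 18 bytes as MessagePack
```

The savings come from folding type and length information into a single leading byte, where JSON spends characters on quotes, colons, commas and braces.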

Treasure-Data has also developed Fluentd, which treats log data as a semi-structured stream instead of writing it to files, where log data is usually stored. It is similar to Facebook’s Scribe and Cloudera’s Flume. Treasure-Data built Fluentd in Ruby, while Scribe is built in C++ and Flume in Java.

According to the Treasure-Data blog, there are two problems with log files. The first is data formatting: the analytics engineer has to write a dedicated parser for each format, which is a cumbersome task. The second is lag time, which can often pose a problem in Web-based settings.
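The formatting problem can be sketched in a few lines. The first function below needs a regular expression tailored to one text format, while the second emits an event that is already structured. The log line, the field names and both helpers are hypothetical, and `make_event` only mirrors the tag, time and record shape of a Fluentd event; it is not Fluentd's API.

```python
import json
import re
import time

# Hypothetical Apache-style access log line, used only for illustration.
LINE = '127.0.0.1 - - [21/Nov/2011:14:22:00 -0400] "GET /signup HTTP/1.1" 200 512'

# A dedicated parser: this pattern works for this one format and no other.
ACCESS_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_access_line(line):
    """Recover fields from free text; fails as soon as the format changes."""
    match = ACCESS_RE.match(line)
    if match is None:
        raise ValueError("line does not match the expected format")
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record


def make_event(tag, record, now=None):
    """Build a structured event at the source, so no parser is needed later."""
    timestamp = int(now if now is not None else time.time())
    return {"tag": tag, "time": timestamp, "record": record}


parsed = parse_access_line(LINE)
event = make_event("app.signup", {"user_id": 42, "plan": "free"}, now=1321899720)
print(parsed["path"], parsed["status"])
print(json.dumps(event))
```

With the second approach the application decides the schema once, and every downstream consumer reads the same fields without format-specific parsing.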

Fluentd is designed to take the complexity out of managing big data. That complexity is a core issue with Hadoop.

The fit seems right for Heroku. CTO Adam Wiggins wrote a post earlier this year that covers the limitations that come with Ruby and its default configuration for log files. The same problem shows up in the transition to a distributed infrastructure. To alleviate it, Heroku developed Logplex, which routes syslog traffic the same way that its HTTP routing mesh routes HTTP traffic.

Wiggins describes it this way:

Logplex handles input streams (which we call “sinks”) from many different sources: all the dynos running on the app, system components like our HTTP router, and (currently in alpha) logs from add-on providers. Sinks are merged together into channels (each app has its own channel) which is a unified stream of all logs relevant to the app. This allows developers to see a holistic view of everything happening with their app, or to filter down to logs from a particular type of sink (for example: just logs from the HTTP router, or just logs from worker processes).
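The merge-and-filter idea Wiggins describes can be sketched as follows. This is an illustration only, not Logplex's implementation: the `LogLine` type, the `channel` function and the `origin` labels are hypothetical, and the sketch assumes each input stream is already ordered by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Optional


class LogLine(NamedTuple):
    ts: float      # timestamp in seconds
    origin: str    # hypothetical label for where the line came from
    message: str


def channel(*streams: Iterable[LogLine], only: Optional[str] = None) -> Iterator[LogLine]:
    """Merge time-ordered streams into one unified stream, optionally filtered by origin."""
    merged = heapq.merge(*streams, key=lambda line: line.ts)
    for line in merged:
        if only is None or line.origin == only:
            yield line


router = [LogLine(1.0, "router", "GET / 200"), LogLine(3.0, "router", "GET /signup 200")]
web = [LogLine(2.0, "web.1", "Rendered index")]
worker = [LogLine(2.5, "worker.1", "Job finished")]

# Holistic view: everything relevant to the app, in time order.
for line in channel(router, web, worker):
    print(line.ts, line.origin, line.message)

# Filtered view: only lines from the HTTP router.
router_only = list(channel(router, web, worker, only="router"))
```

Because the merge is lazy, the same function serves both the unified view and the filtered view without buffering every stream in memory.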

Wiggins sees the need for modern log protocols and mentions Scribe as an example. So you can see why Treasure-Data’s Fluentd would be so appealing to Heroku: it, too, is a modern logging protocol.

This all means that we are seeing a new evolution for big data apps. And Heroku is in a good position to take a lead by addressing the cumbersome issues that come with deploying Hadoop clusters.

