If you’ve been waiting for an easy way to test HPCC, the alternative to Apache Hadoop built and open sourced by LexisNexis Risk Solutions, then you’re in luck. Starting today, HPCC Thor clusters are available on Amazon Web Services. For now this is only a beta, and it’s an offering from HPCC rather than an official AWS service. There are some limitations, but it’s suitable for giving HPCC a test drive. You can find the documentation here.
“AWS is not the best environment for a cluster like this, but it works,” says HPCC CTO Armando Escalante. “AWS is made for clusters of about two or three servers,” Escalante says. “Dealing with a 100+ node cluster would be a nightmare.”
HPCC has developed its own tools for managing clusters, but for now AWS users will be limited to 20 nodes.
Escalante says more features are coming, including a one-button deployment option. Another limitation is that Roxie, HPCC’s querying system and data warehouse, isn’t available on AWS. Escalante describes Roxie as one of HPCC’s core differentiators, but Roxie requires additional infrastructure, such as a load balancer. Escalante says he’s working with AWS on this, and since AWS already supports similar infrastructure for its hosted Oracle services, it should be feasible.
Escalante says that Amazon will eventually include HPCC as part of its Elastic MapReduce service, which currently lets users spin up Hadoop clusters on AWS infrastructure. Escalante concedes that the name will be a bit of a misnomer, since HPCC doesn’t use MapReduce, but says that Amazon plans to rename the service next year anyway. AWS is also developing some sort of hosted data stream processing tool, possibly based on Hadoop, so that service may join the renamed Elastic MapReduce stable as well.
HPCC is currently the main apples-to-apples alternative to Hadoop. Microsoft decided to sunset LINQ to HPC (formerly called Dryad) in favor of using Hadoop with its partner Hortonworks. That decision doesn’t bode well for the Microsoft Research project Daytona. The University of California, Berkeley has its Spark project, but it doesn’t seem to have any enterprise traction yet (though it is used in production at Conviva). There are some indirect competitors, such as data warehousing solutions, complex event processing solutions and Storm, but otherwise Hadoop and HPCC stand alone at the moment.