UPDATED 13:54 EDT / SEPTEMBER 09 2011

NEWS

LexisNexis Puts Its Hadoop Competitor on GitHub

HPCC Systems, the Apache Hadoop competitor developed at LexisNexis Risk Solutions, has just shared its source code on GitHub. The company announced in June that it would open source the project, and it has now made good on that promise. HPCC Systems released virtual machines running the HPCC platform in June, but now, for the first time, developers will be able to examine the code and customize it to their own ends.

Armando Escalante, CTO of LexisNexis Risk Solutions, says the HPCC team had to clean up the code to prepare it for public consumption and create a contributor agreement before the company could publish the source. He also says the company contracted both Black Duck and Palamida to audit the code to make sure everything was properly sourced and licensed.

HPCC stands for High Performance Computing Cluster. HPCC distinguishes itself from Hadoop with its “SQLish” programming language, called ECL, and its near-real-time query system, called Roxie. Wikibon’s Jeff Kelly compared Hadoop and HPCC in June and concluded that companies that want to get started with big data should take a look at both.

Armando Escalante, CTO of Risk Solutions, said at the GigaOM Structure conference that the company may start offering a data-as-a-service product that would give customers access to cloud-hosted HPCC clusters. He also said the company might make some of LexisNexis’ data sets available for analysis via this service. I’ve previously speculated that Microsoft is taking steps in this direction as well.

Services Angle

While I think data-as-a-service will be an important market in the future, that’s still some time off. But enterprises managing development can learn some more immediately applicable lessons from Escalante and his team’s experience of taking the product open source.

Escalante’s first piece of advice is for development teams to treat all projects as if they were open source, even if they are only used internally. Not only does this make these projects more ready to be open sourced in the future, he says, but it also forces best practices that improve collaboration internally.

Escalante says the HPCC team had to make some changes to the structure of the project to make it work as a GitHub project. They also had to clean up the comments, get rid of dead code that had never actually been used in the project, and make various elements more consistent. He recommends writing all code comments with the assumption that the public will eventually see them, and structuring a project as if it were going to land on GitHub. “When you work in an open source manner, you work more efficiently, even internally, because it’s accessible to more developers,” he says.

