DARPA Backs Python Big Data Projects With $3M Cash


Texas-based startup Continuum Analytics has just received a $3 million cash injection from DARPA, the Defense Advanced Research Projects Agency, to further its work on using the Python programming language to crunch Big Data. The money will fund the development of two open source tools: Blaze, a scientific computing library, and Bokeh, a data visualization library.

DARPA says the money comes from its XDATA program, a $100 million initiative set up last year to help third parties develop software and techniques for analyzing Big Data.

It’s one of the first serious efforts to bring Python to Big Data. The language has been hugely successful in other areas, such as web development and scientific computing, and Continuum now wants to extend its reach. The company’s stated goal is “developing the next generation of tools to make Python as powerful and successful for big data and business data analytics as it has been for science, engineering, and scalable computing.”

Why Python? It’s a matter of targeting the languages that already appeal to a broad developer base. “Python is a relatively easy to use language…but generally speaking it’s easier to go to the developer communities, find the languages popular with them, and find ways to apply that to big data,” explains Jeff Kelly, an analyst with The Wikibon Project. That approach is far more efficient than creating a new language for developers to learn, Kelly went on to explain during his appearance on this morning’s NewsDesk show with Kristin Feledy.

DARPA says it’s particularly interested in Continuum’s Blaze technology, which is used for writing Python code that can run analytic jobs across distributed systems and different environments. The hope is that Blaze will extend the existing SciPy and NumPy scientific and mathematical libraries, making them more useful for Big Data. Continuum says that with Blaze it will be able to “handle out-of-core computations on large data sets that exceed the system memory capacity, as well as on distributed and streaming data.” Several of the original developers behind those libraries, including Travis Oliphant, author of NumPy, will help develop Blaze.
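To illustrate what “out-of-core” means in practice, here is a minimal sketch in plain Python. It is not Blaze’s API; the function name `out_of_core_mean` and the on-disk format (raw float64 values) are assumptions made purely for illustration. The idea is to stream a data set from disk in fixed-size chunks and keep only running totals in memory, so the full data set never has to fit in RAM at once.

```python
import array
import os
import tempfile


def out_of_core_mean(path, chunk_items=1_000_000):
    """Hypothetical helper: mean of raw float64 values stored in a binary file.

    Reads at most `chunk_items` values per iteration, so memory use stays
    bounded no matter how large the file is.
    """
    item_size = array.array("d").itemsize  # bytes per float64 value
    total = 0.0
    count = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * item_size)
            if not raw:
                break  # end of file
            chunk = array.array("d")
            chunk.frombytes(raw)  # decode this chunk only
            total += sum(chunk)   # keep just the running sum and count
            count += len(chunk)
    if count == 0:
        raise ValueError("file contains no values")
    return total / count


if __name__ == "__main__":
    # Write a small sample file, then read it back three values at a time.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
        array.array("d", range(10)).tofile(tmp)
        path = tmp.name
    try:
        print(out_of_core_mean(path, chunk_items=3))  # 4.5
    finally:
        os.remove(path)
```

Libraries aimed at this problem generalize the same pattern to many operations and to data spread across multiple machines; the chunk size trades memory use against the number of disk reads.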

Then there’s Bokeh, a Python-based HTML5 data visualization library designed for large, multidimensional data, which Continuum calls its “scalable, interactive and easy-to-use visualization system”. The library will support several visualization approaches, including the Grammar of Graphics and the Stencil visualization model.

Both technologies will support Continuum’s flagship offerings: Disco, a Python-based take on Hadoop that supports both SciPy and NumPy, and Wakari, Continuum’s browser-based analytics platform.

DARPA’s generosity toward Continuum is no surprise given its heavy spending on Big Data developers in recent weeks. The agency recently gave Kitware $4 million to support its open source data aggregation projects, while the Georgia Institute of Technology received $2.7 million to bolster its work on scalable machine-learning technologies.