

Alongside the new features and integrations introduced at its third annual community meetup this morning, Apache Spark is celebrating a landmark endorsement from IBM. The company has decided to back the project with more than 3,500 engineers, who will now actively participate in developing new functionality. The first contribution of the initiative is a machine learning library called SystemML.
The technology is one of the latest innovations to emerge from the company's ongoing work on Watson. Over the last few years, Watson's use has expanded from answering trivia questions to extracting complicated patterns from vast quantities of unstructured data. SystemML provides a language that exposes these machine learning capabilities directly to data scientists.
Queries written in this language, whose syntax is deliberately modeled after the widely used R statistical programming language, are automatically executed in whichever mode is most efficient for the specific workload and the operational characteristics of the Spark cluster. That has the potential to give the project's machine learning capabilities a substantial boost.
But SystemML represents only the tip of the iceberg for IBM's plans. The bulk of its efforts will focus on integrating Spark into its analytics arsenal, beginning with none other than Watson. The cloud-based version of the artificial intelligence that the company released for the healthcare sector earlier this year is first in line to be standardized on the framework, with other versions presumably following later on.
At the same time, IBM is embedding Spark into its Bluemix platform-as-a-service stack, which will make the framework's capabilities available on demand to developers and data scientists. Through a number of education partnerships announced alongside the initiative, the company hopes to raise the number of professionals skilled in using Spark to more than a million within a few years. IBM is betting that those users will, as a result, favor its implementation over the competition.
Taken together, IBM's commitment to Spark arguably represents the biggest milestone for the project since its inception at UC Berkeley four years ago. The framework is already a fixture of the analytics discussion thanks to its speed and extensibility, but if Big Blue's past kingmaking role in other open-source projects such as Linux is anything to go by, its entry into the fray could take Spark to a whole different level.