IBM open-sources its SystemML machine learning tech


IBM has fulfilled its promise to open-source SystemML, a machine learning system that’s now been accepted as an Apache Incubator project.

It’s a significant milestone for SystemML, which already powers IBM’s BigInsights data analytics platform. The Apache Incubator is a stepping stone on the way to becoming a full project under The Apache Software Foundation. During incubation, developers ensure that donated code adheres to the ASF’s guidelines and that the community follows its principles.

The SystemML technology emerged from IBM’s work on Watson and integrates closely with another Apache project, Spark. SystemML helps keep Watson up to date by providing a language that directly exposes the capabilities of the artificial intelligence so data scientists can harness it. Queries are written in a syntax modeled after the popular R statistical programming language. They are then executed in whichever mode of operation is most efficient for the specific workload and the operational characteristics of the Spark cluster.

Here’s the full definition of the project from the official Apache SystemML site:

“SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations on Apache Hadoop and Apache Spark. ML algorithms are expressed in a R or Python syntax, that includes linear algebra primitives, statistical functions, and ML-specific constructs. This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics, and (2) data independence from the underlying input formats and physical data representations. Automatic optimization according to data characteristics such as distribution on the disk file system, and sparsity as well as processing characteristics in the distributed environment like number of nodes, CPU, memory per node, ensures both efficiency and scalability.”
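To make the idea of "automatic generation of hybrid runtime plans" more concrete, here is a toy Python sketch of the kind of decision such an optimizer makes. It is not SystemML's actual optimizer or API. The names `MatrixStats`, `estimate_bytes` and `choose_plan`, the byte sizes and the 0.4 sparse-storage threshold are all hypothetical assumptions chosen for illustration. The sketch estimates each input matrix's memory footprint from its dimensions and sparsity, then runs the work in memory on a single node if everything fits the memory budget, and otherwise falls back to a distributed plan.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical constants for illustration only; they are not SystemML's real values.
BYTES_PER_VALUE = 8       # one double-precision value
BYTES_PER_INDEX = 4       # one 32-bit index used by sparse storage
SPARSE_THRESHOLD = 0.4    # assumed sparsity below which sparse storage is used


@dataclass
class MatrixStats:
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells, between 0 and 1


def estimate_bytes(m: MatrixStats) -> int:
    """Rough in-memory size of a matrix in dense or CSR-style sparse layout."""
    cells = m.rows * m.cols
    if m.sparsity < SPARSE_THRESHOLD:
        nnz = round(cells * m.sparsity)
        # non-zero values + their column indices + one row pointer per row (plus one)
        return nnz * (BYTES_PER_VALUE + BYTES_PER_INDEX) + (m.rows + 1) * BYTES_PER_INDEX
    return cells * BYTES_PER_VALUE


def choose_plan(inputs: list[MatrixStats], memory_budget_bytes: int) -> str:
    """Pick single-node execution if all operands fit the budget, else distribute."""
    total = sum(estimate_bytes(m) for m in inputs)
    if total <= memory_budget_bytes:
        return "single-node in-memory"
    return "distributed (Spark)"


if __name__ == "__main__":
    budget = 1 << 30  # assumed 1 GiB of memory available on the driver node
    print(choose_plan([MatrixStats(10_000, 100, 1.0)], budget))
    print(choose_plan([MatrixStats(100_000_000, 1_000, 1.0)], budget))
    print(choose_plan([MatrixStats(100_000_000, 1_000, 0.0001)], budget))
```

The last two calls use matrices with identical dimensions but get different plans, because a very sparse matrix fits in memory while the dense one does not. This illustrates why the quoted description lists sparsity alongside cluster resources as an input to optimization. A real optimizer weighs many more factors, such as per-operation costs and the number of nodes.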

IBM announced back in June this year that it would donate SystemML to The Apache Software Foundation, and the project has already hit a number of significant milestones since then, including more than 320 patches covering APIs, data ingestion, optimizations and additional algorithms. IBM’s engineers have also made more than 90 contributions to the Apache Spark project, aimed at improving Spark’s support for machine learning.

IBM’s move to open-source SystemML continues a trend set by other tech giants including Google, which recently open-sourced its TensorFlow machine learning software, and Facebook, which donated artificial intelligence and machine learning tools to the existing Torch open-source project.

This is all great news for data-driven enterprises, which now have an array of free, open-source machine learning tools to choose from. Whereas Google’s TensorFlow and Facebook’s Torch are designed primarily for training neural networks, SystemML supports a broader range of machine learning algorithms, widening the ecosystem for businesses of every type.

Of course, the tech giants benefit just as much, if not more. Open-sourcing their machine learning tools gives them access to far more data, and that data is what helps these technologies evolve and become more powerful. IBM also gets an added bonus: if SystemML scales well, the platform could provide a gateway for customers to try out the rest of its data analytics tools.
