

The Linux Foundation wants to open up the use of data in much the same way it has helped make open-source software a technology force to be reckoned with.
Announced at the Open Source Summit in Prague Monday, the new Community Data License Agreement is designed to cover the use of nonproprietary data, Linux Foundation Director Jim Zemlin said. Although proprietary data sets owned by the likes of Google LLC, Facebook Inc. and other large internet and cloud companies give them a large advantage in new analytics and artificial intelligence applications, open data sets could start to level the playing field.
Thanks to open-source technologies such as Apache Hadoop and Apache Spark, unstructured data can easily be crunched to provide useful information for businesses and other organizations in a matter of seconds. But providing access to data isn’t so easy, because there is no framework in place to govern its distribution and use. This is where CDLA licenses come in, providing that framework for governments, academic institutions and others to open and share data with ease.
The benefits of being able to share open data more easily are widespread. The Linux Foundation points to the automotive industry as one example, saying more data will help to improve things like safety, energy efficiency and maintenance. “Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly,” the Foundation said in its announcement. “Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.”
As such, the CDLA licenses are tailor-made for organizations that possess valuable data assets they wish to share. The foundation said the licenses are for contributors and users of open data sets, so they can use and support the contribution of data in a uniform fashion.
The foundation has created two types of CDLA licenses to get the ball rolling. The first is a sharing license similar to the GNU General Public License version 2 that encourages users to contribute data back to the community, though this is not required. The second is a permissive license that comes with no sharing requirements.
The idea is to define a licensing framework in support of collaborative communities focused on the curation and sharing of open data. The framework is also intended to allow data producers to share information with “greater clarity” for recipients about what they can do with that data.
The licenses will enable individuals and organizations to share data in the same way they can share open-source code, empowering communities and businesses to do more with data, and perhaps build new data-intensive applications. More specifically, the foundation said, the licenses will provide the following benefits:
The licenses could also serve as an important enabler for the concept of Data as a Service, which refers to a model for providing data to organizations and individuals on demand, eliminating the need for them to accumulate this data themselves.
Holger Mueller, principal analyst and vice president of Constellation Research Inc., said the CDLA licenses are one of the most promising initiatives for getting DaaS off the ground. However, he said it was surprising to see the Linux Foundation take the lead here, as the value flow for DaaS is reversed compared with open-source software.
“For open source, [the value lies in] providing time and receiving software,” Mueller said. “But for DaaS, it’s about giving and getting data. Traditionally people and enterprises have been less sensitive with time, in contrast to data. The hope is now that an actual discourse is starting on data sharing, which will be important for the DaaS success going forward.”
Those sentiments were echoed by the Linux Foundation’s Zemlin, who said open data licenses are necessary for the frictionless sharing of data.
“The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good,” Zemlin said. “The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”