UPDATED 23:41 EDT / OCTOBER 23 2017

BIG DATA

Linux Foundation creates a framework for sharing open data

The Linux Foundation wants to open up the use of data in much the same way it has helped make open-source software a technology force to be reckoned with.

Announced at the Open Source Summit in Prague Monday, the new Community Data License Agreement is designed to cover the use of nonproprietary data, Linux Foundation Executive Director Jim Zemlin said. Although proprietary data sets owned by the likes of Google LLC, Facebook Inc. and other large internet and cloud companies give them a large advantage in new analytics and artificial intelligence applications, open data sets could start to level the playing field.

Thanks to open-source technologies such as Apache Hadoop and Apache Spark, unstructured data can easily be crunched to provide useful information for businesses and other organizations in a matter of seconds. But providing access to data isn’t so easy, because there is no framework in place to govern its distribution and use. This is where CDLA licenses come in, providing that framework for governments, academic institutions and others to open and share data with ease.

The benefits of being able to share open data more easily are widespread. The Linux Foundation points to the automotive industry as one example, saying more data will help to improve things like safety, energy efficiency and maintenance. “Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly,” the Foundation said in its announcement. “Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.”
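As a rough sanity check on those quoted figures, the short Python sketch below works out how much driving time they imply. It assumes decimal units (one petabyte equals one million gigabytes) and rounds "nearly a gigabyte" up to exactly one gigabyte per second; the function and variable names are illustrative and do not come from the foundation's announcement.

```python
# Back-of-envelope check of the quoted figures.
# Assumptions (not from the announcement): decimal units, and
# "nearly a gigabyte" per second treated as exactly 1 GB/s.
GB_PER_SECOND = 1.0      # data generated while driving
PB_PER_YEAR = 2.0        # quoted yearly total for the average car
GB_PER_PB = 1_000_000    # decimal convention: 1 PB = 10**6 GB


def implied_driving_hours(pb_per_year: float, gb_per_second: float) -> float:
    """Hours of driving per year needed to produce the yearly data volume."""
    seconds = pb_per_year * GB_PER_PB / gb_per_second
    return seconds / 3600


hours_per_year = implied_driving_hours(PB_PER_YEAR, GB_PER_SECOND)
hours_per_day = hours_per_year / 365

print(f"{hours_per_year:.0f} hours per year, about {hours_per_day:.1f} hours per day")
```

Under these assumptions, two petabytes a year corresponds to roughly an hour and a half of driving per day, which suggests the two quoted numbers are at least consistent with each other.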

As such, the CDLA licenses are tailor-made for organizations that possess valuable data assets they wish to share. The foundation said the licenses are for contributors and users of open data sets, so they can use and support the contribution of data in a uniform fashion.

The foundation has created two types of CDLA licenses to get the ball rolling. One is a sharing license, similar to the GNU General Public License version 2, that encourages users to contribute data back to the community, though this is not required. The other is a permissive license that comes with no sharing requirements.

The idea is to define a licensing framework in support of collaborative communities focused on the curation and sharing of open data. The framework is also intended to allow data producers to share information with “greater clarity” for recipients about what they can do with that data.

The licenses will enable individuals and organizations to share data in the same way they can share open-source code, empowering communities and businesses to do more with data, and perhaps build new data-intensive applications. More specifically, the foundation said, the licenses will provide the following benefits:

  • Data producers can share data with greater clarity about what recipients may do with it. Data producers can also choose between sharing and permissive licenses and select the model that best aligns with their interests. In either case, data producers should enjoy the clarity of recognized terms and disclaimers of liabilities and warranties.
  • Data communities can standardize on a license or set of licenses that provide the ability to share data on known, equal terms that balance the needs of data producers and data users. Data communities have a high degree of flexibility to add their own governance and requirements for curating data as a community, particularly around areas such as personally identifiable information.
  • Data users looking for datasets to help kick off training an AI system or for any other use will have the ability to find data shared under a known license model with terms that clearly state their rights and responsibilities.

The licenses could also serve as an important enabler for the concept of Data as a Service, which refers to a model for providing data to organizations and individuals on demand, eliminating the need for them to accumulate this data themselves.

Holger Mueller, principal analyst and vice president of Constellation Research Inc., said the CDLA licenses are one of the most promising initiatives for getting DaaS off the ground. However, he said it was surprising to see the Linux Foundation take the lead here, as the value flow for DaaS is reversed compared with open-source software.

“For open source, [the value lies in] providing time and receiving software,” Mueller said. “But for DaaS, it’s about giving and getting data. Traditionally people and enterprises have been less sensitive with time, in contrast to data. The hope is now that an actual discourse is starting on data sharing, which will be important for the DaaS success going forward.”

Those sentiments were echoed by the Linux Foundation’s Zemlin, who said open data licenses are necessary for the frictionless sharing of data.

“The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good,” Zemlin said. “The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”

Image: IBM Curiosity Shop/Flickr
