UPDATED 23:41 EDT / OCTOBER 23 2017

BIG DATA

Linux Foundation creates a framework for sharing open data

The Linux Foundation wants to open up the use of data in much the same way it has helped make open-source software a technology force to be reckoned with.

Announced at the Open Source Summit in Prague Monday, the new Community Data License Agreement is designed to cover the use of nonproprietary data, Linux Foundation Director Jim Zemlin said. Although proprietary data sets owned by the likes of Google LLC, Facebook Inc. and other large internet and cloud companies give them a large advantage in new analytics and artificial intelligence applications, open data sets could start to level the playing field.

Thanks to open-source technologies such as Apache Hadoop and Apache Spark, unstructured data can easily be crunched to provide useful information for businesses and other organizations in a matter of seconds. But providing access to data isn’t so easy, because there is no framework in place to govern its distribution and use. This is where CDLA licenses come in, providing that framework for governments, academic institutions and others to open and share data with ease.

The benefits of being able to share open data more easily are widespread. The Linux Foundation points to the automotive industry as one example, saying more data will help to improve things like safety, energy efficiency and maintenance. “Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly,” the Foundation said in its announcement. “Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.”
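As a rough sanity check on the two figures in that quote, the short Python sketch below works out how much driving time they imply. It assumes decimal units (1 petabyte = 1,000,000 gigabytes) and a constant rate of one gigabyte per second while the car is moving. Neither assumption comes from the foundation's announcement, and the function name is purely illustrative.

```python
# Assumptions (not from the article): decimal units (1 PB = 1,000,000 GB)
# and a constant data rate for the whole time the car is driving.
GB_PER_PB = 1_000_000


def implied_driving_time(rate_gb_per_s: float, yearly_pb: float) -> tuple[float, float]:
    """Return (hours per year, hours per day) of driving implied by the figures."""
    seconds_per_year = yearly_pb * GB_PER_PB / rate_gb_per_s
    hours_per_year = seconds_per_year / 3600
    return hours_per_year, hours_per_year / 365


hours_per_year, hours_per_day = implied_driving_time(rate_gb_per_s=1.0, yearly_pb=2.0)
print(f"{hours_per_year:.0f} hours per year, about {hours_per_day:.1f} hours per day")
```

Under those assumptions the two numbers are consistent with a car that is driven roughly an hour and a half a day, which is a plausible reading of "the average car."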

As such, the CDLA licenses are tailor-made for organizations that possess valuable data assets they wish to share. The foundation said the licenses are for contributors and users of open data sets, so they can use and support the contribution of data in a uniform fashion.

The foundation has created two types of CDLA licenses to get the ball rolling. One is a sharing license, similar to the GNU General Public License version 2, that encourages users to contribute data back to the community, though this is not required. The other is a permissive license that comes with no sharing requirements.

The idea is to define a licensing framework in support of collaborative communities focused on the curation and sharing of open data. The framework is also intended to allow data producers to share information with “greater clarity” for recipients about what they can do with that data.

The licenses will enable individuals and organizations to share data in the same way they can share open-source code, empowering communities and businesses to do more with data, and perhaps build new data-intensive applications. More specifically, the foundation said, the licenses will provide the following benefits:

Data producers can share data with greater clarity about what recipients may do with it. Data producers can also choose between sharing and permissive licenses and select the model that best aligns with their interests. In either case, data producers should enjoy the clarity of recognized terms and disclaimers of liabilities and warranties.
Data communities can standardize on a license or set of licenses that provide the ability to share data on known, equal terms that balance the needs of data producers and data users. Data communities have a high degree of flexibility to add their own governance and requirements for curating data as a community, particularly around areas such as personally identifiable information.
Data users looking for datasets to help kick off training an AI system or for any other use will have the ability to find data shared under a known license model with terms that clearly state their rights and responsibilities.

The licenses could also serve as an important enabler for the concept of Data as a Service, which refers to a model for providing data to organizations and individuals on demand, eliminating the need for them to accumulate this data themselves.

Holger Mueller, principal analyst and vice president of Constellation Research Inc., said the CDLA licenses are one of the most promising initiatives for getting DaaS off the ground. However, he said it was surprising to see the Linux Foundation take the lead here, as the value flow for DaaS is reversed compared with open-source software.

“For open source, [the value lies in] providing time and receiving software,” Mueller said. “But for DaaS, it’s about giving and getting data. Traditionally people and enterprises have been less sensitive with time, in contrast to data. The hope is now that an actual discourse is starting on data sharing, which will be important for the DaaS success going forward.”

Those sentiments were echoed by the Linux Foundation’s Zemlin, who said open data licenses are necessary for the frictionless sharing of data.

“The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good,” Zemlin said. “The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”

Image: IBM Curiosity Shop/Flickr
