UPDATED 17:39 EDT / NOVEMBER 09 2023

OpenAI launches partner initiative focused on creating AI training datasets

OpenAI LP today announced a new initiative, OpenAI Data Partnerships, through which it will collect records from other organizations to create artificial intelligence training datasets.

The quality of training files directly influences the reliability of the neural network they’re used to build. The more relevant the dataset, the more accurately the neural network can answer users’ questions. Creating a high-quality dataset is often a time-consuming and expensive process, which is likely one reason OpenAI is seeking the help of external organizations.

One goal of the company’s new partner initiative is to assemble private datasets that can be used to train its foundation models. Additionally, OpenAI will leverage the records for model customization. Last week at its DevDay product event, the company debuted a program that allows enterprises to customize GPT-4 for their requirements by “modifying every step of the model training process.”

Another goal of the initiative is to create an open-source AI dataset that will be free for developers to use. According to OpenAI, the dataset will be geared specifically toward language model projects. The company added that it may consider using the files in the dataset to build and publish open-source AI models.

OpenAI already offers a collection of open-source neural networks. The two newest additions to the lineup, Whisper large-v3 and Consistency Decoder, made their debut at the company’s DevDay event last week. They focus on transcription and image generation tasks, respectively.

Several early participants signed up for the OpenAI Data Partnerships initiative ahead of its debut today. The Icelandic government and Miðeind ehf, a Reykjavík-based software company, are working with OpenAI to make GPT-4 more fluent in Icelandic. Meanwhile, the nonprofit organization Free Law Project is contributing a collection of legal documents.

“We’re interested in large-scale datasets that reflect human society and that are not already easily accessible online to the public today,” OpenAI detailed in a blog post. “We’re particularly looking for data that expresses human intention (e.g. long-form writing or conversations rather than disconnected snippets), across any language, topic, and format.”

OpenAI is seeking multiple types of training data including text, images, audio and video. That suggests the company plans to use files contributed by partners to train not only language models, but also other types of neural networks such as image generators. OpenAI will accept training datasets even if they contain errors or are stored in a format that is difficult to process.

“We can work with data in almost any form and can use our next-generation in-house AI technology to help you digitize and structure your data,” OpenAI stated.

Image: Unsplash
