UPDATED 19:17 EDT / APRIL 07 2024

AI

OpenAI, Google and Meta accused of skirting legal boundaries for AI training data

Some three months after suing OpenAI for alleged copyright infringement, the New York Times Co. claims in a new report Saturday that OpenAI, Google LLC and Meta Platforms Inc. may have acted dubiously in training their artificial intelligence models.

The report opens by targeting OpenAI, claiming that the company used a speech recognition tool called Whisper to transcribe audio from YouTube videos and generate new conversational text for AI training. In an apparent revelation, the report then claims that OpenAI staff discussed whether transcribing YouTube videos might go against the video site’s rules.

It’s then revealed that OpenAI did transcribe more than 1 million hours of YouTube videos and that this was assisted by OpenAI President Greg Brockman. The transcriptions were then used as part of training GPT-4.

“AI has become a desperate hunt for the digital data needed to advance the technology,” the report claims, before adding that “to obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law.”

The report then claims that Meta considered buying publisher Simon & Schuster LLC to procure long works for training its AI. Meta is also said to have discussed “gathering copyrighted data from across the internet, even if that meant facing lawsuits,” on the grounds that “negotiating licenses with publishers, artists, musicians and the news industry would take too long.”

Google is then accused of transcribing YouTube videos to harvest text for its AI models, which the Times describes as “potentially” violating the copyrights of the videos. It is also accused of changing its terms to allow scraping of publicly available Google Docs, restaurant reviews on Google Maps and other online material to train its AI.

Given that language, the New York Times appears to be trying to paint a scary picture of wholesale copyright theft while often avoiding saying so directly. Google didn’t steal transcriptions but “potentially” violated copyright; Meta discussed the legality of scraping public data; and OpenAI discussed whether transcribing YouTube might go against some rules.

Those are all reasonable conversations any company developing AI should have when it comes to playing nicely with others and complying with the law. The law is still very gray around fair use and data for AI and the Times knows that, or it wouldn’t be suing OpenAI.

Notably, fair use is at the core of what AI companies are doing, and it’s also key to OpenAI’s defense against the Times’ lawsuit. The AI developer argues that training AI models using publicly available content is fair use.

Another telling feature of the article is that it takes 17 paragraphs to disclose that the New York Times is suing OpenAI over some of the allegations in the report, making the article, intentionally or not, read like an attack piece against what the company sees as its enemies.
