UPDATED 20:21 EST / NOVEMBER 29 2023

AI

Google researchers find personal information can be accessed through ChatGPT queries

Researchers at Google LLC recently released a paper explaining how they were able to use Open AI LP’s ChatGPT to collect personal information regarding members of the public.

Chatbots are powered by large language models, or LLMs, which sift through massive amounts of data on the internet. The idea is that the model is trained to respond to queries based on this information without actually replicating that information, hence linguist Noam Chomsky’s assertion that such models are plagiarism machines in a roundabout way.

The researchers at Google revealed ChatGPT does actually give up the original information if you ask it the right questions. It’s worth noting that as of September this year, ChatGPT had 180.5 million users, and its website had generated 1.5 billion visits. According to Google’s research, some of those people may have been able to see people’s names, email addresses and phone numbers.

“Using only $200 USD worth of queries to ChatGPT (gpt-3.5- turbo), we are able to extract over 10,000 unique verbatim memorized training examples,” said the researchers. “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.”

The researchers explained that by using keywords over and over again, they could force the chatbot to “diverge” from its training, and instead of replying with an answer based on that training, it issued answers containing text from its original language modeling, that is, data from websites and academic papers. They later called their attack “kind of silly,” but it worked.

The training data was exposed despite, as the researchers noted, the entire response didn’t make much sense. The researchers said they checked the data they’d been given by simply finding wherever it was published on the internet. In a blog post, they wrote, “It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier.”

They said their research asks us to think about a new security analysis of machine-learning models and to ponder “if any machine-learning system is actually safe.” They added that “over a billion people-hours have interacted with the model,” so it’s strange that no one else so far seems to have noticed this concerning vulnerability.

Photo: Jonathan Kemper/Unsplash

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

  • 15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
  • 11.4k+ theCUBE alumni — Connect with more than 11,400 tech and business leaders shaping the future through a unique trusted-based network.
About SiliconANGLE Media
SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.