Google researchers find personal information can be accessed through ChatGPT queries
Researchers at Google LLC recently released a paper explaining how they were able to use Open AI LP’s ChatGPT to collect personal information regarding members of the public.
Chatbots are powered by large language models, or LLMs, which sift through massive amounts of data on the internet. The idea is that the model is trained to respond to queries based on this information without actually replicating that information, hence linguist Noam Chomsky’s assertion that such models are plagiarism machines in a roundabout way.
The researchers at Google revealed ChatGPT does actually give up the original information if you ask it the right questions. It’s worth noting that as of September this year, ChatGPT had 180.5 million users, and its website had generated 1.5 billion visits. According to Google’s research, some of those people may have been able to see people’s names, email addresses and phone numbers.
“Using only $200 USD worth of queries to ChatGPT (gpt-3.5- turbo), we are able to extract over 10,000 unique verbatim memorized training examples,” said the researchers. “Our extrapolation to larger budgets suggests that dedicated adversaries could extract far more data.”
The researchers explained that by using keywords over and over again, they could force the chatbot to “diverge” from its training, and instead of replying with an answer based on that training, it issued answers containing text from its original language modeling, that is, data from websites and academic papers. They later called their attack “kind of silly,” but it worked.
The training data was exposed despite, as the researchers noted, the entire response didn’t make much sense. The researchers said they checked the data they’d been given by simply finding wherever it was published on the internet. In a blog post, they wrote, “It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier.”
They said their research asks us to think about a new security analysis of machine-learning models and to ponder “if any machine-learning system is actually safe.” They added that “over a billion people-hours have interacted with the model,” so it’s strange that no one else so far seems to have noticed this concerning vulnerability.
Photo: Jonathan Kemper/Unsplash
A message from John Furrier, co-founder of SiliconANGLE:
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join our community on YouTube
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.
THANK YOU