UPDATED 19:12 EST / SEPTEMBER 12 2024

Ireland opens privacy probe into Google’s PaLM 2 language model

Ireland’s privacy regulator has opened a probe into Google LLC over its PaLM 2 large language model.

The Data Protection Commission, or DPC, announced the move today. Officials will review whether PaLM 2 was built in a manner compliant with the European Union’s GDPR data privacy regulation. The DPC is responsible for investigating Google’s GDPR adherence because the company’s EU headquarters is based in Ireland.

PaLM 2 is the second iteration of an LLM that originally debuted in 2022. The first version had 540 billion parameters, the internal settings a neural network learns during training and uses to process data. Google says that PaLM 2 features fewer parameters, yet achieves higher performance across a range of tasks.
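To illustrate what a parameter count measures, the sketch below tallies the weights and biases of a small fully connected network. The layer sizes are made up for the example and bear no relation to PaLM's actual architecture.

```python
# Illustrative only: how "parameters" are counted in a simple
# fully connected network (hypothetical layer sizes, not PaLM's design).

def count_parameters(layer_sizes):
    """Each dense layer has (inputs * outputs) weights plus one bias per output."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# A toy network: 512 inputs -> 1,024 hidden units -> 256 outputs.
print(count_parameters([512, 1024, 256]))  # 512*1024 + 1024 + 1024*256 + 256 = 787,712
```

Production LLMs apply the same bookkeeping across attention and feed-forward layers, which is how totals reach the hundreds of billions.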

The model understands more than 100 languages and is better at solving math problems than its predecessor. It’s also more adept at generating code. Google says that PaLM 2 supports a range of programming languages, including specialized syntaxes such as Verilog, which is used in chip design projects to describe how circuits should work.

The model briefly powered Google’s Gemini chatbot, the company’s alternative to ChatGPT, as well as a number of other services. The search giant later upgraded the chatbot to run on Gemini, an LLM series of the same name that debuted last year.

The DPC’s new privacy probe into PaLM 2 relates to a section of GDPR that concerns so-called data protection impact assessments, or DPIAs. Those are reviews that tech companies must perform in certain situations to determine if their activities could pose a risk to user privacy. The DPC will investigate whether Google carried out a DPIA during the development of PaLM 2 in a manner that complied with GDPR requirements.

The regulator didn’t specify what privacy risk may have made it necessary for the search giant to perform such a review. But in recent months, much of the scrutiny over LLMs’ privacy risks has focused on the fact that they’re frequently trained on public webpages. Such webpages can potentially contain personal data to which GDPR privacy rules apply.

When Google debuted PaLM 2 last May, the company disclosed that the model’s training dataset included some public web content, namely a “large quantity of publicly available source code datasets.” However, open-source code repositories generally don’t contain the kind of sensitive consumer data to which GDPR applies. 

A DPIA privacy review comprises multiple steps. First, a company must identify whether the manner in which it processes user data might pose privacy risks. Those risks can include, for example, cybersecurity issues that may make the collected data accessible to hackers. 

Companies must also determine whether the information they collect is strictly necessary for the project in which they plan to use it. Personal consumer data, for example, is not necessary to build an LLM optimized to generate code. The GDPR also requires companies to create a plan for addressing any privacy risks they identify during a DPIA review.
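The steps described above can be sketched as a simple record: identified risks, a necessity check, and a mitigation plan per risk. This is an illustrative data-structure sketch only, not legal guidance; the field names are assumptions, not GDPR terminology.

```python
# Schematic sketch of the DPIA steps described above (illustrative only;
# field names are assumptions, not official GDPR terminology).

from dataclasses import dataclass, field

@dataclass
class DPIARecord:
    project: str
    risks: list = field(default_factory=list)        # step 1: identified privacy risks
    data_is_necessary: bool = False                  # step 2: is the data strictly needed?
    mitigations: dict = field(default_factory=dict)  # step 3: risk -> planned mitigation

    def unmitigated_risks(self):
        """Risks recorded in step 1 that still lack a mitigation plan."""
        return [r for r in self.risks if r not in self.mitigations]

dpia = DPIARecord(project="LLM training")
dpia.risks.append("training data may include personal information")
dpia.mitigations["training data may include personal information"] = \
    "filter personal data from the corpus before training"
print(dpia.unmitigated_risks())  # an empty list once every risk has a plan
```

The point of the structure is the invariant the GDPR process implies: a DPIA is incomplete while `unmitigated_risks()` is non-empty.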

Google said in a statement that “we take seriously our obligations under the GDPR and will work constructively with the DPC to answer their questions.”
