

Two Oxford, U.K.-based organizations today released a study examining the inherent gender bias of 13 AI-based chatbots, and the results aren’t encouraging.
The study examined the models' responses to various prompts and ranked them according to how differently they treated women and men in two contexts: professional roles in the workplace and stories about fictional protagonists.
Calling out bias in AI language models isn't new: SiliconANGLE wrote about the topic back in 2019 in an interview with Rumman Chowdhury, one of the pioneers in the field. She is still a very active researcher, now with Humane Intelligence, and testified before the U.S. Senate on AI algorithmic issues earlier this month.
What is new is this focus on gender bias, as well as the large number of models tested. Analysts from Haia.ai and BuildAligned.ai found that OpenAI’s ChatGPT-4, Databricks’ Dolly 2.0 and Stability AI’s StableLM-Tuned-Alpha had the greatest gender bias, and EleutherAI’s GPT-J had the least. GPT-J is the product of an open-source AI research project.
These results come from a new algorithm called FaAIr, designed to measure gender bias by comparing a model's outputs for male-gendered inputs with its outputs for female-gendered inputs. The results from numerous prompts were then averaged to score each chatbot, as shown in the diagram below. Note that ChatGPT-4 scored as most biased in the professional context, but near the middle of the pack for bias in the fictional context.
The researchers pose this question: “How much information does gender give you about the next tokens of the model? If it gives you a lot of information, your model is highly gender biased: male and female will result in very different outputs. If they don’t, then gender isn’t important.”
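The study doesn't publish FaAIr's exact formulation, but the researchers' framing suggests an information-theoretic comparison of a model's next-token distributions. The sketch below illustrates that general idea under stated assumptions: it computes the Jensen-Shannon divergence between next-token distributions for prompt pairs that differ only in gender, averaged over several prompts. GPT-2 and the prompt templates here are stand-ins chosen for illustration, not the models or prompts used in the study.

```python
# Illustrative sketch only: approximates the stated idea behind FaAIr by
# comparing a model's next-token distributions for prompts that differ only
# in gender, then averaging the divergence over prompt pairs. GPT-2 is a
# stand-in for the chatbots tested; the prompts are hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_dist(prompt: str) -> torch.Tensor:
    """Probability distribution over the next token for a given prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return F.softmax(logits[0, -1], dim=-1)

def gender_divergence(template: str) -> float:
    """Jensen-Shannon divergence between male- and female-gendered versions
    of a prompt template with a {gender} slot (hypothetical format)."""
    p = next_token_dist(template.format(gender="man"))
    q = next_token_dist(template.format(gender="woman"))
    m = 0.5 * (p + q)
    # JSD = 0.5 * KL(p || m) + 0.5 * KL(q || m); zero means gender carries
    # no information about the next token, i.e. no measured bias.
    jsd = 0.5 * (p * (p / m).log()).sum() + 0.5 * (q * (q / m).log()).sum()
    return jsd.item()

# Hypothetical prompt pairs covering the two contexts the study examined.
prompts = [
    "The {gender} applied for the engineering job because",  # professional
    "In the story, the {gender} was a hero who",             # fictional
]
score = sum(gender_divergence(t) for t in prompts) / len(prompts)
print(f"Average gendered-output divergence: {score:.4f}")
```

In this reading, a score of zero would mean gender gives the model no information about the next token, while larger values mean the gendered prompts steer the model toward measurably different continuations.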
“We need numerical benchmarks so that we can track changes and improvements, so hopefully this will help the industry to make much-needed improvements in LLMs,” said Dr. Stuart Armstrong, chief technology officer at Aligned AI.
“We believe the safe and responsible adoption of AI can unleash humanity’s fullest potential,” said Haia’s Bart Veenman, who’s also chief commercial officer at blockchain platform Humans.ai. “Mitigating biases is critical for AI to be beneficial to society. The initial testing results from this independent study show areas for improvement.”