Research shows OpenAI’s GPT-4 outperforms humans in financial statement analysis, but skeptics aren’t convinced
OpenAI’s GPT-4 large language model has reportedly demonstrated an ability to analyze financial statements with a level of accuracy that surpasses that of human financial analysts.
The claim comes via a paper written by researchers at the University of Chicago, who say their results suggest a promising future for generative artificial intelligence in the field of financial analysis.
According to the researchers, whose work was first picked up by VentureBeat, GPT-4 was used to analyze the financial statements of publicly listed companies in an attempt to predict their future earnings growth. They claim it was highly successful, outperforming human financial analysts even when it was given only a few standardized, anonymized balance sheets and income statements without any additional context.
“We find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model,” wrote the authors of the report, titled “Financial Statement Analysis with Large Language Models.”
The researchers explained how they used a technique known as “chain-of-thought” prompting to enable GPT-4 to undertake more complex reasoning, essentially mimicking the thought process of a human financial analyst. By prompting the model to identify trends, compute ratios and synthesize information step by step, they were able to coax it into making accurate predictions. According to the paper, GPT-4 could predict the direction of future earnings with 60% accuracy, surpassing the 53% to 57% accuracy of most human financial analysts.
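The article does not reproduce the paper's exact prompts. As a rough illustration only, the following Python sketch shows how a chain-of-thought style prompt along the lines described might be assembled from anonymized statement data. The `Financials` structure, the chosen ratios and the prompt wording are all hypothetical assumptions, and no model API is called. The `compute_ratios` helper is included so that a user could independently check the ratios a model reports.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Hypothetical anonymized figures for two consecutive years, ordered (prior, current)."""
    revenue: tuple[float, float]
    net_income: tuple[float, float]
    total_assets: tuple[float, float]
    total_liabilities: tuple[float, float]


def compute_ratios(f: Financials) -> dict[str, float]:
    """Illustrative ratios for the current year, used to verify a model's arithmetic."""
    return {
        "revenue_growth": f.revenue[1] / f.revenue[0] - 1,
        "net_margin": f.net_income[1] / f.revenue[1],
        "return_on_assets": f.net_income[1] / f.total_assets[1],
        "debt_ratio": f.total_liabilities[1] / f.total_assets[1],
    }


def build_cot_prompt(f: Financials) -> str:
    """Assemble a step-by-step prompt; the wording is an assumption, not the paper's."""
    return (
        "You are a financial analyst. The company name and years are withheld.\n"
        f"Revenue (prior, current): {f.revenue}\n"
        f"Net income (prior, current): {f.net_income}\n"
        f"Total assets (prior, current): {f.total_assets}\n"
        f"Total liabilities (prior, current): {f.total_liabilities}\n\n"
        "Step 1: Identify notable trends in the figures above.\n"
        "Step 2: Compute revenue growth, net margin, return on assets and debt ratio.\n"
        "Step 3: Synthesize these observations into a short narrative.\n"
        "Step 4: Predict whether earnings will rise or fall next year. "
        "Answer INCREASE or DECREASE.\n"
    )


if __name__ == "__main__":
    # Made-up figures for demonstration only.
    example = Financials(
        revenue=(100.0, 110.0),
        net_income=(10.0, 12.0),
        total_assets=(200.0, 220.0),
        total_liabilities=(120.0, 121.0),
    )
    print(build_cot_prompt(example))
    print(compute_ratios(example))
```

In this design, the arithmetic is deliberately left to the model inside the prompt, mirroring the step-by-step reasoning the researchers describe. The separate `compute_ratios` function then serves as an independent check on the model's numbers.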
“LLM prediction does not stem from its training memory,” the researchers said. “Instead, we find that the LLM generates useful narrative insights about a company’s future performance.”
The researchers speculate that GPT-4’s superior performance stems from the vast knowledge base it can draw upon, together with its ability to recognize business concepts and patterns and to reason intuitively even with incomplete data.
“Taken together, our results suggest that LLMs may take a central role in decision-making,” the researchers said.
Others are skeptical
Whether or not wealthy human investors will be willing to trust GPT-4 is another question, though, and there are reasons to be skeptical of the researchers’ claims. On the Hacker News forum, a user called flourpower471 pointed out that the artificial neural network model used as a benchmark by the researchers dates back to 1989, and cannot be compared to the most advanced models used by financial analysts today.
“That ANN benchmark is nowhere near state of the art,” he said. “People didn’t stop working on this in 1989 — they realized they can make lots of money doing it and do it privately.”
AI researcher Matt Holden also called into question the researchers’ claims, posting on X that GPT-4 is unlikely to be able to pick stocks that can actually best the performance of a broader index such as the S&P 500.
Not sure about this framing. Seems misleading, no?
The “median analyst” can’t actually successfully “pick stocks” and beat a simple vanguard index fund, so why compare that with an LLM?
I don’t doubt an LLM can outperform median analysts at specific tasks like writing…
— Matt Holden (@holdenmatt) May 24, 2024
Holger Mueller of Constellation Research Inc. said it’s important to understand that while AI is clearly faster at crunching data and searching historical records for patterns, such as in financial performance, it lacks the spark of the human brain. “Humans can only analyze data and find patterns by using a whole lot of time and energy,” the analyst said. “But AI cannot match the creativity, fantasies and experience of humans, or at least not yet. Unless these three are addressed and made available to AI, the human will still win.”
Although there’s a long way to go, the researchers say they are encouraged, all the more so because numerical analysis of this kind has traditionally been a challenge for LLMs. Alex Kim, one of the study’s co-authors, said it has always been very difficult for models to carry out computations, perform interpretations and make complex judgments in the way a human analyst might.
“While LLMs are effective at textual tasks, their understanding of numbers typically comes from the narrative context and they lack deep numerical reasoning or the flexibility of a human mind,” he said.
Although human financial analysts are unlikely to be replaced by AI anytime soon, the researchers say they believe LLMs can be powerful tools that help to streamline their work, and perhaps make them more effective at their jobs.
The researchers have created an interactive web application for ChatGPT Plus subscribers that can demonstrate GPT-4’s ability to perform financial analysis, though they remind users that they’ll need to verify its accuracy independently.
Image: SiliconANGLE/Microsoft Designer