UPDATED 20:00 EST / JULY 15 2024


Microsoft’s experimental SpreadsheetLLM helps AI better understand spreadsheets

Researchers at Microsoft Corp. today released details of an experimental artificial intelligence model called SpreadsheetLLM, and as the name suggests, it’s designed to work with spreadsheets such as Excel and Google Sheets.

The model, described in a July 12 research paper posted on arXiv.org, is aimed at solving the challenges of applying AI to spreadsheets, which are widely used in the business world but have proven difficult for large language models to get to grips with.

According to Microsoft’s researchers, SpreadsheetLLM utilizes a novel approach for encoding spreadsheet contents into a new format that LLMs can more easily work with. As such, it paves the way for these models to “reason over spreadsheet contents.”

The researchers highlighted the critical need for improvements in this particular area of AI. Spreadsheets are used for a wide range of tasks, from simple data entry and analysis to complex financial modeling and decision-making. But existing LLMs struggle to understand and reason over the contents of spreadsheets. The problem stems from the highly structured nature of the data within them, and from the presence of complex formulas and references.

SpreadsheetLLM reportedly gets around this by encoding spreadsheet data in a more LLM-friendly way, so that the models can better understand it.

To do this, the researchers came up with a novel encoding mechanism called SheetCompressor that preserves the structure and relationships of the data, while making it accessible to LLMs. SheetCompressor notably compresses the data by up to 96%, so LLMs can handle large datasets within their token limits.
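To put the 96% figure in perspective, the short Python sketch below works through the arithmetic. It is only an illustration: the function names, the token counts and the context-window size are hypothetical and do not come from the paper. A 96% reduction in tokens corresponds to a compression ratio of 25 to 1.

```python
def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed by compression (0.96 means 96% fewer tokens)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """How many times smaller the compressed encoding is."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return original_tokens / compressed_tokens


def fits_in_context(tokens: int, context_limit: int) -> bool:
    """Whether an encoding fits within a model's token limit."""
    return tokens <= context_limit


# Hypothetical numbers, chosen only to illustrate the arithmetic.
original = 100_000      # tokens for a naive cell-by-cell encoding (assumed)
compressed = 4_000      # tokens after compression (assumed)
limit = 8_192           # example context window size (assumed)

print(round(token_reduction(original, compressed), 2))   # share of tokens removed
print(compression_ratio(original, compressed))           # size ratio
print(fits_in_context(original, limit), fits_in_context(compressed, limit))
```

With these assumed numbers, the uncompressed sheet would not fit in the example context window, while the compressed version would.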

The researchers also highlighted several techniques behind the encoding. “Structural anchor extraction” identifies the key rows and columns that define table structures. “Inverted-index translation” efficiently encodes cell contents and addresses to minimize redundancy, while “data format-aware aggregation” groups cells with similar formats to reduce token usage further.
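The general ideas behind inverted-index translation and format-aware aggregation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers’ implementation: the function names, the cell dictionary layout and the crude format rules are all hypothetical, and merging adjacent addresses into ranges is left out.

```python
import re
from collections import defaultdict


def invert_index(cells):
    """Map each distinct non-empty value to the addresses that hold it.

    Repeated values are stored once, and empty cells are dropped entirely.
    """
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue
        index[value].append(address)
    return dict(index)


def format_category(value):
    """Hypothetical, very rough format classifier used only for this sketch."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    text = str(value)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return "Date"
    if re.fullmatch(r"-?\d+(\.\d+)?%", text):
        return "Percentage"
    return "Text"


def aggregate_by_format(cells):
    """Group non-empty cell addresses by format category instead of exact value."""
    groups = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue
        groups[format_category(value)].append(address)
    return dict(groups)


# A tiny made-up sheet: address -> value.
sample = {
    "A1": "Region", "B1": "Sales",
    "A2": "East",   "B2": 100,
    "A3": "East",   "B3": 100,
    "A4": "",       "B4": "2024-07-12",
}

print(invert_index(sample))
print(aggregate_by_format(sample))
```

In this toy sheet, the repeated values “East” and 100 each appear once in the inverted index with two addresses, and the empty cell A4 disappears, which is where the savings in tokens would come from.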

In their experiments, the researchers found that SpreadsheetLLM outperformed existing methods by 12.3% on a spreadsheet table detection test. It also achieved strong results on spreadsheet question-answering tasks.

SpreadsheetLLM was applied to a range of well-known LLMs, including GPT-3.5, GPT-4 and Llama 2, and the tests showed that it significantly enhanced those models’ performance on spreadsheet understanding tasks. For instance, GPT-4 achieved a table detection score of 78.9%.

The researchers said SpreadsheetLLM is still an experimental model and has some limitations around more complex spreadsheet formats, but they also believe it has a lot of potential. For instance, they say it could be applied to tasks such as automating routine data analysis to generate insights and recommendations based on spreadsheet contents. By helping LLMs understand spreadsheets, answer questions about them and even create new ones based on natural language prompts, it opens the door to new possibilities in AI-assisted data analysis and decision making.

SpreadsheetLLM can also help make spreadsheets more accessible to human workers, many of whom struggle to get to grips with the more complicated capabilities of tools like Excel. One of the challenges of working with spreadsheets is the need to learn complex formulas to manipulate the data within. But SpreadsheetLLM could help users manipulate that data using natural language commands instead.

Finally, the researchers say, SpreadsheetLLM could help to automate some of the more tedious tasks associated with spreadsheets, such as data cleaning, formatting and aggregation.

Constellation Research Inc. analyst Holger Mueller said the research is significant, because much of the world’s business runs on Excel spreadsheets. “It’s vital for Microsoft to be at the forefront of this push to make Excel spreadsheets more accessible through AI,” he said. “Verbal access to spreadsheets provides massive value, both for creating and analyzing Excel files.”

Mueller said AI also has the potential to democratize the use of spreadsheets by making them simple for anyone to work with. “If Microsoft can nail this properly, it will not only secure the future of Excel, but change the future of work as we know it,” he predicted.

For now, SpreadsheetLLM is only a research project, and Microsoft hasn’t said if it has any plans to transform it into an actual product. But it’s not hard to imagine some kind of “Copilot for Excel” might emerge from this research.

