UPDATED 19:41 EST / JULY 19 2023

Unstructured raises $25M for large language model data processing expansion

Large language model data processing startup Unstructured Technologies Inc. has raised $25 million in new funding to expand its operations and business reach.

Founded in 2022 by former U.S. Central Intelligence Agency analyst Brian Raymond, Unstructured offers a platform that lets companies convert their unstructured internal data into formats compatible with large language models, the class of artificial intelligence models that power OpenAI LP’s ChatGPT and other chatbots that can generate humanlike answers and content.

The company provides its users with three starting points: an open-source Python library, containers and a cloud-hosted application programming interface. The API can convert more than 20 natural language file types from raw to LLM-ready data, and it is paired with enterprise-grade data connectors. These include connectors for Azure Blob, Microsoft Corp.’s OneDrive, Amazon Web Services Inc.’s S3, Google LLC’s Cloud Storage, Google Drive, Dropbox Inc. and Elasticsearch Inc.

The company’s technology was developed collaboratively with the open-source community, commercial enterprises and select U.S. government defense and intelligence organizations. The company has been awarded Phase I and II Small Business Innovation Research contracts by the U.S. Air Force and Space Force, with additional support coming from the U.S. Special Operations Command.

According to FINSMES, an agreement between Unstructured and SOCOM has been in place since the company’s inception. The agreement has seen Unstructured collaborate with SOCOM to initiate the first standalone system using an LLM in combination with mission-critical data within the U.S. armed forces.

In an interview with TechCrunch, Raymond explained that the company is addressing the problem of scattered data at organizations that generate vast amounts of unstructured data every day. “The dirty secret in the [natural language processing] community is that data scientists today still must build artisanal, one-off data connectors and pre-processing pipelines completely manually,” Raymond said. “Unstructured [delivers] a comprehensive solution for connecting, transforming and staging natural language data for LLMs.”

The $25 million round was led by Bain Capital Venture Associates LLC, with M12 Ventures LLC, Mango Capital Inc., MongoDB Ventures and Shield Capital Partners LP also participating. The round was the company’s first publicly disclosed fundraise since it was founded last year.

Image: Unstructured Technologies
