UPDATED 19:41 EDT / JULY 19 2023

Unstructured raises $25M for large language model data processing expansion

Large language model data processing startup Unstructured Technologies Inc. has raised $25 million in new funding to expand its operations and business reach.

Founded in 2022 by former U.S. Central Intelligence Agency analyst Brian Raymond, Unstructured offers a platform that converts companies’ unstructured internal data into formats compatible with large language models (LLMs). LLMs are the class of artificial intelligence models that power OpenAI LP’s ChatGPT and other chatbots capable of generating humanlike answers and content.

The company provides users with three starting points: an open-source Python library, containers and a cloud-hosted application programming interface. The API can convert more than 20 natural-language file types from raw form into LLM-ready data. Unstructured also offers enterprise-grade data connectors, including ones for Microsoft Corp.’s Azure Blob Storage and OneDrive, Amazon Web Services Inc.’s S3, Google LLC’s Cloud Storage and Google Drive, Dropbox Inc. and Elastic N.V.’s Elasticsearch.
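The article doesn’t detail what “LLM-ready” output looks like. As a rough illustration only, here is a hypothetical Python sketch of the clean, chunk and stage steps that a hand-built pre-processing pipeline might perform. It does not use or reproduce Unstructured’s library or API; the functions `clean_text`, `chunk_text` and `stage_records`, the word-based chunk size and the record fields are all assumptions made for this example.

```python
import json
import re
from typing import Dict, List


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, line breaks) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def stage_records(raw: str, source: str, max_words: int = 50) -> List[Dict[str, object]]:
    """Hypothetical staging step: turn one raw document into JSON-serializable
    records (source name, chunk position, cleaned text) for downstream LLM use."""
    chunks = chunk_text(clean_text(raw), max_words)
    return [
        {"source": source, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunks)
    ]


if __name__ == "__main__":
    raw_doc = "Quarterly   report\n\nRevenue grew.\tCosts fell."
    records = stage_records(raw_doc, source="report.txt", max_words=3)
    print(json.dumps(records, indent=2))
```

Splitting on a fixed word count is the simplest possible choice; a production pipeline would more likely split along document structure such as headings, paragraphs and tables.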

The company’s technology was developed collaboratively with the open-source community, commercial enterprises and select U.S. government defense and intelligence organizations. Unstructured has been awarded Phase I and II Small Business Innovation Research contracts by the U.S. Air Force and Space Force, with additional support from U.S. Special Operations Command (SOCOM).

According to FinSMEs, an agreement between Unstructured and SOCOM has been in place since the company’s inception. Under that agreement, Unstructured has worked with SOCOM to launch the first standalone system within the U.S. armed forces that uses an LLM in combination with mission-critical data.

In an interview with TechCrunch, Raymond explained that the company aims to solve the problem of scattered data, which arises as organizations generate vast amounts of unstructured data every day. “The dirty secret in the [natural language processing] community is that data scientists today still must build artisanal, one-off data connectors and pre-processing pipelines completely manually,” Raymond said. “Unstructured [delivers] a comprehensive solution for connecting, transforming and staging natural language data for LLMs.”

The $25 million round was led by Bain Capital Venture Associates LLC, with M12 Ventures LLC, Mango Capital Inc., MongoDB Ventures and Shield Capital Partners LP also participating. The round was the company’s first publicly disclosed fundraise since it was founded last year.

Image: Unstructured Technologies
