

Panzura LLC, maker of a hybrid cloud data management platform, announced today that its Symphony data services platform is now integrated with IBM Corp.’s Storage Deep Archive.
The integration, built on the IBM Diamondback tape library, makes “cold” data — meaning data archived to tape — easily accessible for information discovery and artificial intelligence training. Symphony is a cloud-based single management point for distributed file systems that span multiple locations. It automates file and object data movement, performs data discovery and assessment, and structures unstructured data using metadata. Panzura specializes in helping large organizations manage, store and access unstructured data such as files, images and videos.
IBM Storage Deep Archive is a cloud-based, low-cost storage tier for long-term data retention and digital preservation. It features air-gapped and encrypted storage to protect against cyberthreats. Deep Archive makes data stored on tape accessible without specialized technical expertise.
Symphony’s data movement framework is also integrated with IBM’s Fusion Data Catalog to create a massively scalable data fabric for rapidly ingesting metadata in a business context. Fusion Data Catalog acts as a centralized inventory of data assets with automatic discovery and classification, data lineage tracking and data governance policies.
Symphony’s metadata discovery features unlock information that would otherwise be inaccessible without human inspection and tagging, said Mike Harvey, senior vice president of technology at Panzura and founder of Moonwalk Universal Inc. Panzura acquired that company, which makes data assessment and storage optimization technology, last year.
“There are a lot of workflows in the enterprise where applications, knowledge workers and scientific users reacquire entire data artifacts when really what they’re interested in is metadata,” he said. “Without cataloging or a metadata services layer, they have to reacquire entire artifacts to open an application.”
Automatic data discovery scans files for recognizable patterns such as phone numbers and can extract them as metadata. The software can also look for data types that tend to appear near one another, such as phone numbers and addresses.
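This kind of pattern-based extraction can be illustrated with a minimal sketch. The regular expressions and labels below are hypothetical examples, not Symphony's actual rules:

```python
import re

# Hypothetical patterns for illustration; a production scanner would use a
# much larger, validated rule set.
PATTERNS = {
    "phone_number": re.compile(r"\b\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def extract_metadata(text: str) -> dict[str, list[str]]:
    """Scan free text for recognizable patterns and return them as metadata tags."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found

sample = "Call 555-867-5309 regarding the claim filed under SSN 078-05-1120."
print(extract_metadata(sample))
# {'phone_number': ['555-867-5309'], 'ssn': ['078-05-1120']}
```

Tags produced this way can be stored in a catalog, letting later queries hit the metadata layer instead of reopening every file.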
“A chief legal counsel can ask a question like how many files the organization has that contain Social Security numbers,” said Glen Shok, vice president of strategic alliances at Panzura. “Those files could be in SharePoint, an S3 bucket, on a NetApp file share and on a network file system. We deliver them all through a chat window.”
Metadata used to train AI models is often embedded in file headers or is otherwise invisible to file systems, Shok said. Symphony extracts that metadata and makes files retrievable as if they were in a local file system.
“We can stream the content to [Amazon Web Services Inc.’s] Glacier Flexible Retrieval, which front-ends the tape infrastructure, without it looking like the data set has moved,” he said. “The namespace and the security are preserved at the front end.”
In addition to its low cost, Deep Archive’s high-density tape storage reduces energy consumption by up to 97% compared with hard disks, Harvey said.
“You get the carbon offset, low energy consumption, long-term retention and protection from ransomware without severing the link to the original namespace,” he said.
The combination is expected to be especially useful in training AI models, which require vast amounts of data. Information in archival storage often lacks rich metadata and is slow to retrieve. Panzura said integration with Deep Archive can make that data useful for purposes like retrieval-augmented generation.
“We’re creating a metadata catalog from over 500 different file types that’s relevant to the user base,” Shok said. “Instead of building large language models, you can build small language models for expert systems specific to engineering or accounting.”