IBM’s New Storage Architecture Incorporates Hadoop

IBM has unveiled a new storage design built by scientists at IBM Research-Almaden, which it claims can double analytics processing speed for big data and the cloud through advanced clustering technologies, dynamic file system management and advanced data replication techniques. The new General Parallel File System-Shared Nothing Cluster (GPFS-SNC) architecture incorporates the Hadoop Distributed File System (HDFS).

This storage architecture won the Supercomputing 2010 Storage Challenge based on performance, scalability and storage subsystem utilization.

In the shared-nothing design, tasks are divided among independent nodes, each of which is self-sufficient. This enables GPFS-SNC to “convert terabytes of pure information into actionable insights twice as fast as previously possible.” Additionally, it supports POSIX for backward compatibility, along with caching, replication, backup and recovery, and wide-area replication for disaster recovery. Prasenjit Sarkar of Storage Analytics and Resiliency at IBM Research-Almaden is the Master Inventor on the project.
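To make the shared-nothing idea concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how GPFS-SNC is actually implemented: the `Node` class, the round-robin `distribute` function and the word-count job are all hypothetical. The point it shows is that each node works only on data it holds locally, and only the small per-node results are combined at the end.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical self-sufficient node: it owns its data and shares nothing."""
    name: str
    local_blocks: list = field(default_factory=list)

    def process(self) -> Counter:
        # Work only on blocks stored on this node; no other node is consulted.
        counts = Counter()
        for block in self.local_blocks:
            counts.update(block.split())
        return counts


def distribute(blocks, nodes):
    """Assign each block to exactly one node (simple round-robin, an assumption)."""
    for i, block in enumerate(blocks):
        nodes[i % len(nodes)].local_blocks.append(block)


def run_job(nodes) -> Counter:
    """Combine the independent per-node results into one answer."""
    total = Counter()
    for node in nodes:
        total += node.process()
    return total


if __name__ == "__main__":
    cluster = [Node("node-a"), Node("node-b")]
    distribute(["big data", "data in the cloud", "big cloud"], cluster)
    print(run_job(cluster))
```

Because no node depends on another node's disks or memory, adding nodes in a sketch like this adds both capacity and processing power, which is the general appeal of shared-nothing clusters.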

“The world is overflowing with petabytes to exabytes of data and the challenge is to store this data efficiently so that it can be accessed quickly at any point in time. This new way of storage partitioning is another step forward on this path as it gives businesses faster time-to-insight without concern for traditional storage limitations,” Sarkar said.

Though Sarkar declined to comment on how IBM might commercialize GPFS, the technology already serves as the basis for the IBM Scale Out Network Attached Storage (SONAS) platform and is used in IBM’s Information Archive and the IBM Smart Business Compute Cloud. SONAS scales capacity and performance while providing parallel access to data and a global name space that can manage billions of files and up to 14.4PB of capacity.

GPFS-SNC will also be used in the VISION Cloud initiative, a joint research effort of 15 European partners to develop a new approach to cloud storage, which they call a “smart cloud storage architecture.” In this approach, data is represented by smart objects that include information describing the content of the data and how the object should be handled, replicated or preserved. The architecture combines a rich object data model, execution of computations close to the stored content, content-centric access, and full data interoperability.
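As a rough illustration of what a smart object might look like, the Python sketch below bundles data with metadata about its content and its handling rules. It is not the VISION Cloud data model: the `SmartObject` class, its field names, the `find_by_content` helper and the sample policy keys are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SmartObject:
    """Hypothetical smart object: data plus descriptions of content and handling."""
    payload: bytes
    content_metadata: dict = field(default_factory=dict)  # what the data is
    handling_policy: dict = field(default_factory=dict)   # how to replicate or preserve it

    def compute(self, fn):
        # Run the computation next to the stored content and return only the result.
        return fn(self.payload)


def find_by_content(objects, **criteria):
    """Content-centric access: select objects by what they contain, not where they live."""
    return [
        obj for obj in objects
        if all(obj.content_metadata.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    store = [
        SmartObject(b"frame-data", {"type": "video", "language": "de"}, {"replicas": 3}),
        SmartObject(b"hello world", {"type": "text", "language": "en"}, {"replicas": 2}),
    ]
    for obj in find_by_content(store, type="text"):
        print(obj.compute(len), obj.handling_policy)
```

Here a query selects objects by their described content, and the size calculation runs where the payload lives, so only the result needs to travel back to the caller.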

For Hadoop, the partnership with IBM is a big step in its development and adoption. The open-source project has gained a number of partners over the past year, with major developments in the social networking space through Twitter and the recently launched Facebook Mail.

IBM’s partners in the VISION Cloud initiative include SAP AG, Siemens Corporate Technology, Engineering, ITRicity, Telefónica Investigación y Desarrollo, Orange Labs, Telenor, RAI, Deutsche Welle and the SNIA Europe standards organization, as well as the National Technical University of Athens, Umeå University, the Swedish Institute of Computer Science and the University of Messina.