Improving reliability without replicating data
You can use GPFS storage pools to provide different levels of reliability within a file system without replicating all of your data and metadata.
Here the metadata is replicated (mirrored) for reliability, and the data is divided among separate data pools. This way you gain the benefits of a global namespace while reducing the risk of data loss when a storage array fails, and you can take advantage of policy-based file management without giving up the reliability of separate file systems. In this model, if any single storage server fails, you lose access only to the data in that pool.
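The configuration described above can be sketched roughly as follows. This is a minimal illustration, not a complete recipe: the file system name, disk descriptor file, pool names, and file-matching rules are all hypothetical, and the replication flags assume the common convention of two metadata copies and one data copy.

```
# Hypothetical sketch: create a file system with metadata replicated
# (two copies) but data not replicated (one copy).
#   -m/-M : default/maximum metadata replicas
#   -r/-R : default/maximum data replicas
mmcrfs gpfs1 -F diskfile.txt -m 2 -M 2 -r 1 -R 2

# Placement policy (GPFS policy rules use an SQL-like syntax):
# steer new files into separate data pools so that the loss of one
# storage array affects only the files in its pool.
RULE 'media'   SET POOL 'pool_a' WHERE LOWER(NAME) LIKE '%.mp4'
RULE 'default' SET POOL 'pool_b'
```

In practice the placement rules would be kept in a policy file and installed on the file system with the mmchpolicy command.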
General Parallel File System (GPFS) is a scalable, parallel, cluster file system product that originated as Almaden's Tiger Shark file system. It now supports IBM® Blue Gene® and IBM® eServer Cluster systems, including the Linux (Cluster 1350) and AIX (Cluster 1600) systems. Tiger Shark was originally developed for large-scale multimedia, but in its GPFS incarnation it has been extended to support the additional requirements of parallel computing. GPFS supports single cluster file systems of multiple petabytes and runs at I/O rates of more than 100 gigabytes per second, with next-generation targets of terabytes per second. GPFS clusters can be any combination of Linux, AIX, and Windows server systems. Individual clusters may be cross-connected to provide parallel access to data even across large geographic distances.
GPFS is the file system for the ASC Purple Supercomputer. ASC (the Advanced Simulation and Computing program) is a Department of Energy initiative to use computer simulation rather than nuclear testing to ensure the safety, reliability, and performance of the nuclear stockpile. This requires computational, storage, and I/O capabilities far beyond what existed before. ASC Purple is the current-generation computing platform at Lawrence Livermore, featuring 12,000 processors, a data store of 2 petabytes, and I/O rates of over 130 GB/sec to a single file or multiple files.
The scope of GPFS extends to include the Blue Gene machines. In this environment, GPFS provides high-bandwidth I/O to the Blue Gene compute nodes using daemons that relay I/O requests to the designated I/O nodes. The I/O nodes form a GPFS cluster that communicates in parallel with another (typically Linux) cluster outside the Blue Gene machine. The external cluster has the physical connections to the disk volumes and operates as a set of remote disk servers for the cluster within the Blue Gene. Such systems can comprise thousands of I/O nodes and tens of thousands of compute nodes.
In addition to high-speed parallel file access, GPFS provides fault tolerance, including automatic recovery from disk and node failures. Its robust design and multi-node access have made GPFS the chosen file system for a number of commercial applications such as large Web servers, data mining, digital libraries, file servers, and online databases.
Significant advances have been made recently in the area of data management, where file systems often work in cooperation with other data management software (e.g., IBM's Tivoli Storage Manager or the collaborative High Performance Storage System, HPSS) to manage the placement and life cycle of files. By changing the level of the interface between the file system and these external storage manager products, and then exploiting parallelism both within a node and across nodes, GPFS has moved the range of the possible from about 50 million files upwards to 50 billion. Along with the improved scaling, work is ongoing to give storage administrators and other software packages the ability to control data placement and movement to meet specific user requirements and data attributes.
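The cooperation between GPFS and an external storage manager is expressed through the same policy language used for placement. The sketch below shows the general shape of such a rule set, assuming hypothetical pool names and an illustrative script path; the details of the interface script depend on the storage manager product.

```
/* Hypothetical GPFS policy sketch: define an external pool fronted by
   an HSM product (e.g., Tivoli Storage Manager or HPSS), then migrate
   the least-recently-accessed files to it when the system pool fills
   past a threshold. Pool names and the script path are illustrative. */
RULE EXTERNAL POOL 'hsm' EXEC '/var/mmfs/etc/hsmScript'
RULE 'migrate_cold' MIGRATE FROM POOL 'system'
     THRESHOLD(90,70)   /* start migrating at 90% full, stop at 70% */
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
     TO POOL 'hsm'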
IBM Research - Almaden is working with IBM's product divisions to extend GPFS to support a new 2011-2012 generation of supercomputers featuring up to 16,000 nodes and 500,000 processor cores. Such a system must be capable of achieving I/O rates of several terabytes per second to a single file, of creating 30,000 to 40,000 files per second, and of holding up to a trillion files (to create a trillion files, just create 30,000 files per second continuously for a year). This means new designs for algorithms appropriate to fine-grained parallelism within large SMP (32-way or 64-way) processors and coarse-grained parallelism across a 16,000-node complex. It also means group management algorithms that can bring 16,000 machines online in a reasonable amount of time and that treat failure and recovery as almost routine. All this must produce the stability needed to create and maintain a trillion-file environment.
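The parenthetical arithmetic is worth making explicit: 30,000 creates per second sustained for a year lands within a few percent of a trillion files. A quick check in Python:

```python
# Back-of-the-envelope check of the file-creation arithmetic above.
creates_per_second = 30_000
seconds_per_year = 365 * 24 * 60 * 60   # 31,536,000

files_per_year = creates_per_second * seconds_per_year
print(f"{files_per_year:,}")  # 946,080,000,000 -- roughly a trillion
```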