The Role of Multi-Tier Storage in High Performance Computing Environments

Our society is undergoing an explosion in digital data that shows no sign of slowing down. The volume of data created each day has increased immensely and will continue to grow exponentially over time, especially in high performance research environments. Where should organizations store their data, and what is the best way to manage it? Short-term fixes are appealing because they are easy to implement, but they often worsen long-term storage challenges associated with performance, scalability and cost. It is essential to consider future needs when examining storage options.

1. Can Cloud Work in HPC?
With its low up-front costs and ongoing op-ex pricing model, the cloud seems like an attractive option for many organizations battling the challenge of managing their ever-growing data. But when users look at the true cost of going to the cloud, it becomes clear that the ongoing costs add up over time, making the cloud a more expensive solution in the long term. Between the cost of getting data to and from the cloud (bandwidth) and the excessive data retrieval charges, the cloud storage model is not sustainable in an HPC environment. So why do we still hear about the cloud? Usually because of the compute power available in the cloud and its capability to run data analytics. When used correctly for its processing power, the cloud can be a powerful tool for the HPC community. However, it can easily become a financial burden when used for massive data storage. It’s important to understand how to use the cloud before jumping in feet first and trying to learn to swim. To learn more about the costs of moving to the cloud, check out this white paper.
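
To make the "costs add up over time" argument concrete, here is a minimal sketch of the arithmetic. Every number and name in it is a hypothetical assumption chosen for illustration (storage price, retrieval fee, on-premises up-front and running costs); none comes from a real price list or from this article.

```python
def cloud_monthly_cost(tb_stored, price_per_tb_month,
                       tb_retrieved_per_month, retrieval_fee_per_tb):
    """Recurring cloud bill: storage fees plus data retrieval (egress) fees."""
    return (tb_stored * price_per_tb_month
            + tb_retrieved_per_month * retrieval_fee_per_tb)


def breakeven_month(cloud_monthly, onprem_upfront, onprem_monthly,
                    max_months=120):
    """First month in which cumulative cloud spend exceeds cumulative
    on-premises spend, or None if that never happens within max_months."""
    for month in range(1, max_months + 1):
        cloud_total = cloud_monthly * month
        onprem_total = onprem_upfront + onprem_monthly * month
        if cloud_total > onprem_total:
            return month
    return None


# Hypothetical inputs: 1,000 TB stored, 100 TB retrieved each month.
monthly = cloud_monthly_cost(
    tb_stored=1000, price_per_tb_month=20,
    tb_retrieved_per_month=100, retrieval_fee_per_tb=50,
)
month = breakeven_month(monthly, onprem_upfront=500_000, onprem_monthly=5_000)
print(f"Cloud bill per month: ${monthly:,}")
print(f"Cloud becomes the more expensive option in month {month}")
```

With these assumed figures the cloud starts out cheaper because it has no up-front cost, but the recurring storage and retrieval fees overtake the on-premises total a little over two years in. Real results depend entirely on the actual prices and access patterns plugged in.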

2. Multi-tier for cost-effectiveness
A multi-tier storage strategy is an important concept in any data storage environment, and an effective tiered storage implementation is becoming a requirement in the HPC market. Not all data created today is of critical importance, but that is not to say that the information has no value; therefore, HPC environments need to find a way to affordably store their data long-term. Universities, for example, must keep all research, findings, notes, and general data for a minimum of seven years after completion of a federally funded project. This is a requirement for anyone who accepts National Science Foundation (NSF) grants for research. It’s a huge expense, and oftentimes the data must still be kept even though it is never accessed again.

This is where a tiered storage strategy becomes vital to the sustainability of many HPC environments. Without a tiered strategy to move inactive data to a lower-cost storage medium, these organizations would be stuck with huge storage bills for data that is considered “cold” and rarely, if ever, accessed again. Finding the correct balance between speed of access (or need to access) and the cost of storage has never been so important for these organizations.
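
The idea of moving inactive data to a cheaper medium can be sketched as a simple age-based placement policy. This is only an illustration under stated assumptions: the tier names, idle-time thresholds, and per-TB costs below are hypothetical, and the code does not represent how any particular product decides where data lives.

```python
from dataclasses import dataclass

# Hypothetical tiers: (name, max idle days, cost per TB per month).
# A max idle value of None marks the final catch-all tier.
TIERS = [
    ("primary disk", 30, 100),
    ("archive disk", 365, 30),
    ("tape", None, 5),
]


@dataclass
class Dataset:
    name: str
    size_tb: int
    idle_days: int  # days since the data was last accessed


def choose_tier(idle_days):
    """Return (tier name, cost per TB-month) for data idle this long."""
    for name, max_idle, cost in TIERS:
        if max_idle is None or idle_days <= max_idle:
            return name, cost


def monthly_cost(datasets, tiered=True):
    """Total monthly bill, with tiering or with everything on primary disk."""
    total = 0
    for d in datasets:
        cost = choose_tier(d.idle_days)[1] if tiered else TIERS[0][2]
        total += d.size_tb * cost
    return total


datasets = [
    Dataset("active-simulation", size_tb=10, idle_days=5),
    Dataset("last-year-results", size_tb=50, idle_days=200),
    Dataset("completed-grant-data", size_tb=200, idle_days=2000),
]
for d in datasets:
    print(d.name, "->", choose_tier(d.idle_days)[0])
print("All on primary disk:", monthly_cost(datasets, tiered=False))
print("Tiered:", monthly_cost(datasets, tiered=True))
```

In this made-up example the large, cold dataset dominates the untiered bill, which is the situation the paragraph above describes. A production policy would also weigh retrieval time and how likely the data is to be needed again, not just idle time.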

Fig. 1 – The Spectra® BlackPearl® Converged Storage System’s multi-tier architecture.

3. Active Archive for access
An Active Archive is a proven solution for the HPC market that has been around in the data storage industry for nearly 10 years. This automated tiered storage approach balances speed of access with cost of storage by offloading expensive primary storage while still keeping all of an organization’s data online. As organizations face the problem of storing more data with less money, an active archive solution enables them to store massive amounts of data at an affordable price, without sacrificing online access to their assets. To learn more about active archives, read the “Active Archive and the State of the Industry 2018” report by the Active Archive Alliance, a collaborative industry alliance of leading providers of tiered storage technologies, including file systems, HSM, applications, cloud storage, high-density tape and disk storage.

High Performance Computing environments continually test the limits of technology and require peak performance from their equipment, including storage. Any forward-looking data storage solution for HPC should leverage a multi-tier architecture that sustainably balances speed of access and cost. The Spectra® BlackPearl® Converged Storage System allows customers in university, high performance computing, and research organizations to seamlessly store data to disk, tape and cloud storage using a unified interface provided by BlackPearl Certified Clients. Spectra’s partner integrations deliver advanced data management capabilities to campus researchers across diverse sites and systems through consistent, easy-to-use web interfaces. Spectra’s data storage solutions help these environments push the boundaries of their operational objectives, providing cost-effective storage that meets all of their performance, growth and environmental needs. Read the University of Minnesota Supercomputing Institute’s case study for a real-world example of how the University employs a Spectra Logic solution to seamlessly archive and share petabytes of information with unbeatable density, features and scalability.

