Storage Performance for Life Science Applications
Many branches of Life Science research involve the generation, accumulation, analysis, and distribution of “large” amounts of data. The analysis of these data is often compute- and IO-intensive and can take hours, days, or even months to complete. The wall-clock analysis time can be reduced by spreading the computational load over a number of computers working simultaneously. However, the achievable reduction is often limited by the IO capacity of the shared storage system, which must serve data to an increasing number of client computers.
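The trade-off described above can be illustrated with a rough back-of-the-envelope model. This is only a sketch under simplifying assumptions, not a result from the benchmarks discussed in this paper: the function name, the workload figures, and the throughput numbers are all hypothetical, and compute and IO are assumed not to overlap.

```python
def wall_clock_hours(total_cpu_hours, total_data_tb, clients,
                     per_client_mb_s, storage_cap_mb_s):
    """Estimate wall-clock time for a job split evenly across `clients`.

    Assumptions (illustrative only): compute divides perfectly across
    clients, compute and IO do not overlap, and the shared storage
    system delivers at most `storage_cap_mb_s` in aggregate.
    """
    compute_hours = total_cpu_hours / clients
    # Aggregate throughput grows with the client count until the
    # shared storage system saturates.
    aggregate_mb_s = min(clients * per_client_mb_s, storage_cap_mb_s)
    io_hours = (total_data_tb * 1_000_000) / aggregate_mb_s / 3600
    return compute_hours + io_hours


# Hypothetical workload: 1000 CPU-hours of analysis over 36 TB of data,
# 100 MB/s per client, storage capped at 1000 MB/s in aggregate.
for n in (1, 10, 100, 1000):
    hours = wall_clock_hours(1000, 36, n, 100, 1000)
    print(f"{n:>5} clients: {hours:8.1f} hours")
```

With these made-up numbers, the estimate drops from 1100 hours on one client to 110 hours on 10 clients, but adding a further 990 clients only brings it down to 11 hours, because the 10 hours of IO time cannot shrink once the storage system is saturated.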
In this paper, we share the experiences of groups in industry and academia, including Rosetta Inpharmatics and UCLA, that conducted cross-vendor storage benchmarking experiments and analyses while performing “real” Life Science analyses. In all of these tests, Isilon storage was found to offer the greatest IO performance, with that performance scaling linearly with storage capacity.
We also explain the difference between high-performance and high-throughput computing when executing many discrete processes that are CPU-bound or IO-bound, and how this distinction relates to scaling clusters of computers or clusters of storage.
Finally, we perform our own benchmarking experiments to determine how well Isilon’s symmetric clustered storage performs as the shared file system for two data-intensive areas of scientific research: neuroimaging and next-generation DNA sequencing. Using an open source IO benchmarking tool, we look more closely at how Isilon’s IO performance scales with storage capacity.


