NIBMG - DDN Storage

SUCCESS STORY
ACCELERATE: LIFE SCIENCES

NIBMG Deploys DataDirect™ Networks Storage to Increase Scientific Collaboration and High Performance Analysis for Cutting-Edge Oral Cancer Research

CHALLENGES
• Limitations with legacy storage platform hindered NIBMG’s ability to keep pace with massive amounts of genomics data resulting from major cancer research study
• Performance bottlenecks and storage failures occurred when NIBMG produced large datasets from next-gen sequencers and tried to feed them to High-Performance Compute clusters
• Inability to run concurrent jobs slowed research results considerably
• Persistent failures led to delays in publishing research results, including a catastrophic failure that caused loss of data

SOLUTION
An end-to-end solution consisting of DDN SFA12KE and GRIDScaler, an embedded GPFS high-performance appliance, connected to WOS object storage and all managed by DirectMon

RESULTS
• NIBMG has achieved a significant reduction in data ingest time when compared to its legacy storage platform, enabling the team to run concurrent sequences for twice as many patients while translating results for broader research benefits in as little as two-to-three months
• NIBMG identified new genes and biological pathways specific to oral cancer associated predominantly with smokeless tobacco consumption in India

DDN.COM | 1.800.837.2298

The National Institute of Biomedical Genomics (NIBMG), located in Kalyani, West Bengal, India, is the first research institution in India devoted to genome-based research for human health and disease. For NIBMG researchers, an overarching goal is to conduct and promote cutting-edge research in biomedical genomics that will translate to improvements in genomics-based healthcare in India including standards, practices and treatments for illnesses and diseases that have a genetic component.
This is particularly important as there is a large gap in understanding, and in estimating the interaction between, genetics and environmental factors in the causes of diseases that are prevalent throughout India. For instance, NIBMG is conducting genomics-based research into oral cancer, under the aegis of the International Cancer Genome Consortium (ICGC), to identify new genes and biological pathways that are specific to oral (gingivo-buccal) cancer predominantly associated with smokeless tobacco consumption in India. Oral cancer is the eighth most common cancer worldwide, but is the leading cancer among men in India.

According to Dr. Nidhan K. Biswas, a computational biologist and Young Biotechnologist awardee of NIBMG’s ICGC-India project, discoveries from this project may lead to better therapies for oral cancer. “As a lead member of the ICGC project, we are collecting, analyzing, interpreting and managing enormous amounts of research data looking for correlations between genetics and oral cancer,” he explains. “We have already analyzed 110 paired blood and tumor tissue samples out of 500 study participants from India, which can exceed 1TB of storage per patient or 500TB of data altogether. Once processed and analyzed, the amount of storage required increases linearly with the increasing number of patients.”

THE CHALLENGE

For the ICGC study and other critical scientific research, NIBMG uses massively-parallel DNA sequencing technologies to process DNA sequence data in order to produce meaningful scientific interpretations and analyses. With its legacy block storage platform, the team encountered persistent performance and reliability problems as the amount of data processing overtaxed the system’s capabilities. Over time, it became increasingly difficult to ingest large datasets from next-gen sequencers and feed them to High-Performance Compute (HPC) clusters fast enough.
The team typically was restricted to running one job at a time because two concurrent jobs often resulted in a failure. “When stretched to its limits with large capacities, disks of the existing storage would fail, often several times a week,” says Bikram Roy, systems analyst for NIBMG. “It became hard to deliver seamless services to the researchers as our biggest problem was storage throughput and constant performance bottlenecks.”

These bottlenecks led to delays in publishing the research results from various genomics studies. Finally, a catastrophic failure resulted in the loss of critical data, meaning critical steps in the analysis had to be redone, which prompted the team to seek a more reliable, high-performance, scalable storage platform that could handle escalating storage needs, while simplifying access and management across all parts of the data lifecycle.

We needed a storage platform that could support our needs in two areas: first, ingesting data quickly into the research environment and secondly, simplifying our ability to analyze the data so that we could interpret and securely share both raw data and results for collaboration and publishing purposes.

THE REQUIREMENTS

In seeking a high-performance storage solution to meet its ever-increasing data demands, NIBMG issued a Request for Proposal (RFP) to replace its legacy system with a powerful primary platform encompassing a parallel file system, RAID storage, combination of SAS and SATA disks, 200TB of usable capacity, as well as NFS and CIFS support. The Institute also wanted a secondary solution with 100TB of usable capacity that could scale to 1PB to support both backup and archive requirements. An active archive, with the ability to seamlessly move data from primary to secondary storage, and back again when necessary, was critical, as well as cloud features such as multisite replication to support the increased need for scientific collaboration and erasure coding for low-overhead data protection.
Maintaining the highest levels of availability was also critical, especially given NIBMG’s previous reliability problems with its legacy storage. NIBMG specified there could be no single point of failure in its primary or secondary storage; system components had to withstand failures without affecting data availability. “Backing up our primary storage onto a secondary system was important too,” adds Roy. “Object storage technology that would let us upload and retrieve data seemed like a good fit for both our backup and archive requirements.”

Fast and efficient performance was another major selection criterion as NIBMG forecasted a continuous surge of data as a result of its ICGC project and other ongoing research. With its legacy system, NIBMG couldn’t support the simultaneous write load from its DNA sequencers and the read/compute load from the HPC cluster operating on the sequencer output. “We needed the highest levels of performance to ensure simultaneous sequencer ingest and HPC research on thousands of large data sets,” Roy says. “Our ability to scale the environment seamlessly was another big consideration, as we would need to grow in place to support multiple petabytes of data as further research gets funded from outside resources.”

As part of the cancer genome analysis working group, we actively collaborate with scientists from more than 20 countries and five continents using the consortium’s cloud-based storage. We want to distribute data through our own cloud-based archive in the future to make it easier to share ideas and results with different research groups around the world.
Dr. Nidhan K. Biswas, Biochemist and Young Biotechnologist Awardee at NIBMG

THE SOLUTION

NIBMG conducted extensive evaluations of different storage technologies to determine the ideal platform for meeting current and evolving research requirements.
After considering offerings from other manufacturers, the team purchased high-performance, scalable storage from DataDirect Networks (DDN), including the SFA12K® storage platform, GRIDScaler™ parallel file system and WOS® object storage platform.

In evaluating DDN’s capabilities, NIBMG reached out to its collaborator, the Wellcome Trust Sanger Institute, a UK-based genomic research center. Based on positive feedback from the Wellcome Trust Sanger Institute, where DDN’s solutions have proven invaluable, NIBMG took a close look at DDN’s performance, scalability, reliability and flexibility before determining it was the best fit for the institute’s stringent research demands. “No one could match DDN’s performance throughput,” says Roy. “The main reason we chose DDN was the ability to scale up or out in order to have high performance storage and fast ingest and analysis in the same platform.”

BUSINESS BENEFITS
• With DDN’s unprecedented throughput, NIBMG can process multiple jobs concurrently to speed analysis and translation of research results
• The ability to run jobs for twice the number of patients enables NIBMG to deliver actionable research results faster, expediting the delivery of better treatments for oral cancer

With DDN® Storage, NIBMG is much better equipped to handle data ingest, processing, archival and collaboration at a scale necessary to keep pace with research demands. “We felt that DDN was best suited to help us quickly and easily process a large amount of raw data to produce meaningful interpretations for further analysis of genetic mutations and cancer-causing changes in DNA,” adds Roy. “Dealing with thousands of large files requires a capable processing pipeline, along with unprecedented IOPS and bandwidth performance, all of which DDN provides.”

THE BENEFITS

Soon after deploying its DDN SFA12K platform and GRIDScaler appliance, NIBMG realized a multitude of benefits from its all-in-one scalable file storage solution.
For example, the Institute was able to reduce the amount of time required to process data using DDN storage in conjunction with NIBMG’s multiple next-gen sequencers, including Illumina HiSeq, LifeTech Ion Proton and Roche 454 instruments. Allowing simultaneous ingest and analysis on data from all these sequencers running around the clock, the storage system delivers ample capacity and performance to achieve a significant reduction in pipeline processing time when compared to the previous storage system.

“By performing our computations on faster storage, we will be able to complete our data analysis in much less time,” notes Biswas. “With slower storage, it might take up to a year to process our project but now we can complete it within three-to-four months, which enables us to translate results for broader research benefits much faster and more effectively.”

DDN’s powerful parallel file system works seamlessly with the institute’s mix of standard sequencing applications, such as BWA and GATK, and customized applications, like NIBMG’s variant caller and subsequent variant annotations. The result of this flexible and powerful performance: NIBMG can now process multiple jobs concurrently without saturating system capacity. “With DDN technology, we’ve dramatically reduced the time it takes to perform a single job while running twice as many concurrent jobs,” Roy adds. “And we still have plenty of room to grow as needs dictate.”

DDN’s massively parallel I/O also gives NIBMG the opportunity to run concurrent sequencing analysis jobs for twice as many patients.
“One of the biggest advantages of parallel processing is that we can run jobs for about 40 samples concurrently, which is twice what we could handle before, accelerating analysis for a larger number of patients,” says Biswas.

Having the ability to grow in place is an overarching attribute of DDN’s parallel file and object storage. With WOS technology, NIBMG will be able to easily store petabytes of unstructured data with the highest availability. “The beauty of WOS is its infinite scalability without an underlying file system which means that we can grow in place to support petabytes of vital research data as we continue to manage the ICGC and other projects,” Roy adds. “There are no overwrites and lots of space for both backup and collaboration.”

Initially, NIBMG will rely on WOS storage primarily for backup purposes but plans to expand its use in the future as an active archive to support increased collaboration and publishing needs. “Now WOS is primarily serving as a backup for our primary storage,” he continues. “In the future, we’ll expand that to build an active archive of research analysis.” NIBMG also plans to take advantage of WOS technology for increased disaster recovery by replicating data at up to eight different locations.

NIBMG forecasts lower total cost of ownership with its DDN storage, based on the ease with which the institute can expand capacity without having to replace systems, as well as customize the mix of drives to meet its price/performance needs. “With DDN’s modular approach, we can start out small based on funding and then upgrade easily, which is just fantastic,” adds Biswas. “The mix of SATA and SAS hard drives in a single chassis is a big benefit too.”

DDN’s high-density platform helps NIBMG reduce data center costs. “We now support three times the storage capacity in half the rack size, which reduces costs and administrative headaches,” Roy says.
Looking ahead, NIBMG forecasts that DDN will play a big role in a variety of research projects, including the whole genome sequencing of the Indian population. “DDN’s scalable, reliable storage and integrated collaboration and archive for research communities are ideally suited for NIBMG,” concludes Biswas. “We have a lot of confidence that DDN will continue to help us manage explosive data growth while ensuring more actionable research results and faster cures for deadly cancers.”

TECHNICAL BENEFITS
• DDN’s powerful parallel file system works seamlessly with the institute’s mix of standard and customized applications
• Ability to grow in place and scale up to more than 10PB of aggregated capacity will keep pace with ever-increasing data demands
• High availability and reliability with no single point of failure protects vital research results from inadvertent data loss
• Versatile object storage is ideally suited for both backup and active-archive applications

ABOUT DDN®

DataDirect Networks (DDN) is the world’s leading big data storage supplier to data-intensive, global organizations. For more than 15 years, DDN has designed, developed, deployed and optimized systems, software and solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premise and in the cloud. Organizations leverage the power of DDN technology and the deep technical expertise of its team to capture, store, process, analyze, collaborate and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include many of the world’s leading financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers.
For more information, visit our website www.ddn.com or call 1-800-837-2298.

[email protected] +1.800.837.2298

©2015 DataDirect Networks. All Rights Reserved. DataDirect Networks, the DataDirect Networks logo, DDN, GRIDScaler, Web Object Scaler, & WOS are trademarks of DataDirect Networks. Other Names and Brands May Be Claimed as the Property of Others. v2 (2/15)