Value at a glance
- Ensuring the best HPC environment for the bioinformatics pipeline
- Detailed evaluation of HPC options
- Expert recommendation
- Enabling informed decision making
- Recommending a platform for performance and scale
How The Server Labs helped Genomics England to optimise the cloud environments for their bioinformatics workloads
When Genomics England were planning a new bioinformatics pipeline solution, they wanted to ensure it ran in an optimal HPC environment. The Server Labs carried out a detailed analysis of every HPC option, so that Genomics England could be 100% sure they had the most efficient and cost-effective platform.
Genomics England Limited (GEL), owned by the UK Department of Health and Social Care, is the world’s largest community in genomic healthcare and medical research. Using its genomic datasets, GEL powers life sciences research that results in life-transforming medicines, treatments and diagnostics.
The bioinformatics pipeline is crucial to GEL’s continued ground-breaking work. GEL had been using the ‘Bertha’ suite of tools to run bioinformatics code in production, but had selected a new ‘Genie’ system, designed to:
The Genie solution needed an optimal High Performance Computing (HPC) environment to run effectively and efficiently, and GEL wanted to evaluate the options before finalising their HPC set-up.
GEL wanted to eliminate the issues they’d previously had, by creating an HPC environment that could:
Additionally, they wanted to identify a Disaster Recovery setup for the existing pipelines whilst the new one was being set up.
GEL had worked with The Server Labs (TSL) in the past and knew that their cloud expertise would be invaluable in helping them select the best environment. They asked TSL to evaluate the different options so that they could make a highly informed decision.
TSL tested the workloads on different compute environments, documented the results and provided GEL with observations and recommendations. TSL set up and ran four different architecture patterns for hybrid computing for the Genie workload:
• Pattern 1 - Synced storage
• Pattern 2 - Shared storage
• Pattern 3 - Single executor
• Pattern 4 - Pattern 2 with single orchestrator
TSL trialled the following technologies:
Compute Orchestration
1. AWS Batch
2. IBM LSF
Fast-tier / Persistent Storage
1. AWS FSx for Lustre
2. Weka on-prem
3. Weka on AWS
4. S3
TSL provided detailed, documented test outcomes:
• CPU and elapsed timings for different scenarios
• Costings for the different technology configurations
TSL provided eight key recommendations, as well as areas for further exploration.
Based on TSL’s feedback and recommendations, GEL now has a clear understanding of how to provide the best compute environment for different Genie workloads, and the associated costs of each.
Selecting the right environment is essential to the performance of the bioinformatics pipeline workload. Testing every option means GEL now know they have the optimal HPC setup.
Chief Technology Officer at The Server Labs
You can download the case study in PDF format from here