Modernising disease research data processing in the cloud

How The Server Labs helped the Oxford Big Data Institute transform the way it processes data for highly complex research simulations.

When Oxford University’s Big Data Institute needed to speed up the processing of highly complex research simulations, and give their users more control, they knew that a move to the AWS cloud was the answer. The Server Labs helped them achieve their goals.

Value at a glance

  • Migration from private to public (AWS) cloud
  • Optimised pipeline
  • Significant improvement in processing times
  • Users able to manage their own processing runs

Disease research at the Big Data Institute

The need for change

The Oxford Big Data Institute (BDI), part of the University of Oxford, analyses large, complex, heterogeneous data sets to research causes, consequences, prevention and treatment of disease. Their work helps to identify associations between lifestyle exposures, genetic variants, infections and health outcomes around the globe.

The BDI is funded by UK Research and Innovation, the Medical Research Council, the British Heart Foundation, and the Robertson Foundation, and collaborates closely with the NHS, including the Academic Health Science Centre and the Biomedical Research Centre.

The BDI had developed a suite of tools and websites for monitoring and displaying the prevalence of neglected tropical diseases (NTDs) in Africa. These ran on a private cloud, but processing times were long and the platform imposed other operational restrictions, so the BDI knew that a change was needed.

Key issues

The key issues in using a private cloud were:

  1. Long processing times: the primary reason was the absence of weightings per IU in the existing pipeline, which prevented optimisation and led to the weighting process taking two weeks.
  2. Lack of user control: users were unable to define, execute, and monitor the NTD simulations.
  3. Lack of modernisation: the private cloud did not have the up-to-date functionality of a public cloud. For example, files were text-based rather than in a columnar format such as Parquet, and the scheduler was Slurm, with each job running independently on the nodes of the NTD Cluster.

Oxford BDI’s objective was to modernise its processing chain by leveraging the public cloud, so it engaged the expert cloud architects at The Server Labs to help achieve that goal.


The solution

The Server Labs reviewed and redesigned the backend and architecture, contributed insights into technology selection and the product technology roadmap, and deployed the solution into production.

The Server Labs team of architects and engineers carried out the following steps to create the BDI solution:

1 Lifted and shifted the existing pipeline to the Amazon Web Services (AWS) public cloud without significant architectural changes. AWS was selected for its advanced features to support High Performance Computing (HPC)/Big Data processing.

2 Developed an API for data outputs, using API Gateway/Lambda. The API caters to the needs of all end-users and is consumable by front-end applications.

3 Integrated Slurm with AWS Batch.

4 Deployed AWS ParallelCluster for seamless job submissions. ParallelCluster simplifies the deployment and management of HPC clusters on AWS. It uses infrastructure-as-code principles to automate and secure the provisioning of resources, and supports a number of job schedulers, including AWS Batch, SGE, Torque, and Slurm.

5 Selected Network File System (NFS) for file sharing, aligning with the Warwick setup.

6 Introduced cloud-native features, storing intermediate states and implementing a comprehensive DevOps pipeline.

7 Carried out analysis and reporting.
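
To give a sense of what step 2 might look like in practice, here is a minimal sketch of a Lambda function behind an API Gateway proxy integration that returns the output of one simulation run. It is an illustration only, not the BDI implementation: the `run_id` path parameter, the `EXAMPLE_OUTPUTS` store and every value in it are hypothetical placeholders, and a real service would read results from durable storage rather than from memory.

```python
import json

# Hypothetical in-memory stand-in for stored simulation outputs.
# All identifiers and values here are placeholders, not real BDI data.
EXAMPLE_OUTPUTS = {
    "run-001": {"status": "COMPLETED", "summary": "placeholder output"},
}


def _response(status_code, payload):
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Return the output of one simulation run, looked up by path parameter."""
    # API Gateway sends null for pathParameters when the route has none.
    params = event.get("pathParameters") or {}
    run_id = params.get("run_id")
    if run_id is None:
        return _response(400, {"error": "run_id is required"})

    output = EXAMPLE_OUTPUTS.get(run_id)
    if output is None:
        return _response(404, {"error": f"unknown run: {run_id}"})

    return _response(200, {"run_id": run_id, **output})
```

Because the handler returns plain JSON with standard status codes, a front-end application could consume it with an ordinary HTTP request, for example against a route such as `/runs/{run_id}` (the route name is also an assumption).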

The outcomes

As a result of the work carried out by The Server Labs, the user experience for BDI researchers has been transformed. By establishing the pipeline in a public cloud with an accessible API for data outputs, BDI now has:

  • Enhanced pipeline performance.
  • A modern public cloud platform with a wider range of functionality.
  • Users empowered to define and initiate their own NTD simulation runs and monitor their status.
  • An API that meets the needs of end users.

This was a very rewarding project, being able to make such an enormous difference to the performance of this vital workload.

Paul Parsons

Chief Technology Officer at The Server Labs
