Data stewardship in microbial genomics research: the hidden complexity of reference strain variability

Angharad Green (University College London, UK)

10:45 - 11:00 Tuesday 14 April Morning

Abstract

The sequencing of bacterial genomes has been crucial in advancing our understanding of infectious disease pathogenesis and antimicrobial resistance (AMR). Whole genome sequencing (WGS) enables detailed characterisation of bacterial strains. Pseudomonas aeruginosa PAO1 was the first large bacterial genome to be fully sequenced, offering valuable insights into the genetic diversity and functional characteristics of this important human pathogen. The success of the PAO1 sequencing project led to the creation of the open-source Pseudomonas Genome Database, which remains a valuable resource of high-quality, annotated genomes for the research community. However, the use of reference strains such as PAO1 in microbial genomics introduces hidden complexities that can affect downstream analyses. The same nominal reference strain can differ genetically and phenotypically between research groups, driven by laboratory adaptation, excessive subculturing, contamination, differences in strain maintenance, and inconsistent provenance. This variability poses significant challenges for reproducibility, data comparability, and accurate interpretation of genomic analyses. This work emphasises the essential role of responsible data stewardship throughout the microbial genomics research lifecycle, with a particular focus on the complexities introduced by reference strain variability. It outlines best practices for applying the FAIR Principles to bacterial WGS, including:

* DNA extraction and sequencing methodologies
* Genome assembly and annotation standards
* Laboratory handling and documentation of reference strains
* Bioinformatics tools and analysis platforms
* Metadata collection and curation
* Long-term data preservation and sharing via suitable repositories

This work aims to support the microbial research community in maximising the reliability, transparency, and value of WGS data for investigating infectious diseases and AMR.