Machine learning and statistical inference in microbial population genomics.

Sheppard SK., Arning N., Eyre DW., Wilson DJ.

The availability of large genome datasets has changed the microbiology research landscape. Analyzing such data requires computationally demanding analyses, and new approaches have come from different data analysis philosophies. Machine learning and statistical inference have overlapping knowledge discovery aims and approaches. However, machine learning focuses on optimizing prediction, whereas statistical inference focuses on understanding the processes relating variables. In this review, we outline the different aspirations, precepts, and resulting methodologies, with examples from microbial genomics. Emphasizing complementarity, we argue that the combination and synthesis of machine learning and statistics has potential for pathogen research in the big data era.

DOI

10.1186/s13059-025-03775-4

Type

Journal article

Publication Date

2025-09-27

Volume

26

Keywords

Machine Learning; Microbiota; Genome, Microbial; Datasets as Topic; Genomics; Genome-Wide Association Study; Drug Resistance, Microbial; Virulence; Sequence Analysis, DNA
