Microbiome sequencing datasets are sparse, high-dimensional, compositional, and hierarchically structured. Predictive modelling from these data typically relies on ad hoc choices of feature representation, which obscures the impact of those choices on performance and biological interpretation. A standardized, compute-efficient framework is needed to jointly optimize microbial feature representation and modelling algorithm, with transparent model evaluation. Here, we present ritme, an open-source software package implementing Combined Algorithm Selection and Hyperparameter Optimization (CASH) tailored to microbial sequencing data. ritme systematically explores feature engineering methods (taxonomic aggregation, sparsity-aware selection, compositional transforms, and metadata enrichment) alongside diverse model classes, using state-of-the-art optimizers and model trackers. Applied to three real-world use cases, ritme outperforms the original study pipelines and generic AutoML baselines. It further gives users insight into how feature and model choices drive predictive performance. Together, these results establish ritme as a standardized framework for identifying optimal feature-model combinations from high-throughput sequencing data. ritme is available as a Python package at https://github.com/adamovanja/ritme.
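To make the idea of jointly searching over feature representation and model choice concrete, the following is a minimal, standard-library-only sketch. It is not ritme's API or implementation: every function name, the toy counts and taxonomy, the two stand-in models (a mean baseline and k-nearest neighbours), and the exhaustive grid are assumptions chosen for illustration, whereas the abstract describes using dedicated optimizers over a much richer search space.

```python
import itertools
import math

# Hypothetical toy data: feature -> "phylum;genus" lineage, and (counts, target) samples.
TAXONOMY = {
    "asv1": "Firmicutes;Lactobacillus",
    "asv2": "Firmicutes;Clostridium",
    "asv3": "Bacteroidota;Bacteroides",
}
TRAIN = [
    ({"asv1": 10, "asv2": 0, "asv3": 90}, 1.0),
    ({"asv1": 40, "asv2": 20, "asv3": 40}, 2.0),
    ({"asv1": 50, "asv2": 40, "asv3": 10}, 3.0),
]
VALID = [({"asv1": 30, "asv2": 30, "asv3": 40}, 2.2)]


def aggregate(counts, taxonomy, level):
    """Taxonomic aggregation: sum counts sharing the first `level` ranks."""
    out = {}
    for feature, count in counts.items():
        key = ";".join(taxonomy[feature].split(";")[:level])
        out[key] = out.get(key, 0) + count
    return out


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform; the pseudocount handles zero counts."""
    logs = {k: math.log(v + pseudocount) for k, v in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {k: v - mean_log for k, v in logs.items()}


def relative_abundance(counts):
    """Total-sum scaling; assumes the sample has at least one read."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


TRANSFORMS = {"clr": clr, "relative": relative_abundance}


def knn_predict(train, x, k):
    """Average the targets of the k nearest training samples (Euclidean)."""
    def dist(a, b):
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in keys))
    nearest = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def search_space():
    """Joint space: feature choices x model choice x model hyperparameter."""
    for level, transform in itertools.product((1, 2), TRANSFORMS):
        yield {"level": level, "transform": transform, "model": "mean"}
        for k in (1, 2):
            yield {"level": level, "transform": transform, "model": "knn", "k": k}


def evaluate(config, train, valid, taxonomy):
    """Validation RMSE of one feature-model configuration."""
    def featurize(counts):
        agg = aggregate(counts, taxonomy, config["level"])
        return TRANSFORMS[config["transform"]](agg)
    train_f = [(featurize(c), y) for c, y in train]
    sq_err = 0.0
    for counts, y in valid:
        if config["model"] == "knn":
            pred = knn_predict(train_f, featurize(counts), config["k"])
        else:
            pred = sum(t for _, t in train_f) / len(train_f)
        sq_err += (pred - y) ** 2
    return math.sqrt(sq_err / len(valid))


def run_search(train, valid, taxonomy):
    """Exhaustively score every configuration and return the best one."""
    results = [(evaluate(c, train, valid, taxonomy), c) for c in search_space()]
    return min(results, key=lambda r: r[0]), results


best, results = run_search(TRAIN, VALID, TAXONOMY)
print(best)
```

Keeping every `(score, config)` pair in `results` rather than only the winner is what would let a user inspect how each feature or model choice affects performance, which is the kind of insight the abstract refers to.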
A. K. Adamov
ETH Zurich
Christian L. Müller
Nicholas A. Bokulich
ETH Zurich
Adamov et al. studied this question.
synapsesocial.com/papers/698586498f7c464f2300a44f — DOI: https://doi.org/10.3929/ethz-c-000792998