What question did this study set out to answer?

The research aims to create a standardized framework for optimizing feature representation and model selection in microbiome sequencing data.

February 6, 2026

Open Access

Target-driven optimization of feature representation and model selection for microbiome sequencing data with ritme


Key Points

Developed ritme, an open-source Python package for Combined Algorithm Selection and Hyperparameter Optimization tailored to microbial sequencing data.
Systematically explored feature engineering methods: taxonomic aggregation, sparsity-aware selection, compositional transforms, and metadata enrichment.
Applied ritme to three real-world microbiome datasets to assess performance against existing pipelines.
ritme outperforms the original study pipelines and generic AutoML baselines on these prediction tasks.
Provided insights into the impact of feature and model choices on predictive performance.
Demonstrated that jointly optimizing feature representation and model choice improves predictive accuracy.
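The following minimal NumPy sketch makes the feature engineering families concrete. It covers taxonomic aggregation, prevalence-based (sparsity-aware) filtering, a centered log-ratio (CLR) compositional transform, and simple metadata enrichment.

This is not ritme's implementation or API. The function names, the semicolon-separated lineage format, the pseudocount of 1, and the toy data are all assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: function names and lineage format are assumptions, not ritme's API.

def aggregate_by_rank(counts, taxonomy, rank):
    """Sum feature columns whose lineages agree on the first `rank` levels.

    counts:   (n_samples, n_features) array of read counts
    taxonomy: one semicolon-separated lineage string per feature (assumed format)
    rank:     number of leading taxonomic levels to keep (2 = phylum here)
    """
    labels = [";".join(t.split(";")[:rank]) for t in taxonomy]
    groups = sorted(set(labels))
    index = {g: i for i, g in enumerate(groups)}
    out = np.zeros((counts.shape[0], len(groups)))
    for j, label in enumerate(labels):
        out[:, index[label]] += counts[:, j]
    return out, groups

def prevalence_filter(counts, min_prevalence):
    """Keep features observed (count > 0) in at least `min_prevalence` of samples."""
    prevalence = (counts > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return counts[:, keep], keep

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform; the pseudocount avoids log(0) for absent taxa."""
    log_x = np.log(counts + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy data: 3 samples x 4 features (hypothetical).
counts = np.array([[10, 0, 5, 1],
                   [3, 2, 0, 0],
                   [8, 1, 4, 0]], dtype=float)
taxonomy = ["d__Bacteria;p__Firmicutes;g__A",
            "d__Bacteria;p__Firmicutes;g__B",
            "d__Bacteria;p__Bacteroidota;g__C",
            "d__Bacteria;p__Bacteroidota;g__D"]

phylum_counts, phyla = aggregate_by_rank(counts, taxonomy, rank=2)
filtered, kept = prevalence_filter(counts, min_prevalence=0.5)

# Metadata enrichment: append a hypothetical per-sample covariate (e.g. host age).
metadata = np.array([[34.0], [51.0], [27.0]])
X = np.hstack([clr(filtered), metadata])
```

For brevity, prevalence is computed on the whole table here. In a real predictive pipeline, it should be estimated on the training folds only, so that the filter does not leak information from held-out samples.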

Abstract

Microbiome sequencing datasets are sparse, high-dimensional, compositional, and hierarchically structured. Predictive modelling from these data typically relies on ad hoc choices of feature representation, obscuring their impact on performance and biological interpretation. A standardized, compute-efficient framework is needed to jointly optimize microbial feature representation and model algorithms with transparent model evaluation. Here, we present ritme, an open-source software package implementing Combined Algorithm Selection and Hyperparameter Optimization tailored to microbial sequencing data. ritme systematically explores feature engineering methods (taxonomic aggregation, sparsity-aware selection, compositional transforms, and metadata enrichment) alongside diverse model classes using state-of-the-art optimizers and model trackers. Applied to three real-world use cases, ritme outperforms the original study pipelines and generic AutoML baselines. It further provides users with insights into how feature and model choices drive predictive performance. Together, these results establish ritme as a standardized framework for identifying optimal feature-model combinations from high-throughput sequencing data. ritme is an open-source Python package available at https://github.com/adamovanja/ritme.
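The core idea of Combined Algorithm Selection and Hyperparameter Optimization is a single search. Each trial samples a feature representation and a model configuration together and scores the pair by cross-validation. The sketch below illustrates that idea with plain random search in scikit-learn.

It is a simplified stand-in, not ritme's API. The abstract says ritme uses dedicated state-of-the-art optimizers and model trackers rather than a loop like this. The search space (two per-sample transforms, ridge regression, a random forest), the function names, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform with a pseudocount for zero counts."""
    log_x = np.log(counts + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Hypothetical search space (not ritme's): feature transforms are searched jointly with models.
FEATURE_TRANSFORMS = {
    "relative_abundance": lambda c: c / c.sum(axis=1, keepdims=True),
    "clr": clr,
}

def sample_model(rng):
    """Draw a model class and its hyperparameters at random."""
    if rng.random() < 0.5:
        return "ridge", Ridge(alpha=10 ** rng.uniform(-3, 3))
    return "random_forest", RandomForestRegressor(
        n_estimators=50, max_depth=int(rng.integers(2, 10)), random_state=0)

def cash_random_search(counts, y, n_trials=20, seed=0):
    """Random search over (feature transform, model, hyperparameters) triples."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        transform_name = str(rng.choice(list(FEATURE_TRANSFORMS)))
        model_name, model = sample_model(rng)
        X = FEATURE_TRANSFORMS[transform_name](counts)
        # Mean negative RMSE over 5 folds; higher (closer to 0) is better.
        score = cross_val_score(model, X, y, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        if best is None or score > best["score"]:
            best = {"score": score, "transform": transform_name,
                    "model": model_name, "params": model.get_params()}
    return best

# Synthetic example: 60 samples x 15 taxa; target depends on a log-ratio of two taxa.
data_rng = np.random.default_rng(1)
counts = data_rng.poisson(5.0, size=(60, 15)).astype(float) + 1.0
y = np.log(counts[:, 0] / counts[:, 1]) + data_rng.normal(0.0, 0.1, size=60)

best = cash_random_search(counts, y, n_trials=10)
print(best["transform"], best["model"], round(best["score"], 3))
```

Both transforms act on each sample independently, so applying them before cross-validation does not leak information across folds. Data-dependent steps, such as prevalence filtering, would have to be fitted inside each training fold instead.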

Authors

A. K. Adamov

ETH Zurich

Christian L. Müller

Nicholas A. Bokulich

ETH Zurich
