The optimal growth temperature (OGT) of organisms is valuable in bioprospecting enzymes that work under extreme conditions. Existing OGT prediction models achieve high accuracy but mainly capture trends of groups overrepresented in the training set, including organisms that thrive at moderate temperatures and those from well-described taxa. In this study, we incorporated weighted scoring and phylogenetic splits to improve the generalizability of the prediction models. We first built a new growth temperature data set comprising more than 15,000 species distributed over all three domains of life, with special attention to including OGT and extreme temperature data. We then trained machine learning models on the prokaryotic OGT data using proteome-averaged amino acid descriptors. The best-performing model was the multilayer perceptron (MLP), with a test RMSE of 5.49 °C and an R² of 0.84. The most important proteome features were related to backbone flexibility, charged residues, and surface accessibility. The MLP model is integrated into the command-line tool OGTFinder, available under the MIT license at: https://github.com/SC-Git1/OGTFinder.
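The abstract does not list the exact descriptors or define "weighted scoring", so the following is only a minimal sketch under stated assumptions. It takes the simplest proteome-averaged descriptor, per-protein amino acid composition averaged over all proteins of an organism, and adds a charged-residue fraction (assumed here to be D, E, K and R). It also shows how the reported metrics, RMSE and R², are computed, with optional per-sample weights in the RMSE as one hypothetical reading of weighted scoring. All function names are hypothetical and are not part of OGTFinder.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: "charged" = acidic (D, E) plus basic (K, R) residues.
CHARGED = "DEKR"


def protein_composition(seq):
    """Fraction of each of the 20 standard amino acids in one protein."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def proteome_average(proteome):
    """Unweighted mean of per-protein compositions over a whole proteome."""
    if not proteome:
        raise ValueError("proteome must contain at least one sequence")
    comps = [protein_composition(seq) for seq in proteome]
    features = {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}
    # Derived descriptor: total fraction of charged residues.
    features["charged"] = sum(features[aa] for aa in CHARGED)
    return features


def rmse(y_true, y_pred, weights=None):
    """Root-mean-square error; optional per-sample weights (hypothetical scheme)."""
    if weights is None:
        weights = [1.0] * len(y_true)
    num = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return math.sqrt(num / sum(weights))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, `proteome_average(["MKDE", "AAKR"])` gives a lysine fraction of 0.25 and a charged fraction of 0.625. Averaging per-protein compositions (rather than pooling all residues) keeps long proteins from dominating the feature vector; whether the paper averages this way is not stated in the abstract.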
Colette et al. studied this question.
www.synapsesocial.com/papers/69df2b65e4eeef8a2a6b0632 — DOI: https://doi.org/10.1021/acs.jcim.5c03033
Sophie Colette
Jaldert François
Bart De Moor
Journal of Chemical Information and Modeling
KU Leuven
Leiden University
Center for Systems Biology