The 3D structure of a protein determines its function. Although the UniProt knowledge base has surpassed 200 million entries, only 0.2% of them have experimentally determined functions. This suggests that many valuable proteins, some potentially catalyzing novel enzymatic reactions, remain undiscovered among the vast number of proteins of unknown function. The rise of deep learning models such as AlphaFold2 has provided access to predicted 3D structures, and we have been developing methods that use this structural information to predict protein function. We developed FUJISAN, a LightGBM-based machine learning tool that predicts, from sequence and structural similarities, whether a pair of enzymes catalyzes the same reaction on the same substrate. We also introduce a more flexible discrimination method based on deep learning, which combines the attention mechanism with an existing pre-trained protein large language model (ESM C). We demonstrate the performance of both methods in predicting the functions of proteins annotated in the MIBiG database, which catalogs biosynthetic gene clusters involved in secondary metabolite production. Finally, we discuss how proteins of unknown function were classified on the basis of these predictions.
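The pairwise framing described above can be illustrated with a minimal sketch. It is not the FUJISAN implementation: the enzyme identifiers, reaction labels, similarity values, and the helper `build_pair_dataset` are all hypothetical, and the label is simplified to "same reaction", whereas the abstract also requires the same substrate. The sketch only shows how per-pair similarity features and a binary label could be assembled for a pair classifier.

```python
from itertools import combinations

# Hypothetical toy data: enzyme id -> reaction label (illustrative only).
reactions = {"enzA": "rxn1", "enzB": "rxn1", "enzC": "rxn2"}

# Hypothetical precomputed similarities for each unordered pair:
# (sequence identity in [0, 1], structural similarity score in [0, 1]).
similarities = {
    frozenset({"enzA", "enzB"}): (0.62, 0.91),
    frozenset({"enzA", "enzC"}): (0.18, 0.40),
    frozenset({"enzB", "enzC"}): (0.21, 0.44),
}


def build_pair_dataset(reactions, similarities):
    """Return (pairs, X, y) with one feature row and one binary label per enzyme pair."""
    pairs, X, y = [], [], []
    # Iterate over unordered pairs in a deterministic (sorted) order.
    for a, b in combinations(sorted(reactions), 2):
        seq_id, struct_sim = similarities[frozenset({a, b})]
        pairs.append((a, b))
        X.append([seq_id, struct_sim])
        # Label 1 if both enzymes share a reaction label, else 0 (assumed simplification).
        y.append(1 if reactions[a] == reactions[b] else 0)
    return pairs, X, y


pairs, X, y = build_pair_dataset(reactions, similarities)
```

Feature rows and labels in this shape could then be passed to a gradient-boosted classifier such as LightGBM's `LGBMClassifier`; the features and training setup actually used by FUJISAN are not specified in this abstract.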
Biophysical Journal
Fujita et al. (Sun,) studied this question.