Abstract
Relating billions of proteins across the tree of life remains a challenging task for comparative biosphere genomics and for artificial intelligence-driven structure prediction. Here we present DIAMOND DeepClust, a cascaded, ultra-fast clustering method that enables planetary-scale organization of protein space, scaling to trillions of sequences while retaining sensitivity at low sequence identity. After aggregating 19 billion biosphere proteins into 544 million nonsingleton clusters, we show that using our DeepClust database, which is available for download, can enhance structure prediction with AlphaFold2.
Benjamin Buchfink
Emile Barbe
Haim Ashkenazy
Nature Methods
University of Dundee
Max Planck Institute for Biology
Max Planck Computing and Data Facility
www.synapsesocial.com/papers/69c4ccaffdc3bde4489181fa
DOI: https://doi.org/10.1038/s41592-026-03030-z