Zinc ions serve dual roles in proteins: as catalytic cofactors and as structural elements. Distinguishing these functional classes from sequence alone remains challenging, because both share similar coordination geometries. Here, we demonstrate that ESM-2 embeddings encode sufficient information to classify catalytic versus structural zinc sites with high accuracy. On 73 sequence-diverse zinc proteins, machine learning classifiers achieve ROC-AUC values of 0.93 to 0.97, significantly outperforming a motif-based baseline (AUC = 0.759; p = 0.015). Attention analysis reveals that histidine ligands in catalytic sites attend 9.2-fold more strongly to second-shell carboxylate residues (the proton-shuttling machinery essential for catalysis) than to random positions, providing mechanistic interpretability. These findings suggest that evolutionary sequence patterns encode the extended hydrogen-bonding networks distinguishing catalytic from structural sites. This sequence-only approach complements structure-based methods for large-scale metalloproteome annotation.
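To make the attention fold-enrichment statistic concrete, here is a minimal sketch of one way such a ratio could be computed from a single attention map. It is not the paper's procedure: the function name `attention_fold_enrichment`, the choice of background (all non-ligand, non-target positions rather than sampled random positions), and the toy attention values are assumptions made for illustration only.

```python
from statistics import mean


def attention_fold_enrichment(attn, query_positions, target_positions):
    """Mean attention from queries to targets, divided by mean attention
    from queries to all remaining (background) positions.

    Hypothetical helper: the background definition is an assumption,
    not the method described in the abstract.
    """
    queries = set(query_positions)
    targets = set(target_positions)
    to_targets, to_background = [], []
    for i in queries:
        for j, weight in enumerate(attn[i]):
            if j in queries:
                # Skip ligand-to-ligand attention, including self-attention.
                continue
            (to_targets if j in targets else to_background).append(weight)
    if not to_targets or not to_background:
        raise ValueError("need at least one target and one background position")
    background = mean(to_background)
    if background == 0:
        raise ValueError("background attention is zero; ratio undefined")
    return mean(to_targets) / background


# Toy 5-residue attention map (each row sums to 1); values are made up.
toy_attn = [
    [0.125, 0.5, 0.125, 0.125, 0.125],  # position 0: hypothetical His ligand
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 0.2],
]

# Position 1 plays the role of a second-shell carboxylate residue.
ratio = attention_fold_enrichment(toy_attn, query_positions=[0], target_positions=[1])
print(f"fold enrichment: {ratio:.1f}")  # prints: fold enrichment: 4.0
```

In practice the attention map would come from a chosen layer and head (or an average over heads) of the language model, and the ratio would be aggregated over many sites; those choices are left open here.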
Karen Sargsyan studied this question.
synapsesocial.com/papers/69a3d8a7ec16d51705d2fb01 — DOI: https://doi.org/10.1021/acs.jcim.5c03142
Karen Sargsyan
Journal of Chemical Information and Modeling
Institute of Chemistry, Academia Sinica
Institute of Sociology, Academia Sinica