Dhouib et al. (DeepPlantAllergy: deep learning for explainable prediction of allergenicity in plant proteins. Brief Bioinform 2025;26:bbaf605) developed DeepPlantAllergy, a deep learning model for predicting allergenicity in plant proteins, and reported an area under the receiver operating characteristic curve (ROC-AUC) of approximately 97.7–97.8% on an independent test set. However, the way the dataset was constructed may lead to optimistic performance estimates. Specifically, non-allergen sequences sharing >20% identity with allergens were removed before the train/test split. This can deplete the test set of "hard negatives" (non-allergens with moderate sequence similarity to allergens) and thereby weaken assessment under realistic screening conditions. Practical allergen screening requires discriminating against large numbers of non-allergens, some of which share moderate sequence identity with known allergens. We therefore suggest re-evaluating the model on test sets that retain such challenging negatives (with identity filtering performed against training allergens only) and reporting the area under the precision-recall curve (PR-AUC) alongside ROC-AUC to better reflect performance under class imbalance.
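A minimal Python sketch of why PR-AUC is a useful complement to ROC-AUC under class imbalance. It uses made-up toy scores, not data or outputs from DeepPlantAllergy, and the helper names `roc_auc` and `average_precision` are hypothetical. Average precision is assumed here as the PR-AUC estimator (one common choice). ROC-AUC equals the fraction of (positive, negative) pairs ranked correctly. Adding many more negatives from the same score distribution therefore leaves it unchanged, whereas precision falls as more negatives outrank true allergens.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney statistic: P(random positive outscores random negative), ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Average precision (assumed PR-AUC estimator): mean precision at the rank of each positive.

    Simplification: sorted() is stable even with reverse=True, so a positive tied
    with a negative is ranked first (optimistic); avoid such ties in the inputs.
    """
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                    key=lambda item: item[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(pos)


# Toy scores (illustrative only): 4 allergens, 6 non-allergens, one of which
# (0.85) is a "hard negative" scored above most allergens.
allergen_scores = [0.9, 0.8, 0.7, 0.6]
non_allergen_scores = [0.85, 0.5, 0.4, 0.3, 0.2, 0.1]

# Screening-like imbalance: same negative score distribution, 50x as many negatives (4:300).
screening_negatives = non_allergen_scores * 50

for name, negs in [("near-balanced", non_allergen_scores), ("1:75 imbalance", screening_negatives)]:
    print(f"{name}: ROC-AUC={roc_auc(allergen_scores, negs):.3f}, "
          f"PR-AUC(AP)={average_precision(allergen_scores, negs):.3f}")
```

With these toy scores, ROC-AUC stays at 0.875 in both settings, while average precision falls from about 0.804 to about 0.292. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute these quantities without the tie simplification used here.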
Kazim Okan Dolu
Briefings in Bioinformatics
İstanbul Kanuni Sultan Süleyman Eğitim ve Araştırma Hastanesi
DOI: https://doi.org/10.1093/bib/bbag149 (www.synapsesocial.com/papers/69d896a46c1944d70ce0824a)