Abstract

Molecular encoders play a central role in AI‐driven drug design by defining the chemical representations used for downstream prediction and decision‐making. However, many widely used encoders are not explicitly trained to internalize ADMET‐relevant structure–property relationships across diverse endpoints. In this study, we fine‐tuned a DeBERTa‐based SMILES (Simplified Molecular Input Line Entry System) encoder to internalize ADMET‐relevant structure–property information across 22 endpoints under a multi‐task learning (MTL) framework, while maintaining strong comprehension of SMILES syntax and structural regularities. Starting from a pretrained ZINC‐based DeBERTa checkpoint, we trained on a 300 K PubChem–ADMET dataset using a multi‐output regression scheme with a Focal MAE objective to mitigate optimization imbalance across heterogeneous endpoints. To consistently assess balanced performance across tasks with heterogeneous metrics, we further introduced the ADMET‐Balanced Performance Score (ABPS) as an integrated evaluation criterion. On the TDC benchmark leaderboard, our encoder achieved state‐of‐the‐art performance on 12 endpoints and top‐5 performance on 7 endpoints, indicating broad competitiveness across ADMET tasks. In addition, by restoring an MLM head and tracking MLM accuracy throughout fine‐tuning, we show that DeBERTa preserves SMILES token‐level syntax competence more stably than BERT‐ and RoBERTa‐based encoders under the same ADMET training conditions. Overall, this work establishes an ADMET‐aware SMILES encoder that balances broad‐spectrum endpoint learning with preservation of pretrained SMILES knowledge, and can serve as a foundation for downstream molecular modeling pipelines, including future multimodal drug‐design systems.
Lim et al. (Thu,) studied this question.
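
The abstract names a Focal MAE objective for multi‐output regression but does not give its form. The sketch below shows one plausible reading, not the authors' definition: each absolute error is reweighted by itself raised to a power `gamma`, so poorly fitted endpoints contribute more to the loss, and missing labels are skipped. The function name `focal_mae`, the `gamma` parameter, the use of `None` for missing labels, and the assumption that targets are already scaled to comparable ranges are all illustrative choices.

```python
def focal_mae(preds, targets, gamma=1.0):
    """Masked multi-output focal MAE (hypothetical form, not from the paper).

    preds, targets: lists of rows, one value per endpoint.
    A target of None marks a missing label and is excluded from the mean.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(preds, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # endpoint not measured for this molecule
            err = abs(pred - target)
            weight = err ** gamma  # assumed focal weight: larger errors weigh more
            total += weight * err
            count += 1
    # Average over observed labels only; an all-missing batch yields 0.0.
    return total / count if count else 0.0


if __name__ == "__main__":
    batch_preds = [[2.0, 0.5], [1.0, 3.0]]
    batch_targets = [[0.0, 0.0], [0.0, None]]
    print(focal_mae(batch_preds, batch_targets, gamma=0.0))  # plain masked MAE
    print(focal_mae(batch_preds, batch_targets, gamma=1.0))  # focal variant
```

With `gamma=0` the function reduces to an ordinary masked MAE, which makes the effect of the focal term easy to isolate when comparing training runs.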