What question did this study set out to answer?

This research aims to enhance predictive performance for ADMET properties using a refined chemical language model.

May 16, 2026

Improving predictive performance for molecular ADMET properties using a chemical language model

Key Points

Fine-tuned a DeBERTa-based SMILES encoder on a 300 K PubChem–ADMET dataset.
Used a multi-output regression scheme with a Focal MAE objective for balanced learning across endpoints.
Introduced the ADMET-Balanced Performance Score (ABPS) for consistent performance assessment across tasks.
Achieved state-of-the-art performance on 12 endpoints and top-5 performance on 7 endpoints on the TDC benchmark leaderboard.
DeBERTa maintained SMILES token-level syntax competence more stably than BERT- and RoBERTa-based encoders.
Showed improved performance on heterogeneous metrics across multiple ADMET endpoints.
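
To make the multi-output regression and Focal MAE points concrete, here is a minimal sketch of a focal-style MAE over several endpoints. The summary does not give the paper's formula, so the weighting `(1 - exp(-error)) ** gamma`, the `gamma` parameter, the function name `focal_mae`, and the use of `None` for missing endpoint labels are all assumptions made for illustration, not the authors' definition.

```python
import math
from typing import Optional, Sequence


def focal_mae(
    preds: Sequence[Sequence[float]],
    targets: Sequence[Sequence[Optional[float]]],
    gamma: float = 1.0,
) -> float:
    """Focal-style mean absolute error over a (molecule x endpoint) grid.

    ASSUMPTION: the weight (1 - exp(-error)) ** gamma is one plausible way to
    emphasize poorly fitted values; it is not taken from the paper.
    ASSUMPTION: a target of None means the endpoint is unlabeled for that
    molecule and is skipped.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(preds, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # skip endpoints without a label
            error = abs(pred - target)
            # Small errors get a weight near 0, large errors a weight near 1.
            weight = (1.0 - math.exp(-error)) ** gamma
            total += weight * error
            count += 1
    return total / count if count else 0.0


# Two molecules, three hypothetical endpoints; one label is missing.
example_preds = [[0.5, 1.0, 2.0], [1.5, 0.0, 3.0]]
example_targets = [[1.0, None, 2.0], [1.0, 1.0, 0.0]]
plain_mae = focal_mae(example_preds, example_targets, gamma=0.0)
focused = focal_mae(example_preds, example_targets, gamma=2.0)
```

With `gamma=0.0` every weight is 1 and the function reduces to ordinary MAE over the labeled entries; larger `gamma` values shift the loss toward the entries with the largest errors.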

Abstract

Molecular encoders play a central role in AI‐driven drug design by defining the chemical representations used for downstream prediction and decision‐making. However, many widely used encoders are not explicitly trained to internalize ADMET‐relevant structure–property relationships across diverse endpoints. In this study, we fine‐tuned a DeBERTa‐based SMILES (Simplified Molecular Input Line Entry System) encoder to internalize ADMET‐relevant structure–property information across 22 endpoints under a multi‐task learning (MTL) framework, while maintaining strong comprehension of SMILES syntax and structural regularities. Starting from a pretrained ZINC‐based DeBERTa checkpoint, we trained on a 300 K PubChem–ADMET dataset using a multi‐output regression scheme with a Focal MAE objective to mitigate optimization imbalance across heterogeneous endpoints. To consistently assess balanced performance across tasks with heterogeneous metrics, we further introduced the ADMET‐Balanced Performance Score (ABPS) as an integrated evaluation criterion. On the TDC benchmark leaderboard, our encoder achieved state‐of‐the‐art performance on 12 endpoints and top‐5 performance on 7 endpoints, indicating broad competitiveness across ADMET tasks. In addition, by restoring an MLM head and tracking MLM accuracy throughout fine‐tuning, we show that DeBERTa preserves SMILES token‐level syntax competence more stably than BERT‐ and RoBERTa‐based encoders under the same ADMET training conditions. Overall, this work establishes an ADMET‐aware SMILES encoder that balances broad‐spectrum endpoint learning with preservation of pretrained SMILES knowledge, and can serve as a foundation for downstream molecular modeling pipelines, including future multimodal drug‐design systems.
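
The MLM accuracy tracked during fine-tuning can be illustrated with a small sketch: accuracy is counted only at the masked token positions of a SMILES string. The function name `masked_token_accuracy`, the character-level tokenization, and the example predictions are hypothetical; the paper's tokenizer and masking scheme are not described in this summary.

```python
from typing import Sequence


def masked_token_accuracy(
    predicted: Sequence[str],
    reference: Sequence[str],
    masked_positions: Sequence[int],
) -> float:
    """Fraction of masked positions where the predicted token is correct."""
    if not masked_positions:
        return 0.0
    correct = sum(
        1 for i in masked_positions if predicted[i] == reference[i]
    )
    return correct / len(masked_positions)


# Acetic acid, "CC(=O)O", split into single-character tokens (an assumption;
# real SMILES tokenizers usually handle multi-character atoms such as "Cl").
reference = ["C", "C", "(", "=", "O", ")", "O"]
# Hypothetical model output: the closing parenthesis at index 5 is wrong.
predicted = ["C", "C", "(", "=", "O", "(", "O"]
accuracy = masked_token_accuracy(predicted, reference, [2, 4, 5])
```

Evaluating this quantity on a fixed held-out set after each fine-tuning epoch would give a curve of the kind the abstract describes, where a drop signals loss of pretrained SMILES syntax knowledge.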

Cite This Study

Lim et al. studied this question.

synapsesocial.com/papers/6a080b17a487c87a6a40d307 https://doi.org/10.1002/bkcs.70177
