May 31, 2026 | Open Access

Multimodal Collaborative Modeling of Molecular Structures and Biomedical Text for Accurate Drug–Drug Interaction Extraction

Key Points

The aim is to enhance the extraction of drug-drug interactions (DDIs) from complex biomedical texts using a multimodal framework.
Developed the MultiMod-DDI framework, built on a ternary evidence chain of molecular structure, biological entities, and DDI text.
Used PS-AEGNN, a molecular graph network with ProbSparse self-attention, to capture long-range chemical dependencies, and an adaptive position interaction vector to weight distant semantic links between drug entities.
Implemented a multi-stage adaptive fusion module (subgraph-molecule attention followed by text-guided gating) to enforce semantic alignment across modalities.
Achieved 85.57% F1macro and 85.20% F1micro on SemEval-2013 Task 9, outperforming previous models.
Effectively resolved mismatches between drug pairs and interaction types in complex sentences.
Demonstrated significant improvement in DDI extraction accuracy through multimodal feature integration.
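
To make the difference between the two reported scores concrete, the sketch below computes macro- and micro-averaged F1 for a toy set of predictions. It is only an illustration, not the paper's evaluation script: the label names follow the four SemEval-2013 DDI types, while the `none` label for non-interacting pairs, the choice to average over the four positive types only, and the toy data are all assumptions made here.

```python
from collections import Counter


def f1_scores(gold, pred, positive_labels):
    """Return (macro_f1, micro_f1) computed over the positive labels only."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g in positive_labels:
                tp[g] += 1
        else:
            if p in positive_labels:
                fp[p] += 1  # predicted a positive type that was wrong
            if g in positive_labels:
                fn[g] += 1  # missed the true positive type

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 values, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in positive_labels) / len(positive_labels)
    # Micro: pool the counts across classes first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data (assumed for illustration; "none" marks a non-interacting pair).
DDI_TYPES = ("mechanism", "effect", "advice", "int")
gold = ["effect", "effect", "mechanism", "advice", "int", "none", "none", "effect"]
pred = ["effect", "mechanism", "mechanism", "advice", "none", "none", "effect", "effect"]

macro_f1, micro_f1 = f1_scores(gold, pred, DDI_TYPES)
print(f"macro-F1 = {macro_f1:.4f}, micro-F1 = {micro_f1:.4f}")
```

In this toy case the single missed `int` pair pulls the macro score well below the micro score, which is why the two averages are usually reported side by side.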

Abstract

Background: Drug–drug interactions (DDIs) account for about 30% of adverse drug reactions and 5–10% of hospital deaths. Combination therapy increases DDI risks, yet extracting DDIs from biomedical text remains challenging: existing methods rely on surface co-occurrence and fail when multiple drugs and interactions coexist in a sentence. Prior multimodal approaches simply concatenate text, molecular, or knowledge features without deep alignment, leading to misclassification of structurally similar but non-interacting drug pairs.

Methods: We propose MultiMod-DDI, a framework that constructs a ternary evidence chain of “molecular structure–biological entities–DDI text”. Unlike existing work, MultiMod-DDI introduces (1) PS-AEGNN, a molecular graph network with ProbSparse self-attention to capture long-range chemical dependencies; (2) an adaptive position interaction vector that dynamically weights distant semantic links between drug entities; and (3) a multi-stage adaptive fusion module that sequentially applies subgraph-molecule attention and text-guided gating. These components are co-designed to enforce structured semantic alignment among heterogeneous modalities, effectively addressing the specific challenge of matching drug pairs to their correct interaction types in complex, multi-drug sentences.

Results: On SemEval-2013 Task 9, MultiMod-DDI achieves 85.57% F1macro and 85.20% F1micro, outperforming state-of-the-art models.

Conclusions: Through multimodal deep semantic alignment, MultiMod-DDI effectively resolves the mismatch between drug pairs and their interaction types in complex biomedical texts. The integration of multimodal features greatly improves DDI extraction accuracy, offering a reliable method for intelligent DDI mining from biomedical literature.
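
The abstract names text-guided gating as the last fusion step but does not give its formula. As a rough illustration of the general idea only, the sketch below uses one common form of gated fusion: a sigmoid gate computed from the text vector decides, per dimension, how much of the molecular feature to mix in. The function name, the weight layout, and the blending rule are assumptions made here and should not be read as the MultiMod-DDI implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def text_guided_gate(text_vec, mol_vec, weights, bias):
    """Blend molecular features into text features using a gate driven by the text.

    Hypothetical form (not taken from the paper):
        g_i     = sigmoid(weights[i] . text_vec + bias[i])
        fused_i = g_i * mol_vec[i] + (1 - g_i) * text_vec[i]
    """
    if len(text_vec) != len(mol_vec):
        raise ValueError("text and molecular vectors must have the same length")
    fused = []
    for i, (t, m) in enumerate(zip(text_vec, mol_vec)):
        # The gate for dimension i depends only on the text representation.
        z = sum(w * x for w, x in zip(weights[i], text_vec)) + bias[i]
        g = sigmoid(z)
        fused.append(g * m + (1.0 - g) * t)
    return fused


# Toy vectors (assumed). Zero weights and bias give g = 0.5 in every
# dimension, so the fused vector is the plain average of the two inputs.
text_vec = [1.0, -1.0]
mol_vec = [3.0, 1.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]

averaged = text_guided_gate(text_vec, mol_vec, zero_weights, [0.0, 0.0])
print(averaged)
```

In a trained model the weights and bias would be learned, so the gate could suppress molecular evidence for sentences whose wording already settles the interaction type and admit it where the text is ambiguous.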
