Background: Drug–drug interactions (DDIs) account for about 30% of adverse drug reactions and 5–10% of hospital deaths. Combination therapy increases DDI risk, yet extracting DDIs from biomedical text remains challenging: existing methods rely on surface co-occurrence and fail when multiple drugs and interactions coexist in a sentence. Prior multimodal approaches simply concatenate text, molecular, or knowledge features without deep alignment, which leads to misclassification of structurally similar but non-interacting drug pairs.

Methods: We propose MultiMod-DDI, a framework that constructs a ternary evidence chain of “molecular structure–biological entities–DDI text”. Unlike existing work, MultiMod-DDI introduces (1) PS-AEGNN, a molecular graph network that uses ProbSparse self-attention to capture long-range chemical dependencies; (2) an adaptive position interaction vector that dynamically weights distant semantic links between drug entities; and (3) a multi-stage adaptive fusion module that applies subgraph-molecule attention followed by text-guided gating. These components are co-designed to enforce structured semantic alignment among heterogeneous modalities, targeting the specific challenge of matching drug pairs to their correct interaction types in complex, multi-drug sentences.

Results: On SemEval-2013 Task 9, MultiMod-DDI achieves a macro-averaged F1 of 85.57% and a micro-averaged F1 of 85.20%, outperforming state-of-the-art models.

Conclusions: Through deep multimodal semantic alignment, MultiMod-DDI resolves the mismatch between drug pairs and their interaction types in complex biomedical texts. Integrating multimodal features improves DDI extraction accuracy and offers a reliable method for intelligent DDI mining from the biomedical literature.
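The two scores reported in the Results differ only in how per-class counts are averaged. The following is a minimal sketch of that distinction, not the authors' evaluation script: the function name `f1_scores` is hypothetical, the label names follow the four interaction types of the SemEval-2013 Task 9 corpus, and treating the negative class as excluded from averaging is an assumption about the evaluation protocol.

```python
from collections import Counter

# Assumed positive classes; "none" (no interaction) is excluded from averaging.
DDI_TYPES = ("mechanism", "effect", "advise", "int")

def f1_scores(gold, pred, labels=DDI_TYPES):
    """Return (macro_f1, micro_f1) over the given positive labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g in labels:
                tp[g] += 1
        else:
            if p in labels:
                fp[p] += 1  # predicted a type that is not the gold one
            if g in labels:
                fn[g] += 1  # missed the gold type

    def f1(t, false_pos, false_neg):
        denom = 2 * t + false_pos + false_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 values, so rare classes weigh equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

Because macro averaging weights each interaction type equally, a macro score close to the micro score suggests that performance is not carried by the most frequent classes alone.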
Yang et al. studied this question.