ABSTRACT: Visual-Language (VL) models offer potential for advancing Engineering Design (ED) by integrating text and visuals from technical documents. We review VL applications across ED phases, highlighting three key challenges: (i) understanding how functional and structural information is complementarily expressed by text and images, (ii) creating large-scale multimodal design datasets, and (iii) improving VL models’ ability to represent ED knowledge. Using patents, we developed a dataset of 1.5 million text-image pairs and an evaluation dataset for cross-modal information retrieval. By fine-tuning and testing the CLIP base model on these datasets, we identified significant limitations in VL models’ capacity to capture the fine-grained technical details required for precision-driven ED tasks. Based on these findings, we propose future research directions to advance VL models for ED applications.
Consoloni et al. studied this question.
www.synapsesocial.com/papers/68c1d5e554b1d3bfb60f87eb — DOI: https://doi.org/10.1017/pds.2025.10340
Marco Consoloni
Vito Giordano
Federico A. Galatolo
Proceedings of the Design Society
University of Pisa