September 10, 2025

Uncovering the limits of visual-language models in engineering knowledge representation

Key Points

VL models struggle to effectively convey both functional and structural information in engineering design.
The dataset consists of 1.5 million text-image pairs drawn from patents to evaluate VL model performance.
Fine-tuning the CLIP base model revealed limitations in capturing details crucial for precision-driven engineering tasks.
Future research must address these challenges to enhance VL models for engineering design applications.

Abstract

Visual-Language (VL) models offer potential for advancing Engineering Design (ED) by integrating text and visuals from technical documents. We review VL applications across ED phases, highlighting three key challenges: (i) understanding how functional and structural information is complementarily expressed by text and images, (ii) creating large-scale multimodal design datasets, and (iii) improving VL models’ ability to represent ED knowledge. A dataset of 1.5 million text-image pairs and an evaluation dataset for cross-modal information retrieval were developed using patents. By fine-tuning and testing the CLIP base model on these datasets, we identified significant limitations in VL models’ capacity to capture fine-grained technical details required for precision-driven ED tasks. Based on these findings, we propose future research directions to advance VL models for ED applications.
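To make the cross-modal retrieval evaluation mentioned in the abstract more concrete, here is a minimal sketch of one common way such an evaluation is scored: Recall@K over text-to-image retrieval. This page does not state which metric or protocol the authors used, so the metric choice, the function names (`cosine`, `recall_at_k`), and the toy two-dimensional embeddings are all illustrative assumptions. In practice the vectors would come from a model such as CLIP.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_at_k(text_embs, image_embs, k):
    """Fraction of text queries whose paired image (same index) is in the top k.

    Assumes text_embs[i] and image_embs[i] form a matching pair.
    """
    hits = 0
    for i, text in enumerate(text_embs):
        scores = [cosine(text, image) for image in image_embs]
        # Rank image indices from most to least similar to this text query.
        ranked = sorted(range(len(image_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)

# Toy embeddings (hypothetical values, not taken from the paper).
text_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_embs = [[0.9, 0.1], [0.6, 0.8], [0.2, 0.9]]

recall_1 = recall_at_k(text_embs, image_embs, 1)
recall_2 = recall_at_k(text_embs, image_embs, 2)
```

With these toy vectors only the first text query retrieves its paired image at rank 1, while all three find it within the top 2. A gap of this kind between Recall@1 and Recall@K for larger K is one way a retrieval benchmark can show that a model captures the general topic of a pair but not the fine-grained detail.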

Cite this study

Consoloni et al. (2025) studied this question.

www.synapsesocial.com/papers/68c1d5e554b1d3bfb60f87eb — DOI: https://doi.org/10.1017/pds.2025.10340

Authors

Marco Consoloni

Vito Giordano

Federico A. Galatolo

Journals

Proceedings of the Design Society

Institutions

University of Pisa
