This study proposes a region-of-interest (RoI)-based, data-efficient method for fine-grained classification of Aids to Navigation (AtoN) using vision-language models (VLMs) for Maritime Autonomous Surface Ships (MASS). The reliability of the Electronic Chart Display and Information System (ECDIS) can be limited by operating anomalies and by discrepancies between charted and actual environments, which motivates camera-based situational awareness to support human watch-keeping. AtoNs, which are crucial indicators for coastal navigation, are typically observed as distant, small-scale objects. This makes large-scale labeled data collection difficult and degrades full-frame classification because the background dominates the image. To address this, we focus on RoI-based classification under limited supervision and compare a supervised YOLOv12 classifier baseline with CLIP (Contrastive Language–Image Pre-training). For CLIP, data efficiency is maximized through domain-specific prompt engineering grounded in IALA Region B attributes and through LoRA-based few-shot tuning. Experiments on Virtual RobotX (VRX) simulation datasets under clear and foggy conditions and on real-sea RoI images demonstrate that the proposed VLM-based classifier performs well with limited training samples and remains more robust under degraded visibility. These results suggest an effective direction for practical, data-efficient AtoN classification in maritime environments via RoI-based preprocessing and parameter-efficient VLM adaptation.
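The prompt-based classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, prompt template, temperature, and the three-dimensional toy embeddings are all assumptions made for illustration, and in practice the embeddings would come from CLIP's image and text encoders applied to an RoI crop and to the prompts. The only domain fact relied on is the IALA Region B lateral convention (green marks to port, red marks to starboard).

```python
import math

# Hypothetical class descriptions based on the IALA Region B lateral
# convention (green to port, red to starboard). The paper's actual
# prompts are not given in the abstract.
ATON_ATTRIBUTES = {
    "port_hand_buoy": "a green can-shaped lateral buoy",
    "starboard_hand_buoy": "a red cone-shaped lateral buoy",
}


def build_prompts(attributes, template="a photo of {}, an aid to navigation at sea"):
    """Turn per-class attribute descriptions into text prompts (assumed template)."""
    return {label: template.format(desc) for label, desc in attributes.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities, CLIP-style.

    The temperature value is an assumption chosen for this sketch.
    """
    logits = {label: cosine(image_emb, emb) / temperature
              for label, emb in text_embs.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


prompts = build_prompts(ATON_ATTRIBUTES)

# Toy stand-ins for encoder outputs; real embeddings would come from CLIP.
text_embs = {
    "port_hand_buoy": [1.0, 0.0, 0.0],
    "starboard_hand_buoy": [0.0, 1.0, 0.0],
}
roi_emb = [0.9, 0.1, 0.0]  # embedding of a cropped RoI (toy values)

probs = zero_shot_probs(roi_emb, text_embs)
predicted = max(probs, key=probs.get)
```

With these toy vectors the RoI embedding lies closest to the port-hand prompt, so `predicted` is `"port_hand_buoy"`. Few-shot LoRA tuning, as mentioned in the abstract, would adjust the encoders that produce these embeddings and is not shown here.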
S.B. Im
Si-Won Kim
Seonghyeon Jung
Journal of the Society of Naval Architects of Korea
Im et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69d895a86c1944d70ce06b0a — DOI: https://doi.org/10.3744/snak.2026.63.2.101