Objectives: Domain shift, the mismatch between training and deployment distributions, degrades deep learning performance at inference. Vision–language foundation models (VLMs) such as ChatGPT-4o generalize out-of-distribution and enable controllable image-to-image synthesis, but direct clinical use is costly and, when the model is closed-source, raises privacy concerns. We instead use a VLM to guide synthetic data generation, strengthening smaller deployable detectors without task-specific retraining. This offers a controllable substitute for generative adversarial network (GAN) or diffusion augmentation, which typically requires large domain-specific datasets and suffers from unstable training. Using panoramic radiographs, we compare a prompt-guided approach with an expert-guided classical image-processing baseline under morphology/size shift.

Methods: A dataset of 196 panoramic radiographs (145 annotated, 51 healthy) was used. We implemented two transparent augmentation pipelines: (i) Expert-Guided Synthesis (EGS), in which clinicians delineated lesion masks that were procedurally rendered (noise-based texture, intensity modulation, edge blending); and (ii) Prompt-Guided Lesion Synthesis (PGLS), in which clinicians specified lesion attributes via text prompts (size, shape, margin sharpness, contrast, location) and ChatGPT-4o produced parameterized image-to-image edits. A You Only Look Once version 10 (YOLOv10) detector was trained under three regimes (Real-only, Real+EGS, Real+PGLS) and evaluated with five-fold cross-validation and size-stratified reporting (small/medium/large).

Results: Values are given as mean Average Precision (mAP)@0.5 / mAP@0.5:0.95 / Recall; percentages are relative gains over the Real-only baseline. Real-only baseline: overall 0.47/0.22/0.47; small 0.30/0.085/0.27; medium 0.58/0.26/0.50; large 0.46/0.20/0.49. Both synthetic strategies improved robustness, and PGLS consistently exceeded EGS. Overall, EGS reached 0.50/0.25/0.52 (+6.4%/+13.6%/+10.6%) and PGLS reached 0.51/0.26/0.53 (+8.5%/+18.2%/+12.8%). The largest gains were for small and large lesions. Small: EGS 0.36/0.120/0.33 (+20.0%/+41.2%/+22.2%) vs. PGLS 0.39/0.130/0.35 (+30.0%/+52.9%/+29.6%). Large: EGS 0.51/0.235/0.56 (+10.9%/+17.5%/+14.3%) vs. PGLS 0.52/0.240/0.57 (+13.0%/+20.0%/+16.3%).

Conclusions: Both prompt-guided and expert-guided synthesis improved resilience to morphology shift. PGLS yielded greater gains, reflecting flexible natural-language control while avoiding the data and stability burdens of GAN/diffusion training. Clinically, higher small-lesion recall means fewer missed cases of early apical periodontitis, and higher mAP@0.5:0.95 reflects tighter localization, curbing false positives and unnecessary follow-ups. Because PGLS uses auditable prompts and edits, it extends to other shifts (e.g., device or artifact) and strengthens smaller, deployable detectors for more consistent accuracy across sites.
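The percentage gains quoted in the Results are consistent with relative change over the Real-only baseline, (score − baseline) / baseline × 100. The following is a minimal sketch for re-deriving them from the reported scores. The dictionary layout and the function names `relative_gain` and `gains` are illustrative assumptions, not part of the authors' code; only the numeric scores come from the abstract.

```python
# Scores copied from the abstract, ordered (mAP@0.5, mAP@0.5:0.95, Recall).
# Medium-lesion augmented scores are not reported in the abstract, so they are omitted.
BASELINE = {
    "overall": (0.47, 0.22, 0.47),
    "small": (0.30, 0.085, 0.27),
    "large": (0.46, 0.20, 0.49),
}

AUGMENTED = {
    ("overall", "EGS"): (0.50, 0.25, 0.52),
    ("overall", "PGLS"): (0.51, 0.26, 0.53),
    ("small", "EGS"): (0.36, 0.120, 0.33),
    ("small", "PGLS"): (0.39, 0.130, 0.35),
    ("large", "EGS"): (0.51, 0.235, 0.56),
    ("large", "PGLS"): (0.52, 0.240, 0.57),
}


def relative_gain(baseline: float, score: float) -> float:
    """Percent change of `score` relative to `baseline`, rounded to one decimal."""
    return round((score - baseline) / baseline * 100, 1)


def gains(subset: str, regime: str) -> tuple:
    """Relative gains for all three metrics of one size subset and training regime."""
    return tuple(
        relative_gain(b, s)
        for b, s in zip(BASELINE[subset], AUGMENTED[(subset, regime)])
    )


if __name__ == "__main__":
    for subset, regime in AUGMENTED:
        formatted = "/".join(f"{g:+.1f}%" for g in gains(subset, regime))
        print(f"{subset:8s} Real+{regime:4s} {formatted}")
```

Because the gains are relative, a small absolute change on a low baseline (for example small-lesion mAP@0.5:0.95, from 0.085 to 0.130) appears as a large percentage, which is worth keeping in mind when comparing subsets.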
Semanur Özüdoğru
Fikret Özgür Coşkun
Fatih Uysal
PeerJ Computer Science
Gazi University
Istanbul Medeniyet University
Kafkas University
Özüdoğru et al. studied this question.
www.synapsesocial.com/papers/698586498f7c464f2300a4da — DOI: https://doi.org/10.7717/peerj-cs.3577