What type of study is this?

This is a validation study.

What question did this study set out to answer?

This research aims to evaluate the perceptual reliability of latent attribute control in diffusion-based fashion generation models.

May 7, 2026 · Open Access

Evaluating Perceptual Reliability of Latent Attribute Control in Diffusion-Based Fashion Generation

Key Points

Evaluated the perceptual reliability of latent attribute control in diffusion-based fashion generation models.
Proposed a three-layer evaluation framework linking latent space geometry, semantic embedding space, and human perception.
Validated latent attribute directions using geometric quality-control metrics measuring linearity and centrality.
Examined semantic consistency through directional projection in CLIP embedding space.
Conducted a two-alternative forced-choice experiment with 37 participants and estimated perceptual strength using a Bradley-Terry preference model.
Fit exhibits strong alignment across the geometric, semantic, and perceptual layers.
Pattern scale shows semantic and perceptual ambiguity.
Findings indicate that perceptual reliability is attribute-dependent and that semantic metrics are insufficient without human evaluation.
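
The directional projection mentioned in the key points can be illustrated with a small sketch. The summary does not give the authors' exact formula, so this assumes one common reading: the shift between the embeddings of a base image and an edited image is projected onto a unit-length attribute direction. The function name, the toy 3-dimensional vectors, and the direction itself are all hypothetical; real CLIP embeddings have hundreds of dimensions.

```python
import math

def directional_projection(base_embedding, edited_embedding, attribute_direction):
    """Signed length of the embedding shift along the attribute direction.

    Assumption: the score is the dot product of (edited - base) with the
    normalized attribute direction. This is an illustrative definition,
    not necessarily the metric used in the study.
    """
    norm = math.sqrt(sum(d * d for d in attribute_direction))
    if norm == 0:
        raise ValueError("attribute direction must be non-zero")
    shift = [e - b for e, b in zip(edited_embedding, base_embedding)]
    return sum(s * d for s, d in zip(shift, attribute_direction)) / norm

# Toy vectors standing in for image embeddings (hypothetical values).
base = [0.2, 0.1, 0.0]
edited = [0.5, 0.5, 0.0]
direction = [3.0, 4.0, 0.0]  # e.g. a "tight fit" minus "loose fit" text direction

score = directional_projection(base, edited, direction)
print(score)
```

A positive score means the edit moved the embedding along the attribute direction, a negative score means it moved against it, and a score near zero suggests the edit is semantically weak or ambiguous for that attribute.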

Abstract

Although diffusion-based image generation models enable high-quality synthesis of fashion images, the reliable control of perceptual attributes in these models remains poorly understood. Current evaluation approaches primarily rely on semantic similarity metrics, such as CLIP scores, which may not accurately reflect human perceptual judgments. This study proposes a three-layer evaluation framework linking latent space geometry, semantic embedding space, and human perception. First, latent attribute directions are validated using geometric quality-control metrics measuring linearity and centrality. Second, semantic consistency is examined through directional projection in CLIP embedding space. Third, a two-alternative forced-choice experiment is conducted with 37 participants, and perceptual strength is estimated using a Bradley-Terry preference model. Experiments cover gender and garment conditions for four fashion attributes: fit, lightness, glossiness, and pattern scale. Results reveal that fit exhibits strong cross-layer alignment, while pattern scale shows semantic and perceptual ambiguity. The findings highlight that perceptual reliability in controllable generation is attribute-dependent and that semantic metrics alone cannot fully replace human evaluation.
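
To make the perceptual layer concrete, the following is a minimal sketch of how a Bradley-Terry model can be fit to two-alternative forced-choice counts. The abstract does not say which fitting procedure the authors used; this sketch assumes the standard minorization-maximization (MM) update, and the function name and the win counts are hypothetical.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from pairwise preference counts.

    wins[i][j] is the number of trials in which item i was preferred
    over item j. Returns strengths normalized to sum to 1.
    Assumes every item wins at least one trial.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving item i, weighted by current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths

# Hypothetical counts for three edit strengths of one attribute,
# with 10 trials per pair (not data from the study).
toy_wins = [
    [0, 3, 1],
    [7, 0, 4],
    [9, 6, 0],
]
strengths = fit_bradley_terry(toy_wins)
print(strengths)
```

Under the model, the probability that item i is preferred over item j is strengths[i] / (strengths[i] + strengths[j]), so the fitted values place all stimuli on a single perceptual scale that can be compared against the geometric and semantic measures.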

Authors

Noriaki Kuwahara

Shintaro Kawanami

Takashi Sato

Journals

International Journal of Advanced Computer Science and Applications
