What question did this study set out to answer?

This research aims to evaluate color recognition accuracy in vision-language models and understand its impact on AI-generated designs.

March 24, 2026 | Open Access

AI Blue: Systematic Color Recognition Bias in Vision-Language Models and Its Implications for AI-Generated UI Design

Key Points

Systematically evaluated four vision-language models: GPT-4o, Claude 3.5 Sonnet, Claude Sonnet 4, and LLaVA 7B
Tested color recognition on 40 colors sampled from the HSL color space
Measured accuracy with the CIEDE2000 color-difference metric (ΔE00)
Collected 480 observations in total
Commercial models achieved mean ΔE00 of 2.51-3.33
LLaVA 7B showed far higher error, with ΔE00 = 24.63
All models performed better on primary colors than on intermediate hues
95.4% of AI-generated UI pixels fell in the blue-purple range

Abstract

We present a systematic evaluation of color recognition accuracy across four Vision-Language Models (GPT-4o, Claude 3.5 Sonnet, Claude Sonnet 4, LLaVA 7B) using 40 colors from the HSL color space with 480 total observations, measured by CIEDE2000. Commercial models achieve mean ΔE00 of 2.51-3.33, while LLaVA 7B shows dramatically higher error (ΔE00 = 24.63). All models perform better on primary colors than intermediate hues. 95.4% of AI-generated UI pixels fall in the blue-purple range, connecting VLM color biases to the "AI Slop" phenomenon.
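
To make the ΔE00 figures above concrete, the following is a minimal sketch of how a CIEDE2000 difference between a ground-truth color and a model's reported color could be computed. It is not the study's own pipeline, which this page does not describe. The function names `hex_to_lab` and `ciede2000`, the use of hex strings as input, the D65 white point, and the unit weighting factors (kL = kC = kH = 1) are all assumptions made for illustration.

```python
import math


def hex_to_lab(hex_color):
    """Convert an sRGB hex string such as '#3366ff' to CIELAB (assumes D65)."""
    h = hex_color.lstrip("#")
    rgb = [int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear-light values.
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in rgb]
    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def ciede2000(lab1, lab2):
    """CIEDE2000 color difference between two CIELAB triples (kL = kC = kH = 1)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Rescale a* to compensate for chroma non-uniformity near the neutral axis.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25 ** 7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Lightness, chroma, and hue differences.
    d_l = L2 - L1
    d_c = c2p - c1p
    if c1p * c2p == 0:
        d_h_deg = 0.0
    else:
        d_h_deg = h2p - h1p
        if d_h_deg > 180:
            d_h_deg -= 360
        elif d_h_deg < -180:
            d_h_deg += 360
    d_h = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_deg) / 2)

    # Means used by the weighting functions.
    l_bar = (L1 + L2) / 2
    c_bar_p = (c1p + c2p) / 2
    if c1p * c2p == 0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        h_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        h_bar = (h1p + h2p + 360) / 2
    else:
        h_bar = (h1p + h2p - 360) / 2

    t = (1
         - 0.17 * math.cos(math.radians(h_bar - 30))
         + 0.24 * math.cos(math.radians(2 * h_bar))
         + 0.32 * math.cos(math.radians(3 * h_bar + 6))
         - 0.20 * math.cos(math.radians(4 * h_bar - 63)))
    s_l = 1 + 0.015 * (l_bar - 50) ** 2 / math.sqrt(20 + (l_bar - 50) ** 2)
    s_c = 1 + 0.045 * c_bar_p
    s_h = 1 + 0.015 * c_bar_p * t
    # Rotation term that corrects the chroma/hue interaction in the blue region.
    d_theta = 30 * math.exp(-(((h_bar - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(c_bar_p ** 7 / (c_bar_p ** 7 + 25 ** 7))
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c

    return math.sqrt((d_l / s_l) ** 2 + (d_c / s_c) ** 2 + (d_h / s_h) ** 2
                     + r_t * (d_c / s_c) * (d_h / s_h))
```

Under these assumptions, one observation would be scored as `ciede2000(hex_to_lab(shown), hex_to_lab(reported))`, where `shown` and `reported` are hypothetical variables holding the displayed color and the model's answer. Averaging that score over a model's observations would give a mean ΔE00 comparable in kind to the values reported above.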
