What does this research mean for the field?

The Structural Similarity Index Measure (SSIM) exhibits systematic biases that create 'BlindSpots' where severe structural degradation in AI-generated imagery goes undetected. In SwinIR, 95.30% of Hidden Persistent failures occur in 'Silent' regions, where SSIM becomes mathematically unresponsive even though its scores remain high. Novelty: novel finding. Consensus alignment: challenges consensus.
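One well-known way a high SSIM score can coexist with destroyed structure follows directly from the standard SSIM formula: when local variance is small relative to the stabilizing constants, the contrast and structure terms are pulled toward 1 regardless of the actual correlation. The sketch below illustrates that mechanism only. It uses global block statistics and the conventional constants (K1 = 0.01, K2 = 0.03, C3 = C2/2); the function names and the checkerboard test blocks are hypothetical, and nothing here is taken from the Tier 3 framework's own definitions of Luminance Instability, Structural Masking, or "Silent" regions.

```python
import numpy as np

def ssim_components(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Luminance, contrast, and structure terms of SSIM for one block.

    Uses plain block statistics (no Gaussian window), which is a
    simplification chosen for this illustration.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    sd_x, sd_y = np.sqrt(var_x), np.sqrt(var_y)
    cov = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    struct = (cov + c3) / (sd_x * sd_y + c3)
    return {"luminance": lum, "contrast": con,
            "structure": struct, "ssim": lum * con * struct}

def checkerboard_pair(amplitude, size=8, mean=128.0):
    """Hypothetical test blocks: a checkerboard and its sign-inverted copy."""
    pattern = 2.0 * (np.indices((size, size)).sum(axis=0) % 2) - 1.0
    return mean + amplitude * pattern, mean - amplitude * pattern

# Both pairs are perfectly anti-correlated (Pearson correlation of -1).
low = ssim_components(*checkerboard_pair(amplitude=1.0))
high = ssim_components(*checkerboard_pair(amplitude=40.0))
print("low-contrast pair :", round(low["ssim"], 3))
print("high-contrast pair:", round(high["ssim"], 3))
```

With these inputs the low-contrast pair scores roughly 0.93 while the high-contrast pair scores roughly -0.96, even though the structure is inverted in both cases. Whether this corresponds to the regions the study calls "Silent" is an assumption, not something the source states.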

What question did this study set out to answer?

The aim is to identify limitations of SSIM in AI imagery evaluation through the Tier 3 diagnostic framework.

March 13, 2026 · Open Access

Tier3 v1.2: Structural Failure Theory of SSIM — Visible and Hidden Collapse across Four AI Architectures: A Large-Scale Diagnostic Analysis of 328 Million Block-Level Data Points

Key Points

The aim is to identify limitations of SSIM in AI imagery evaluation through the Tier 3 diagnostic framework.
Analyzed 328 million data points from NIH ChestX-ray8 across four AI architectures.
Utilized theory, statistics, and visualization to prove the existence of 'BlindSpots'.
Employed a real-time 3D visualization system to depict collapse structures in SSIM.
Conducted the analysis under eight configurations: four models, each run through two processing paths (Notta and TTA).
Identified two types of collapse: Visible Collapse in low-SSIM regions, and Hidden Collapse, which occurs despite high SSIM scores (SSIM ≥ p90).
In SwinIR, 95.30% of Hidden Persistent failures concentrated in 'Silent' regions where SSIM becomes mathematically unresponsive.
Established a new deterministic basis for AI safety evaluation beyond traditional metrics.
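The kind of tally behind a figure such as "95.30% of hidden failures fall in Silent regions" can be sketched as follows. This is only an illustration of the arithmetic: the function name, the `degraded` flag (some independent indicator of structural damage), and the `silent` flag are all hypothetical inputs, since the source does not give the Tier 3 criteria for Hidden, Persistent, or Silent blocks.

```python
import numpy as np

def hidden_collapse_share(ssim, degraded, silent, percentile=90.0):
    """Share of high-SSIM-but-degraded blocks that are also flagged silent.

    ssim     : per-block SSIM scores
    degraded : hypothetical boolean flag from an independent damage measure
    silent   : hypothetical boolean flag for an unresponsive-SSIM region
    """
    ssim = np.asarray(ssim, dtype=np.float64)
    degraded = np.asarray(degraded, dtype=bool)
    silent = np.asarray(silent, dtype=bool)
    threshold = np.percentile(ssim, percentile)   # the p90 cut-off
    hidden = (ssim >= threshold) & degraded       # high score, yet damaged
    n_hidden = int(hidden.sum())
    share = float(silent[hidden].mean()) if n_hidden else float("nan")
    return threshold, n_hidden, share

# Synthetic demonstration data, not values from the study.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=10_000)
damaged = rng.random(10_000) < 0.2
quiet = rng.random(10_000) < 0.5
p90, count, fraction = hidden_collapse_share(scores, damaged, quiet)
print(f"p90={p90:.3f} hidden={count} silent share={fraction:.2%}")
```

At the scale described in the study (hundreds of millions of blocks), the same computation would more plausibly run as a SQL aggregate inside a database engine than on in-memory arrays; that is a design observation, not a description of the author's pipeline.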

Abstract

This study formalizes the Tier 3 diagnostic framework to reveal the fundamental limitations of the Structural Similarity Index Measure (SSIM) in modern AI-generated imagery. While SSIM is a standard perceptual metric, its mathematical structure carries two systematic biases, Luminance Instability and Structural Masking, which create "BlindSpots" where severe structural degradation remains undetected. This work is the first to prove the existence of these BlindSpots through a tripartite approach: theory, statistics, and visualization.

Using 328 million blocks extracted from NIH ChestX-ray8 (about 82 million per model), we compared four architectures: Upscayl (High/Standard/Digital) and SwinIR. The analysis reveals a distinct two-layer collapse structure based on SSIM's internal response:
- Visible Collapse: observable degradation in low-SSIM regions.
- Hidden Collapse: severe internal degradation that occurs even when SSIM stays in the top 10% (SSIM ≥ p90).

Notably, in SwinIR, 95.30% of Hidden Persistent failures concentrate in "Silent" regions where SSIM becomes mathematically unresponsive, an architecture-specific failure mode that is invisible to conventional scalar metrics. These multidimensional states are also demonstrated with a real-time 3D visualization system (Shell–Membrane–Yolk model). The visualization reflects SSIM's internal components, the Tier 3 states, and the dynamics of Axes 1–7 in real time, and shows that Hidden Collapse converges into two specific attractors (None/Single). This work establishes a new deterministic foundation for AI safety evaluation, moving beyond traditional perceptual metrics toward a structural understanding of image collapse.

Raw Data Provenance and Reproducibility (v1.1 Update)

This project analyzes four AI models, each evaluated under two processing paths, Notta (Standard) and TTA (Test Time Augmentation), for eight configurations in total. Across these configurations, the system processes approximately 82 million blocks per path, yielding roughly 650 million data points overall.

To protect unpublished findings currently being prepared for an academic paper, the raw CSV data files remain unpublished at this stage. They will be released in this repository once the associated research paper is published. In this version (v1.1), a complete SHA256 hash list of all raw output files (rawcsv_sha256.zip) is provided to establish priority and to guarantee data integrity. These hashes uniquely identify the foundational AI output data prior to any statistical processing, so the raw data released in the future can be checked as exactly identical to the data used in this research, regardless of DuckDB versions, import settings, or other software configurations. The source code for the engine used to generate this dataset is already publicly available through the related Zenodo record (17677441) and on GitHub, which keeps the data generation process transparent and reproducible.

Note on Terminology

In this dataset and documentation, the term "Original" is used interchangeably with "Ground Truth (GT)", referring to the high-quality source images used as the baseline for analysis.

Dataset and Mandatory Citation:
- Source: NIH ChestX-ray8 (Hospital-scale chest x-ray database)
- Citation: Wang, X., et al. (2017). "ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thoracic diseases." Proceedings of the IEEE CVPR, 3462–3471.
- Download: https://nihcc.app.box.com/v/ChestXray-NIHCC

Software and Tools Used

The dataset was generated and processed using the following specialized tools and AI models:
- Source Data Generation Tool: custom application developed by the author for block-level diagnostic CSV generation. https://zenodo.org/records/17677441
- Upscayl: primary GUI/engine for image upscaling. License: GNU AGPLv3
- Real-ESRGAN / SwinIR: underlying AI architectures used for image restoration and enhancement. Licenses: Real-ESRGAN (Apache License 2.0) / SwinIR (Apache License 2.0)

Credits & Citations
- Data Generation Tool: developed by the author
- Upscayl: developed by Nayam Amarshe and TGS963
- SwinIR: Liang, J., et al. "SwinIR: Image Restoration Using Swin Transformer." arXiv:2108.10257 (2021)
- Real-ESRGAN: Wang, X., et al. "Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data." ICCV Workshops, 2021 (https://github.com/xinntao/Real-ESRGAN)

Technical Report v1.1: Statistical Infrastructure and Quantitative Analysis of SSIM BlindSpots in 82M Block Database
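Once the raw CSV files are released, a reader could check them against the published SHA256 list along the following lines. This is a minimal sketch using Python's standard `hashlib`. The manifest layout assumed here (one `<hex digest>  <file name>` pair per line, as produced by common `sha256sum` tools) and the function names are assumptions; the actual layout of the hash list inside the published archive is not described in the source.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def parse_manifest(text):
    """Parse '<hex digest>  <file name>' lines (assumed manifest layout)."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hexdigest, name = line.split(None, 1)
        # Some tools prefix the name with '*' to mark binary mode.
        expected[name.lstrip("*")] = hexdigest.lower()
    return expected

def verify(directory, manifest_text):
    """Return {file name: True/False} for every entry in the manifest."""
    results = {}
    for name, expected in parse_manifest(manifest_text).items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of_file(path) == expected
    return results
```

A call such as `verify("raw_csv_dir", manifest_text)` (with a hypothetical directory name) returns `False` for any file that is missing or whose bytes differ from the hashed original, which is the property the provenance note relies on.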

Authors

neco mohumohu
