This article examines a specific problem in AI evaluation: cases in which strong capability claims are drawn from results that do not, by themselves, justify those claims. Its central argument is that some benchmark results, behavioural outputs, and evaluative signals fail to support stronger conclusions not merely because the evidence is weak, but because the result channel does not preserve the distinctions those conclusions require. The article introduces a minimal criterion for identifying such cases, called same-channel non-repairability, and develops it through a case study of recent debates about reasoning-model evaluation, including The Illusion of Thinking, subsequent rebuttals, and item-level benchmark analysis. Its broader aim is to clarify when the problem in AI evaluation is not simply a lack of evidence but a limitation of the representational route itself. The article forms part of a wider research programme on representational limits, distinction preservation, and the conditions under which observable results can or cannot support stronger claims, and is intended as one applied contribution within that broader philosophical and formal framework.
M Evoluit (Sun,) studied this question.