What question did this study set out to answer?

How can a fault-tolerant computing architecture efficiently handle hardware errors during large language model inference?

April 10, 2026 · Open Access

OrCA: An Efficient Computing Architecture for Fault-Tolerant Large Language Model Inference via Outlier-Aware Recalculation

Key Points

Aimed to develop a fault-tolerant computing architecture that efficiently handles hardware errors in large language model inference.
Conducted fault injection experiments on multiple representative large language models.
Found that only 1%–2% of elements, identified as outliers, significantly affect the output when perturbed.
Proposed OrCA, a hierarchically redundant accelerator that selectively protects these critical elements and optimizes the dataflow accordingly.
OrCA outperforms conventional fault-tolerant accelerators at equal or lower area overhead.
Delivered better performance than conventional designs at fault rates up to 10 times higher.
Supports elastic protection against diverse hardware faults (e.g., transient and permanent), adapting to varying fault-tolerance requirements.

Abstract

Deep-learning accelerators such as TPUs and GPUs now run ever larger models. To address the impact of soft errors on computation, various specialized fault-tolerant DNN accelerators have been proposed, typically employing full-element protection. Yet emerging large language models (LLMs) exhibit exponentially higher computational demands than traditional CNN models, rendering conventional fault-tolerant accelerators both cost-prohibitive and ineffective in handling multi-point faults. To tackle these challenges, we conduct fault injection experiments on multiple representative LLMs, revealing the inherent parameter redundancy in Transformer-based models. Specifically, only 1%–2% of the elements significantly affect the output when perturbed; these critical elements are identified as outliers. Leveraging this insight, we propose OrCA, a hierarchically redundant fault-tolerant accelerator that introduces the principle of selective protection for critical elements and optimizes the dataflow accordingly. Through extensive fault injection experiments and hardware simulations, we demonstrate that OrCA outperforms conventional fault-tolerant accelerators, achieving superior protection at equal or lower area overhead. Notably, OrCA delivers better performance under fault rates up to 10× higher and supports elastic protection against diverse hardware faults (e.g., transient and permanent faults), adapting to varying fault-tolerance requirements. Furthermore, OrCA removes the limitation of traditional accelerators that require separate error detection and correction steps, enabling more efficient fault resilience.
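
The selective-protection idea in the abstract can be illustrated with a small software sketch. This is not the OrCA hardware design or the authors' algorithm; it is a minimal illustration under stated assumptions: faults are modeled as a large additive error (a crude stand-in for a high-order bit flip), outliers are taken to be the largest-magnitude fraction of outputs, and recomputation is assumed fault-free. All function names and parameters (`inject_faults`, `outlier_indices`, `selective_recalculate`, `fraction`, `magnitude`) are hypothetical.

```python
import random


def matvec(weights, x):
    """Reference (fault-free) matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def inject_faults(values, fault_rate, rng, magnitude=1e4):
    """Corrupt each element with probability `fault_rate`.

    Assumption: adding a large constant stands in for a high-order bit flip.
    """
    return [v + magnitude if rng.random() < fault_rate else v for v in values]


def outlier_indices(values, fraction):
    """Indices of the largest-magnitude `fraction` of elements (at least one)."""
    k = max(1, int(len(values) * fraction))
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return set(ranked[:k])


def selective_recalculate(weights, x, faulty_out, fraction=0.02):
    """Recompute only the outlier outputs; all other elements pass through.

    Assumption: the recomputation itself is fault-free.
    """
    repaired = list(faulty_out)
    for i in outlier_indices(faulty_out, fraction):
        repaired[i] = sum(w * v for w, v in zip(weights[i], x))
    return repaired


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(200)]
    x = [rng.uniform(-1, 1) for _ in range(16)]
    clean = matvec(weights, x)
    faulty = inject_faults(clean, fault_rate=0.01, rng=rng)
    repaired = selective_recalculate(weights, x, faulty, fraction=0.02)
    wrong_before = sum(a != b for a, b in zip(faulty, clean))
    wrong_after = sum(a != b for a, b in zip(repaired, clean))
    print(f"corrupted before: {wrong_before}, after: {wrong_after}")
```

In this sketch, recalculating the outliers serves as both detection and correction: a corrupted value that becomes large is recomputed without a separate checking pass. Faults that leave a value small are not repaired, and if more elements are corrupted than the `fraction` budget covers, some corrupted values remain.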

Authors

Yihao Shi

Sheng Ma

Tao Li

Journals

ACM Transactions on Design Automation of Electronic Systems

Institutions

National University of Defense Technology

National Defense University

Milli Savunma Üniversitesi
