
March 18, 2026

Open Access

Q-ROU: Quantization-Robust Orthogonal Unlearning via Active Retention for Post-Deployment Knowledge Removal in Large Language Models

Key Points

This research aims to remove targeted knowledge from deployed large language models while preserving nearby capabilities and generation quality and remaining stable under low-bit quantization.
Developed Quantization-Robust Orthogonal Unlearning (Q-ROU) to address three failure modes of standard gradient-based unlearning: neighbor collapse, a token-generation tradeoff, and partial forgetting regression under INT4.
Implemented Active Retention, a KL constraint that anchors neighbor outputs to a frozen reference model.
Combined a bounded KL-to-uniform forget objective with layer-localized updates (SLUG).
Used QuantNoise and structural regularizers to steer optimization toward quantization-stable solutions.
On the 28-probe 3B multi-entity stress test, Q-ROU scored 27/28 in FP16 and 28/28 in INT4.
The baselines without Active Retention (GA, GradDiff, and RepBend) collapsed neighbor retention (0/8).
Depth-probing and adversarial-extraction audits on 3B and 8B models provided strong evidence of representational-level suppression beyond keyword masking.
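The key points name the two loss terms but not their formulas. The following is a minimal sketch of one plausible reading, under stated assumptions. It takes the forget term to be KL(p_theta || U) = log V - H(p_theta), which is bounded above by log V. Gradient ascent on -log p(target), by contrast, can grow without limit. It takes Active Retention to be KL(p_ref || p_theta) on neighbor prompts, where p_ref comes from a frozen copy of the original model. The KL directions, the weight `lam`, and every function name here are hypothetical illustrations, not the paper's implementation. The sketch works on single next-token logit vectors in pure Python so it runs without a deep-learning framework.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def forget_loss(logits):
    """Assumed KL-to-uniform forget term: KL(p_theta || U) = log V - H(p_theta).

    Always in [0, log V]; it reaches 0 when the model's next-token
    distribution on a forget prompt is uniform.
    """
    p = softmax(logits)
    v = len(p)
    uniform = [1.0 / v] * v
    return kl(p, uniform)

def retention_loss(model_logits, ref_logits):
    """Assumed Active Retention term: KL(p_ref || p_theta) on a neighbor prompt.

    ref_logits are produced by a frozen reference model, so this term is 0
    exactly when the unlearned model matches the reference on that prompt.
    """
    return kl(softmax(ref_logits), softmax(model_logits))

def total_loss(forget_batch, neighbor_batch, lam=1.0):
    """Hypothetical combined objective: mean forget term + lam * mean retention term.

    forget_batch:   list of next-token logit vectors on forget prompts.
    neighbor_batch: list of (model_logits, ref_logits) pairs on neighbor prompts.
    lam is an illustrative weight, not a value reported in the paper.
    """
    lf = sum(forget_loss(l) for l in forget_batch) / len(forget_batch)
    lr = sum(retention_loss(m, r) for m, r in neighbor_batch) / len(neighbor_batch)
    return lf + lam * lr

if __name__ == "__main__":
    # A model that still recalls the target token sits near the upper bound log(4).
    print(forget_loss([8.0, 0.0, 0.0, 0.0]))  # about 1.377, just under log(4) ~ 1.386
    # An already-uniform distribution has nothing left to forget.
    print(forget_loss([0.0, 0.0, 0.0, 0.0]))  # 0.0
```

The bounded form matters for the token-generation tradeoff described in the abstract. Once the forget distribution is close to uniform, the forget term stops pushing, whereas an unbounded objective keeps driving probability mass away and can degrade generation.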

Abstract

Post-deployment unlearning in large language models should remove targeted knowledge while preserving nearby capabilities, maintaining generation quality, and remaining stable under low-bit post-training quantization. Standard gradient-based unlearning exhibits three failure modes: neighbor collapse, a token-generation tradeoff (in which forgetting is coupled to degenerate generation), and partial forgetting regression under INT4. We present Quantization-Robust Orthogonal Unlearning (Q-ROU), which addresses these failures jointly with (i) Active Retention (AR), a KL constraint anchoring neighbor outputs to a frozen reference; (ii) a bounded KL-to-uniform forget objective with layer-localized updates (SLUG); and (iii) QuantNoise and structural regularizers that steer optimization toward quantization-stable solutions. On the 28-probe 3B multi-entity stress test, the baselines without AR (gradient ascent (GA), gradient difference (GradDiff), and RepBend) collapse neighbor retention (0/8), whereas Q-ROU achieves 27/28 in FP16 and 28/28 in INT4. Depth-probing and adversarial-extraction audits, validated across 3B and 8B models on TOFU and on settings built from publicly available personal facts, provide strong evidence that knowledge removal is consistent with representational-level erasure rather than surface-level keyword suppression.
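The abstract reports that standard unlearning partially regresses under INT4 and that QuantNoise steers optimization toward quantization-stable solutions, but it does not specify the quantizer or the noise scheme. The sketch below assumes simple symmetric per-tensor round-to-nearest INT4 fake quantization. It also assumes a QuantNoise-style perturbation in the spirit of Fan et al. (2020), where a random fraction `p` of weights is replaced by its quantized value and the rest stay in full precision. Real INT4 deployments often use group-wise or calibration-based quantizers instead. The function names and `p` are hypothetical, and this is not the paper's regularizer.

```python
import random

def quantize_int4(weights):
    """Symmetric per-tensor INT4 round-to-nearest fake quantization.

    Maps each weight to a signed 4-bit level in -8..7 times a shared scale,
    then dequantizes back to float so it can be compared with the original.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 7.0  # the largest magnitude maps to level +/-7
    out = []
    for w in weights:
        # Python's round() uses round-half-to-even; clamp to the INT4 range.
        q = max(-8, min(7, round(w / scale)))
        out.append(q * scale)
    return out

def quant_noise(weights, p, rng):
    """QuantNoise-style perturbation (assumed form): each weight is replaced by
    its INT4-dequantized value with probability p, otherwise kept as-is.

    Training under this noise exposes the objective to quantization error, so
    solutions that only hold in full precision are penalized. In a real
    training loop, gradients would typically pass through the rounding step
    with a straight-through estimator.
    """
    deq = quantize_int4(weights)
    return [d if rng.random() < p else w for w, d in zip(weights, deq)]

def max_quant_error(weights):
    """Largest absolute change any single weight suffers under INT4 quantization."""
    deq = quantize_int4(weights)
    return max(abs(w - d) for w, d in zip(weights, deq))

if __name__ == "__main__":
    rng = random.Random(0)
    w = [0.12, -0.5, 0.33, 0.07, -0.21]
    print(quantize_int4(w))
    print(quant_noise(w, p=0.5, rng=rng))
    print(max_quant_error(w))  # bounded by half a quantization step, max|w| / 14
```

Because every weight moves by at most half a quantization step, an unlearned model whose forget behavior depends on finer-grained weight differences than that can drift back toward the original after INT4 conversion. That drift is the kind of regression the abstract describes.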
