What question did this study set out to answer?

To develop a framework that enhances the efficiency and robustness of model collaboration without sacrificing performance.

April 19, 2026 | Open Access

TRIDENT: Efficient Small-Large Model Collaboration via Heterogeneous Expert Decoupling

Key Points

Introduced the TRIDENT framework for heterogeneous collaborative inference
Leveraged MLP and KAN to utilize different reasoning styles
Implemented Orthogonal Feature Decoupling Distillation with specific loss functions
Developed a Dual-Threshold Arbiter to manage expert hallucinations
Significantly reduced the Invocation Rate of PLMs
Maintained high accuracy across multiple datasets
Achieved a distinct Pareto optimal balance in feature utilization
Validated effective collaboration among heterogeneous experts
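
The abstract names an orthogonality loss and a reconstruction loss for Orthogonal Feature Decoupling Distillation but this page does not give their formulas. The following is only a minimal sketch of one plausible reading, under stated assumptions: the orthogonality term is taken to be the mean squared cosine similarity between paired MLP and KAN features, and the reconstruction term is taken to be the mean squared error between the sum of the two expert features and the PLM feature. The function names and the weights lam_orth and lam_recon are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonality_loss(f_mlp: Sequence[Vector], f_kan: Sequence[Vector],
                       eps: float = 1e-12) -> float:
    """ASSUMED form: mean squared cosine similarity between paired features.

    The value is 0 when every MLP feature is orthogonal to its KAN partner.
    """
    total = 0.0
    for u, v in zip(f_mlp, f_kan):
        cos = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)) + eps)
        total += cos * cos
    return total / len(f_mlp)


def reconstruction_loss(f_mlp: Sequence[Vector], f_kan: Sequence[Vector],
                        f_plm: Sequence[Vector]) -> float:
    """ASSUMED form: MSE between (MLP feature + KAN feature) and the PLM feature."""
    total, count = 0.0, 0
    for u, v, t in zip(f_mlp, f_kan, f_plm):
        for a, b, c in zip(u, v, t):
            total += (a + b - c) ** 2
            count += 1
    return total / count


def decoupling_loss(f_mlp: Sequence[Vector], f_kan: Sequence[Vector],
                    f_plm: Sequence[Vector],
                    lam_orth: float = 1.0, lam_recon: float = 1.0) -> float:
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (lam_orth * orthogonality_loss(f_mlp, f_kan)
            + lam_recon * reconstruction_loss(f_mlp, f_kan, f_plm))
```

In this sketch, two experts that emit orthogonal features whose sum matches the PLM feature incur zero loss, while two experts that emit the same feature are penalized by the orthogonality term. That is the intuition behind pairing a decoupling penalty with a term that keeps the decoupled features tied to the PLM.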

Abstract

The growing scale of Pre-trained Large Models (PLMs) has intensified the demand for efficient inference that does not compromise performance. Existing large-model collaborative frameworks have shown promise, but they often suffer from functional redundancy among experts and limited robustness in complex cross-domain scenarios. In this paper, we propose Tri-gate Routing for Inference via Decoupled Efficient Network Technologies (TRIDENT), an efficient and robust heterogeneous collaborative inference framework. TRIDENT leverages the complementary inductive biases of a multilayer perceptron (MLP, for statistical patterns) and a Kolmogorov-Arnold Network (KAN, for symbolic logic) to maximize reasoning potential with minimal parametric overhead. To address feature homogenization in traditional distillation, we introduce Orthogonal Feature Decoupling Distillation, which uses an orthogonality loss L_orth for functional decoupling and a reconstruction loss L_recon to anchor the decoupled features to the PLM knowledge manifold. During inference, a Dual-Threshold Arbiter detects expert hallucinations by combining an individual-confidence threshold τ_con with a heterogeneous-consistency threshold τ_agree. Extensive experiments on CIFAR-100-LT, XNLI, and GSM8K demonstrate that TRIDENT significantly reduces the Invocation Rate (IR) of PLMs while maintaining high accuracy. Our findings reveal a distinct Pareto-optimal balance and validate the spontaneous division of labor between heterogeneous experts. By moving beyond the limitations of single-architecture systems, TRIDENT provides a robust and interpretable pathway for efficient collaborative intelligence.
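
The routing rule of the Dual-Threshold Arbiter is not spelled out on this page, so the following is a hedged sketch of how a confidence threshold and an agreement threshold could gate calls to the PLM. It assumes each expert outputs a probability distribution over labels, that confidence is the top probability, and that agreement is one minus the total variation distance between the two distributions. The function names and the default values for tau_con and tau_agree are hypothetical.

```python
from typing import Optional, Sequence


def agreement(p: Sequence[float], q: Sequence[float]) -> float:
    """ASSUMED consistency score: 1 - total variation distance (1.0 = identical)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def arbitrate(p_mlp: Sequence[float], p_kan: Sequence[float],
              tau_con: float = 0.7, tau_agree: float = 0.8) -> Optional[int]:
    """Return the experts' shared label, or None to signal 'invoke the PLM'."""
    label_mlp = max(range(len(p_mlp)), key=p_mlp.__getitem__)
    label_kan = max(range(len(p_kan)), key=p_kan.__getitem__)
    # Both experts must be individually confident ...
    confident = min(p_mlp[label_mlp], p_kan[label_kan]) >= tau_con
    # ... and must agree on the label and on the overall distribution.
    consistent = (label_mlp == label_kan
                  and agreement(p_mlp, p_kan) >= tau_agree)
    return label_mlp if (confident and consistent) else None


def invocation_rate(decisions: Sequence[Optional[int]]) -> float:
    """Fraction of inputs that fell through to the PLM (decision is None)."""
    return sum(d is None for d in decisions) / len(decisions)


decisions = [
    arbitrate([0.90, 0.05, 0.05], [0.85, 0.10, 0.05]),  # confident and agreeing
    arbitrate([0.90, 0.05, 0.05], [0.10, 0.85, 0.05]),  # confident but disagreeing
    arbitrate([0.50, 0.30, 0.20], [0.50, 0.30, 0.20]),  # agreeing but unsure
]
print(decisions, invocation_rate(decisions))
```

Under these assumptions only the first input is answered by the small experts. The second is escalated because the experts disagree, and the third because neither expert is confident enough, so two of the three inputs count toward the Invocation Rate.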

Authors

Guangyu Dai

Siliang Tang

Yueting Zhuang

Journals

Electronics

Institutions

Zhejiang University of Technology

Zhejiang University of Science and Technology
