High-resolution feature recovery is pivotal for accurate Human Pose Estimation (HPE), yet prevailing decoders often rely on local, fixed upsampling operators (e.g., deconvolution) that fail to capture the global structural context of human poses. This limitation inevitably leads to the loss of high-frequency details and the generation of artifacts. To bridge this gap, we present HyT-Pose, a novel hybrid architecture that recasts pose decoding as an iterative, structure-aware super-resolution task. Distinct from traditional reconstruction paradigms, HyT-Pose introduces a strictly synchronized "Enhance-Amplify-Refine" framework. Specifically, we propose a Learnable Linear Upsampling (LLU) mechanism that leverages global receptive fields to adaptively "infer" missing spatial details rather than merely interpolating them. This mechanism is synergized with Transformer-based global context enhancement and a Multi-scale Dynamic Refinement unit (FBlock) to progressively purify feature representations. Extensive experiments on COCO, MPII, and CrowdPose benchmarks demonstrate that HyT-Pose significantly outperforms state-of-the-art methods. Notably, it achieves 76.3 AP on COCO val, establishing a new paradigm for high-precision and efficient pose estimation.
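The abstract does not spell out how LLU is built, so the following is only a minimal NumPy sketch of one possible reading: upsampling as a pair of dense, trainable linear maps along the height and width axes, so that every output position can draw on every input position along that axis (a global receptive field) instead of a fixed local kernel. The class name `LearnableLinearUpsample`, the helper `linear_interp_matrix`, the separable design, and the interpolation-based initialisation are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def linear_interp_matrix(n_in: int, n_out: int) -> np.ndarray:
    """Return an (n_out, n_in) matrix that performs 1-D linear interpolation.

    Hypothetical helper: used here only as a sensible starting point
    for weights that would later be trained.
    """
    weights = np.zeros((n_out, n_in))
    # Location of each output sample on the input grid (endpoints aligned).
    pos = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = pos - lo
    rows = np.arange(n_out)
    weights[rows, lo] += 1.0 - frac
    weights[rows, hi] += frac
    return weights


class LearnableLinearUpsample:
    """Assumed LLU-style layer: two dense matrices, one per spatial axis.

    The matrices are dense, so training could move them away from plain
    interpolation and let each output row or column mix all input rows
    or columns.
    """

    def __init__(self, height: int, width: int, scale: int = 2):
        self.w_h = linear_interp_matrix(height, height * scale)
        self.w_w = linear_interp_matrix(width, width * scale)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x has shape (C, H, W); the result has shape (C, scale*H, scale*W).
        return np.einsum("ph,chw,qw->cpq", self.w_h, x, self.w_w)


# Usage: upsample a 2-channel 4x3 feature map by a factor of 2.
upsample = LearnableLinearUpsample(height=4, width=3)
features = np.ones((2, 4, 3))
upsampled = upsample(features)
```

At initialisation this layer behaves like ordinary linear interpolation; the difference from a fixed operator is only that `w_h` and `w_w` are full matrices that an optimiser would be free to update.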
Yunxiang Liu
Jiakai Pan
SHILAP Revista de lepidopterología
IEEE Access
Shanghai Institute of Technology
Liu et al. studied this question.
www.synapsesocial.com/papers/69a75cbec6e9836116a25df7 — DOI: https://doi.org/10.1109/access.2026.3658675