March 24, 2024 · Open Access

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition

Abstract

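The Vision Transformer (ViT) achieves high accuracy on high-resolution images, but it suffers from substantial spatial redundancy, which increases its computational and memory requirements. To address this, we propose the Localization and Focus Vision Transformer (LF-ViT), which reduces the computational burden without degrading performance. In the Localization phase, the model processes a reduced-resolution image. If that pass does not yield a confident prediction, the Neighborhood Global Class Attention (NGCA) mechanism is triggered to identify and highlight class-discriminative regions based on the initial results. In the subsequent Focus phase, the model uses the designated region from the original image to improve recognition. LF-ViT uses the same parameters in both phases, which allows seamless end-to-end optimization. Experiments show that LF-ViT reduces the FLOPs of DeiT-S by 63% while doubling throughput. The project code is available at https://github.com/edgeai1/LF-ViT.git.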
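
The two-phase inference flow described in the abstract can be sketched as follows. This is a minimal, framework-free illustration and not the authors' implementation: the confidence `threshold`, the `model` callable (assumed to return class probabilities plus a 2-D attention map), the `downsample` helper, and the sliding-window `best_window` helper, which stands in for NGCA, are all hypothetical names and assumptions introduced here.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]  # toy 2-D array: an image or an attention map
Model = Callable[[Grid], Tuple[List[float], Grid]]  # assumed interface


def downsample(image: Grid, factor: int) -> Grid:
    """Keep every `factor`-th pixel per axis (toy stand-in for resizing)."""
    return [row[::factor] for row in image[::factor]]


def best_window(attention: Grid, size: int) -> Tuple[int, int]:
    """Top-left corner of the size x size window with the largest attention sum.

    A plain sliding-window search, used here as a stand-in; it is NOT NGCA.
    """
    rows, cols = len(attention), len(attention[0])
    best, best_pos = float("-inf"), (0, 0)
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            total = sum(
                attention[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            )
            if total > best:
                best, best_pos = total, (top, left)
    return best_pos


def localize_then_focus(
    image: Grid,
    model: Model,
    threshold: float = 0.9,  # assumed confidence cutoff for early exit
    factor: int = 2,         # assumed downsampling factor
    window: int = 2,         # assumed region size, in low-resolution pixels
) -> Tuple[int, str]:
    """Return (predicted class index, name of the phase that produced it)."""
    # Localization: classify a low-resolution copy of the input.
    low_res = downsample(image, factor)
    probs, attention = model(low_res)
    if max(probs) >= threshold:
        return probs.index(max(probs)), "localization"  # confident: exit early

    # Focus: map the selected low-resolution window back to full-resolution
    # coordinates and classify that crop with the same model (shared weights).
    top, left = best_window(attention, window)
    t, l, s = top * factor, left * factor, window * factor
    region = [row[l:l + s] for row in image[t:t + s]]
    probs, _ = model(region)
    return probs.index(max(probs)), "focus"
```

Passing a single `model` to both phases mirrors the shared-parameter design the abstract describes. In this sketch the savings would come from inputs that exit after the low-resolution pass and from the Focus pass seeing only a crop rather than the full image.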

Authors

Youbing Hu

Yun Cheng

Anqi Lu

Institutions

Harbin Institute of Technology

Xidian University

Swiss Data Science Center
