June 1, 2023

MaPLe: Multi-modal Prompt Learning

Abstract

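Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the natural language processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs in order to fine-tune CLIP for downstream tasks. We note that using prompting to adapt representations in only a single branch of CLIP (language or vision) is sub-optimal, since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task. In this work, we propose Multi-modal Prompt Learning (MaPLe) for both the vision and language branches to improve alignment between the vision and language representations. Our design promotes strong coupling between the vision and language prompts to ensure mutual synergy, and discourages learning independent uni-modal solutions. Further, we learn separate prompts across different early stages to progressively model the stage-wise feature relationships, allowing rich context learning. We evaluate the effectiveness of our approach on three representative tasks: generalization to novel classes, to new target datasets, and to unseen domain shifts. Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes and 2.72% on the overall harmonic mean, averaged over 11 diverse image-recognition datasets. Our code and pre-trained models are available at https://github.com/muzairkhattak/multimodal-prompt-learning.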
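
The coupling between language and vision prompts described in the abstract can be illustrated with a minimal sketch. It assumes that, at each prompted stage, the vision prompts are not free parameters but are produced from the learnable language prompts by a linear projection, so the two branches cannot drift into independent uni-modal solutions. The abstract does not specify the coupling function, so this choice is an assumption; the class name `CoupledPrompts`, the helper functions, and all dimensions are hypothetical and chosen only for illustration. This is not the authors' implementation, which is available in the repository linked above.

```python
import random


def make_matrix(rows, cols, rng, scale=0.02):
    """Return a rows x cols matrix (list of lists) of small random values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def project(prompts, weight, bias):
    """Apply a linear map to every prompt token.

    prompts: n_tokens x d_text, weight: d_vision x d_text, bias: d_vision.
    Returns an n_tokens x d_vision matrix.
    """
    return [
        [sum(p * w for p, w in zip(token, row)) + b for row, b in zip(weight, bias)]
        for token in prompts
    ]


class CoupledPrompts:
    """Hypothetical container: language prompts plus per-stage coupling layers."""

    def __init__(self, depth, n_tokens, d_text, d_vision, seed=0):
        rng = random.Random(seed)
        # One set of learnable language prompt tokens per prompted stage.
        self.text_prompts = [make_matrix(n_tokens, d_text, rng) for _ in range(depth)]
        # One linear coupling layer per stage (assumed form of the coupling).
        self.weights = [make_matrix(d_vision, d_text, rng) for _ in range(depth)]
        self.biases = [[0.0] * d_vision for _ in range(depth)]

    def vision_prompts(self):
        """Derive the vision prompts for every stage from the language prompts."""
        return [
            project(p, w, b)
            for p, w, b in zip(self.text_prompts, self.weights, self.biases)
        ]


# Toy sizes for illustration only; real encoders use much larger widths.
prompts = CoupledPrompts(depth=3, n_tokens=2, d_text=8, d_vision=12)
vision = prompts.vision_prompts()
print(len(vision), len(vision[0]), len(vision[0][0]))  # 3 2 12
```

In this sketch, any update to a language prompt also changes the corresponding vision prompt, which is one simple way to tie the two branches together. Inserting the prompts into the transformer layers of each encoder is omitted here.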

Authors

Muhammad Uzair Khattak

Hanoona Rasheed

Muhammad Maaz

Institutions

Australian National University

Mohamed bin Zayed University of Artificial Intelligence

Cite this study

Khattak et al. studied this question.

www.synapsesocial.com/papers/69d7d5c111d83f35e5ae2e59 — DOI: https://doi.org/10.1109/cvpr52729.2023.01832

Also consider

Synapse has enriched 3 closely related papers on similar clinical questions. Consider them for comparative context:

UCF-101: A dataset of 101 human actions classes from videos in the wild · 2012 · 4,445 citations
Class-Agnostic Object Detection with Multi-modal Transformer · 2022 · 70 citations
Automated Flower Classification over a Large Number of Classes · 2008 · 3,178 citations
