June 24, 2014 · Open Access

Recurrent Models of Visual Attention

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
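To make the claim about input-size-independent computation concrete, the following is a minimal sketch, not the authors' implementation. It extracts fixed-size patches ("glimpses") at a sequence of locations and counts the pixels read. The names `extract_glimpse` and `pixels_processed`, the zero padding at the borders, the single-resolution crop, and the hard-coded location sequence (standing in for a learned policy) are all assumptions made for illustration.

```python
def extract_glimpse(image, center, size):
    """Return a size x size patch of `image` centered near `center`.

    `image` is a list of rows (lists of numbers); `center` is (row, col).
    Pixels that fall outside the image are zero-padded (an assumption).
    """
    height, width = len(image), len(image[0])
    top = center[0] - size // 2
    left = center[1] - size // 2
    patch = []
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0)
        patch.append(row)
    return patch


def pixels_processed(locations, size):
    """Pixels read over a glimpse sequence; no dependence on image size."""
    return len(locations) * size * size


# Hypothetical usage: a fixed location sequence stands in for a learned policy.
small = [[r * 10 + c for c in range(10)] for r in range(10)]
large = [[0] * 1000 for _ in range(1000)]
locations = [(2, 2), (5, 5), (0, 0)]
glimpses = [extract_glimpse(small, loc, 4) for loc in locations]
```

Whether the input is the 10 x 10 or the 1000 x 1000 image, three 4 x 4 glimpses read the same number of pixels. In the model described above, the locations would be chosen by a recurrent network trained with reinforcement learning, since the cropping step is not differentiable with respect to the location.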

Cite this study

Mnih et al. (2014) studied this question.

www.synapsesocial.com/papers/6a06ebed2edded7c7b841cbf — DOI: https://doi.org/10.48550/arxiv.1406.6247

Authors

Volodymyr Mnih

Nicolas Heess

Alex Graves

Institutions

Google (United States)

DeepMind (United Kingdom)
