Key points are not available for this paper at this time.
While great strides have been made in using deep learning algorithms to solve learning tasks, the problem of unsupervised learning - leveraging examples to learn about the structure of a domain - remains a unsolved challenge. Here, we explore prediction of future frames in a sequence as an unsupervised learning rule for learning about the of the visual world. We describe a predictive neural network ("PredNet") architecture that is inspired by the concept of "predictive coding" the neuroscience literature. These networks learn to predict future frames a video sequence, with each layer in the network making local predictions only forwarding deviations from those predictions to subsequent network. We show that these networks are able to robustly learn to predict the of synthetic (rendered) objects, and that in doing so, the networks internal representations that are useful for decoding latent object (e. g. pose) that support object recognition with fewer training. We also show that these networks can scale to complex natural image (car-mounted camera videos), capturing key aspects of both egocentric and the movement of objects in the visual scene, and the learned in this setting is useful for estimating the steering. Altogether, these results suggest that prediction represents a powerful for unsupervised learning, allowing for implicit learning of object scene structure.
Building similarity graph...
Analyzing shared references across papers
Loading...
William Lotter
Gabriel Kreiman
David Cox
Building similarity graph...
Analyzing shared references across papers
Loading...
Lotter et al. (Wed,) studied this question.
www.synapsesocial.com/papers/6a008026b124fe581985ecd0 — DOI: https://doi.org/10.48550/arxiv.1605.08104