What question did this study set out to answer?

The study aims to enhance the RMSProp algorithm using average gradients for training deep neural networks in complex loss landscapes.

April 10, 2026

Toward Enhancing RMSProp With Forward–Looking Gradient Updates for Complex Loss Landscapes

Key Points

Introduces an approximated integrated gradient, averaged over the range of the weight update.
Compares the resulting algorithm with standard RMSProp and Adam on complex models.
Evaluates performance on the MNIST, Fashion MNIST, and IMDb benchmarks.
Focuses on deep models with many nonlinear activations and no skip connections.
The new method requires significantly fewer iterations to reach a target training loss.
Performance improves over standard RMSProp and Adam on these complex models.
Computational and memory costs are moderately higher than for standard RMSProp.

Abstract

This letter introduces a novel algorithm for training deep neural networks with many nonlinear layers. Our method uses an approximated integrated gradient that is averaged over the range of the weight update to more accurately capture the loss change resulting from parameter updates. Unlike standard gradients, this average gradient improves learning efficiency in certain scenarios. We incorporate the approximated average gradient into RMSProp and compare the resulting algorithm to conventional RMSProp and Adam. We evaluate the approach on deep models lacking skip connections, such as those with many nonlinear activations and no residual structure, where traditional methods typically encounter difficulties. These models that focus on extracting high-order features create a loss landscape more akin to that of a biological brain. Our method requires significantly fewer iterations to reach a target training loss on MNIST, Fashion MNIST, and IMDb benchmarks for both convolutional and fully connected architectures across different initialization schemes. While our approach incurs moderately higher computational and memory costs compared to standard RMSProp, its performance on shallow models remains comparable. Nevertheless, our main contributions are (1) introducing the average gradient concept as an efficient alternative to computing high-order derivatives, (2) offering a novel factorization formula for approximating the average gradient, accompanied by a formal derivation, and (3) showing an example algorithm that leverages this formula to enhance the efficiency of RMSProp for some models, as validated by our evaluation.
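
To make the average-gradient idea concrete, the following is a minimal sketch in plain Python. It does not reproduce the factorization formula from the letter, which is not given on this page. Instead it assumes a brute-force stand-in: the gradient is sampled at a few points along the straight segment from w to w + delta and averaged (midpoint rule). By the fundamental theorem of calculus, the exact path-averaged gradient g_avg satisfies L(w + delta) - L(w) = g_avg · delta, whereas the gradient at w alone gives only a first-order estimate. The toy loss, the function names, the hyperparameters, and the two-pass "tentative step, then averaged step" scheme are all illustrative assumptions, not the authors' algorithm.

```python
import math


def loss(w):
    # Toy non-quadratic loss, chosen only for illustration.
    return sum(math.log(math.cosh(3.0 * x)) for x in w)


def grad(w):
    # Analytic gradient of the toy loss: d/dx log(cosh(3x)) = 3 tanh(3x).
    return [3.0 * math.tanh(3.0 * x) for x in w]


def average_gradient(grad_fn, w, delta, n_samples=8):
    # Midpoint-rule estimate of the gradient averaged over the segment
    # from w to w + delta (a numerical stand-in, not the paper's formula).
    acc = [0.0] * len(w)
    for k in range(n_samples):
        t = (k + 0.5) / n_samples
        point = [wi + t * di for wi, di in zip(w, delta)]
        acc = [a + gi for a, gi in zip(acc, grad_fn(point))]
    return [a / n_samples for a in acc]


def rmsprop_delta(g, v, lr=0.01, rho=0.9, eps=1e-8):
    # Standard RMSProp: running mean of squared gradients, then a scaled step.
    v_new = [rho * vi + (1.0 - rho) * gi * gi for vi, gi in zip(v, g)]
    delta = [-lr * gi / (math.sqrt(vi) + eps) for gi, vi in zip(g, v_new)]
    return delta, v_new


def lookahead_rmsprop_step(w, v, grad_fn, **kw):
    # Hypothetical two-pass variant: take a tentative RMSProp step from the
    # point gradient, average the gradient over that step, then redo the
    # RMSProp update with the averaged gradient.
    tentative, _ = rmsprop_delta(grad_fn(w), v, **kw)
    g_avg = average_gradient(grad_fn, w, tentative)
    delta, v_new = rmsprop_delta(g_avg, v, **kw)
    return [wi + di for wi, di in zip(w, delta)], v_new


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Compare how well each gradient predicts the loss change of a large step.
w0 = [1.0, -0.5]
step = [-0.8, 0.6]
actual = loss([wi + di for wi, di in zip(w0, step)]) - loss(w0)
point_pred = dot(grad(w0), step)
avg_pred = dot(average_gradient(grad, w0, step), step)
print("actual change:", actual)
print("point-gradient prediction:", point_pred)
print("average-gradient prediction:", avg_pred)

# One step of the hypothetical look-ahead variant.
w1, v1 = lookahead_rmsprop_step(w0, [0.0, 0.0], grad)
print("loss before:", loss(w0), "loss after:", loss(w1))
```

Sampling the gradient several times per step is far more expensive than a single backward pass; the abstract's point is that an approximation formula can capture the same information at moderate extra cost, so this sketch only illustrates why the averaged quantity is useful.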

Cite this study

Wolniak et al. studied this question.

www.synapsesocial.com/papers/69d893896c1944d70ce04798 — DOI: https://doi.org/10.1162/neco.a.1514

Authors

Rafał Wolniak

Bożena Kostek

Journals

Neural Computation

Institutions

Gdańsk University of Technology
