January 1, 1994

Fast Exact Multiplication by the Hessian

Abstract

Just storing the Hessian H (the matrix of second derivatives $\partial^2 E / \partial w_i \partial w_j$ of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator $R_v\{f(w)\} = (\partial/\partial r) f(w + rv)|_{r=0}$, note that $R_v\{\nabla_w\} = Hv$ and $R_v\{w\} = v$, and then apply $R_v\{\cdot\}$ to the equations used to compute $\nabla_w$. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to a one pass gradient calculation algorithm (backpropagation), a relaxation gradient calculation algorithm (recurrent backpropagation), and two stochastic gradient calculation algorithms (Boltzmann machines and weight perturbation). Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating any need to calculate the full Hessian.
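As a rough illustration of the procedure the abstract describes, the sketch below applies the operator to the forward and backward passes of a toy two-weight network with error E = 0.5 (w2 tanh(w1 x) - t)^2. The network, the function names `grad_and_hvp` and `finite_difference_hvp`, and the default values of `x` and `t` are assumptions made for this example and do not come from the paper. Each quantity prefixed with `R` is the directional derivative along v of the corresponding ordinary quantity, obtained by the product and chain rules.

```python
import math


def grad_and_hvp(w, v, x=0.5, t=1.0):
    """Return (gradient, H v) for E = 0.5 * (w2 * tanh(w1 * x) - t) ** 2.

    Toy model chosen for illustration; w and v are (w1, w2) and (v1, v2).
    """
    w1, w2 = w
    v1, v2 = v

    # Ordinary forward pass.
    a = w1 * x
    h = math.tanh(a)
    y = w2 * h

    # R{forward pass}: differentiate each line along v, using R{w} = v.
    Ra = v1 * x
    Rh = (1.0 - h * h) * Ra
    Ry = v2 * h + w2 * Rh

    # Ordinary backward pass (backpropagation).
    dy = y - t                      # dE/dy
    dh = dy * w2                    # dE/dh
    da = dh * (1.0 - h * h)         # dE/da
    grad = (da * x, dy * h)         # (dE/dw1, dE/dw2)

    # R{backward pass}: differentiate each backward line along v.
    Rdy = Ry                        # the target t does not depend on w
    Rdh = Rdy * w2 + dy * v2
    Rda = Rdh * (1.0 - h * h) + dh * (-2.0 * h * Rh)
    hv = (Rda * x, Rdy * h + dy * Rh)   # R{gradient} = H v

    return grad, hv


def finite_difference_hvp(w, v, eps=1e-5):
    """Central difference of the gradient along v, used only as a check."""
    w_plus = tuple(wi + eps * vi for wi, vi in zip(w, v))
    w_minus = tuple(wi - eps * vi for wi, vi in zip(w, v))
    g_plus, _ = grad_and_hvp(w_plus, v)
    g_minus, _ = grad_and_hvp(w_minus, v)
    return tuple((p - m) / (2.0 * eps) for p, m in zip(g_plus, g_minus))


if __name__ == "__main__":
    w = (0.3, -0.7)
    v = (1.0, 2.0)
    _, hv = grad_and_hvp(w, v)
    print("R-operator Hv:       ", hv)
    print("finite-difference Hv:", finite_difference_hvp(w, v))
```

The R-pass mirrors the gradient computation line by line, which is why its cost is comparable to one gradient evaluation. The finite-difference version is included only as a sanity check; it is approximate and depends on the choice of `eps`, whereas the R-pass result is exact up to floating-point rounding.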

Authors

Barak A. Pearlmutter

Journals

Neural Computation

Institutions

Princeton University

Siemens (United States)
