Abstract
We consider the problem of improving the accuracy, convergence, and conditioning of univariate nonlinear function approximations using (mainly) shallow neural networks (NNs) with a rectified linear unit (ReLU) activation function. The standard L₂-based approximation problem is ill-conditioned, and the behaviour of the optimisation algorithms used to train these networks degrades rapidly as the width of the network increases. In practice this can lead to significantly poorer approximations than the theoretical expressivity of the ReLU NN architecture would suggest. Univariate shallow ReLU NNs and traditional approximation methods, such as univariate Free Knot Splines (FKS), span the same function space and thus have the same theoretical expressivity. However, the FKS representation remains well-conditioned as the number of knots increases, and it can be highly accurate if the knots are correctly placed. We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for both an FKS and a ReLU NN. For the FKS we propose a novel two-level training procedure: first, solve the nonlinear problem of finding the optimal knot locations of the interpolating FKS using an equidistribution approach; then, solve the nearly linear, well-conditioned problem of finding the optimal weights and knots of the FKS. The training of the FKS gives insight into how a ReLU NN can be trained effectively to give an equally accurate approximation. To do this, we combine the training of the ReLU NN with an equidistribution-based loss to find the breakpoints of the ReLU functions, and then precondition the ReLU NN approximation (so that it takes an FKS form) to find the scalings of the ReLU functions. This procedure leads to a fast, well-conditioned, and reliable method of finding an accurate shallow ReLU NN approximation to a univariate target function.
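The statement that univariate shallow ReLU NNs and FKS span the same function space can be checked directly. The following is a minimal Python/NumPy sketch (the abstract itself gives no code). It assumes a network of the form y(x) = c + Σ a_i ReLU(w_i x + b_i) on [0, 1], and all function names are hypothetical. Each unit with w_i ≠ 0 has a breakpoint at k_i = -b_i/w_i, so interpolating the network at its breakpoints and at the endpoints reproduces it exactly. Conversely, any piecewise linear interpolant can be rewritten as a sum of ReLU units whose scalings are the jumps in slope.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def shallow_relu(x, w, b, a, c):
    """Evaluate y(x) = c + sum_i a_i * ReLU(w_i * x + b_i) at the points x."""
    x = np.asarray(x, dtype=float)
    return c + relu(np.outer(x, w) + b) @ a


def relu_to_fks(w, b, a, c, lo=0.0, hi=1.0):
    """Knots and nodal values of the FKS that equals the network on [lo, hi].

    A unit with w_i != 0 has its breakpoint at k_i = -b_i / w_i; units with
    w_i == 0 are constant on the whole line and add no breakpoint.
    """
    w = np.asarray(w, dtype=float)
    b = np.asarray(b, dtype=float)
    nz = w != 0
    k = -b[nz] / w[nz]
    k = k[(k > lo) & (k < hi)]  # only interior breakpoints matter on [lo, hi]
    knots = np.unique(np.concatenate(([lo, hi], k)))  # sorted, no duplicates
    return knots, shallow_relu(knots, w, b, a, c)


def fks_to_relu(knots, values):
    """ReLU parameters (w, b, a, c) reproducing the FKS on [k_0, k_N].

    y(x) = y_0 + s_0 ReLU(x - k_0) + sum_{j>=1} (s_j - s_{j-1}) ReLU(x - k_j),
    where s_j is the slope on [k_j, k_{j+1}].
    """
    slopes = np.diff(values) / np.diff(knots)
    a = np.concatenate(([slopes[0]], np.diff(slopes)))
    return np.ones_like(a), -np.asarray(knots[:-1], dtype=float), a, values[0]


rng = np.random.default_rng(0)
w, b, a = rng.normal(size=(3, 8))  # a random width-8 network (illustrative)
c = 0.3
knots, vals = relu_to_fks(w, b, a, c)
x = np.linspace(0.0, 1.0, 1001)
err_fks = np.max(np.abs(np.interp(x, knots, vals) - shallow_relu(x, w, b, a, c)))
w2, b2, a2, c2 = fks_to_relu(knots, vals)
err_back = np.max(np.abs(shallow_relu(x, w2, b2, a2, c2) - np.interp(x, knots, vals)))
print(f"FKS vs ReLU NN: {err_fks:.2e}; round trip: {err_back:.2e}")
```

Both reported errors should be at round-off level. The exact equivalence of the two representations is what lets the abstract move between them. Note that `fks_to_relu` always produces units with w_i = 1, so the round trip returns an equivalent network rather than the original parameters.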
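As an illustration of the first level of the FKS procedure, here is a minimal sketch of classical equidistribution knot placement for a known target f. It assumes the monitor M(x) = (ε + |f''(x)|)^(2/5). The exponent 2/5 comes from the standard asymptotic analysis of the L₂ error of piecewise linear interpolation. The regularisation ε = 1, the finite-difference approximation of f'', and the tanh test function are choices made here for illustration. This is a direct construction from f, not the equidistribution-based training loss described in the abstract, and the function names are hypothetical.

```python
import numpy as np


def equidistributed_knots(f, n_knots, lo=0.0, hi=1.0, n_fine=20001, eps=1.0):
    """Place n_knots knots on [lo, hi] so each interval carries equal monitor mass.

    Assumed monitor: M(x) = (eps + |f''(x)|)^(2/5); eps > 0 keeps M strictly
    positive, so its cumulative integral is strictly increasing and invertible.
    """
    x = np.linspace(lo, hi, n_fine)
    d2f = np.gradient(np.gradient(f(x), x), x)  # finite-difference f''
    m = (eps + np.abs(d2f)) ** 0.4
    # Cumulative trapezoidal integral of M, starting at 0.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(x))))
    # Invert the cumulative integral at equally spaced levels.
    levels = np.linspace(0.0, cum[-1], n_knots)
    return np.interp(levels, cum, x)


def interp_rms_error(f, knots, n_eval=20001):
    """Root-mean-square error of the piecewise linear interpolant of f at knots."""
    x = np.linspace(knots[0], knots[-1], n_eval)
    return np.sqrt(np.mean((np.interp(x, knots, f(knots)) - f(x)) ** 2))


def f(x):
    return np.tanh(50.0 * (x - 0.5))  # rapidly varying test function (illustrative)


uniform = np.linspace(0.0, 1.0, 21)
equi = equidistributed_knots(f, 21)
err_uniform = interp_rms_error(f, uniform)
err_equi = interp_rms_error(f, equi)
print(f"uniform knots: {err_uniform:.3e}, equidistributed knots: {err_equi:.3e}")
```

With the same number of knots, the equidistributed knots cluster around the steep transition at x = 0.5. The interpolation error is therefore smaller than with uniform knots, which is the kind of gain that correct knot placement provides.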
This method avoids spectral bias and is highly effective for a wide variety of functions. We test it on a series of regular, singular, and rapidly varying target functions and obtain good results, realising the expressivity of the shallow ReLU network in all cases. We conclude that, in the shallow case, gaining full expressivity for the ReLU NN requires both finding the optimal breakpoints (by equidistribution) and preconditioning the problem of finding the optimal coefficients. We then extend our results to more general activation functions and to deeper networks.
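One way to read the preconditioning step is as a linear change of basis from ReLU units to nodal hat functions. That reading is an assumption, and the paper's actual preconditioner may differ. The sketch below applies it with fixed, uniformly spaced knots. Both design matrices span the same piecewise linear space, but the ReLU columns are nearly collinear, so the least-squares problem in that basis is much worse conditioned. Gradient descent on a least-squares loss converges at a rate governed by cond(A)², which is one mechanism behind the degradation with width. Solving in the hat basis and mapping the nodal values back to ReLU scalings recovers the same approximation. The target sin(2πx) and all helper names are hypothetical.

```python
import numpy as np


def relu_design(x, knots):
    """Columns: a constant, then ReLU(x - k_j) for k_0, ..., k_{N-1}."""
    cols = [np.ones_like(x)] + [np.maximum(x - k, 0.0) for k in knots[:-1]]
    return np.column_stack(cols)


def hat_design(x, knots):
    """Columns: nodal hat (linear B-spline) functions, one per knot."""
    eye = np.eye(len(knots))
    return np.column_stack([np.interp(x, knots, eye[j]) for j in range(len(knots))])


def nodal_to_relu(knots, values):
    """Map nodal values of a piecewise linear function to ReLU-basis coefficients.

    Constant term: y_0; coefficient of ReLU(x - k_0): first slope s_0;
    coefficient of ReLU(x - k_j), j >= 1: slope jump s_j - s_{j-1}.
    """
    slopes = np.diff(values) / np.diff(knots)
    return np.concatenate(([values[0], slopes[0]], np.diff(slopes)))


knots = np.linspace(0.0, 1.0, 51)  # fixed (here uniform) knots
x = np.linspace(0.0, 1.0, 2001)  # least-squares sample points
y = np.sin(2.0 * np.pi * x)  # illustrative target

R = relu_design(x, knots)
H = hat_design(x, knots)
cond_relu = np.linalg.cond(R)
cond_hat = np.linalg.cond(H)

# Well-conditioned solve in the hat basis, then map back to ReLU scalings.
values, *_ = np.linalg.lstsq(H, y, rcond=None)
theta = nodal_to_relu(knots, values)
mismatch = np.max(np.abs(R @ theta - H @ values))
print(f"cond(ReLU) = {cond_relu:.3g}, cond(hat) = {cond_hat:.3g}, mismatch = {mismatch:.2e}")
```

The hat-basis condition number stays small and roughly independent of the number of knots. The ReLU-basis condition number grows as knots are added, while `mismatch` confirms that both coefficient vectors describe the same function.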
Simone Appella
S Arridge
Chris Budd
IMA Journal of Applied Mathematics
University College London
University of Bath
Appella et al. studied this question.
www.synapsesocial.com/papers/69eefd64fede9185760d41cd — DOI: https://doi.org/10.1093/imamat/hxag006