Steklov Activations presents a family of compact-support, piecewise-polynomial activation functions derived from Steklov kernels in approximation theory. Unlike standard smooth activations such as GELU or SiLU, whose gates only approach 0 and 1 asymptotically, Steklov activations have finite support in their gating function: outside a controllable transition region, a neuron is either exactly inactive (its output is zero) or fully linear (its output equals its input). This gives the family a property absent from common dense activations: a tunable mechanism for exact neuron inactivity. The paper shows that the family includes HardSwish exactly and can closely approximate GELU, and it introduces a scale parameter that controls the tradeoff between smoothness, selectivity, and sparsity. It evaluates these activations on image classification and language modeling, including GPT-2 and a small LLaMA-style decoder, and analyzes their behavior in terms of performance, inactivity patterns, pruning, and inference efficiency.
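To make the idea of a compact-support gate concrete, the sketch below shows a minimal NumPy illustration. It is not taken from the paper. It assumes the simplest possible gate, a linear ramp of width h obtained by averaging the Heaviside step over a window of width h (a first-order Steklov average), and the paper's actual definition, polynomial order, and parameterization may differ. The names ramp_gate, gated_activation, inactive_fraction, and the parameter h are hypothetical. With h = 6 this particular gate reproduces the standard HardSwish formula x * clip(x + 3, 0, 6) / 6.

```python
import numpy as np


def ramp_gate(x, h):
    """Hypothetical compact-support gate (an assumption, not the paper's definition).

    Equals 0 for x <= -h/2, 1 for x >= h/2, and is linear in between.
    It is the average of the Heaviside step over a window of width h.
    """
    return np.clip(x / h + 0.5, 0.0, 1.0)


def gated_activation(x, h=6.0):
    """Gated activation x * gate(x); h is an assumed scale parameter."""
    x = np.asarray(x, dtype=np.float64)
    return x * ramp_gate(x, h)


def hardswish(x):
    """Standard HardSwish: x * ReLU6(x + 3) / 6."""
    x = np.asarray(x, dtype=np.float64)
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def inactive_fraction(x, h):
    """Fraction of pre-activations in the exactly-zero region x <= -h/2."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.mean(x <= -h / 2))


if __name__ == "__main__":
    xs = np.linspace(-5.0, 5.0, 11)
    # With h = 6 the ramp-gated activation coincides with HardSwish.
    print(np.allclose(gated_activation(xs, h=6.0), hardswish(xs)))
    # A narrower transition region puts more inputs in the exactly-inactive zone.
    rng = np.random.default_rng(0)
    pre = rng.standard_normal(10_000)
    for h in (6.0, 2.0, 0.5):
        print(h, inactive_fraction(pre, h))
```

In this sketch, shrinking h narrows the transition region, so a larger share of pre-activations fall where the output is exactly zero. This is one way to picture the tradeoff between smoothness and sparsity that the abstract describes, under the stated assumptions.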
Aleksandr Masalskikh
Aleksandr Masalskikh (Thu,) studied this question.
www.synapsesocial.com/papers/69d894526c1944d70ce05484 — DOI: https://doi.org/10.5281/zenodo.19454642