What question did this study set out to answer?

This research aims to understand how convolutional neural networks handle information during learning in image classification tasks.

January 21, 2026 | Open Access

Uncovering Neural Learning Dynamics Through Latent Mutual Information

Key points

The study tracks mutual information (MI) between inputs, intermediate representations, and labels across VGG-16, ResNet-18, and ResNet-50.
It analyzes how label-relevant MI and input MI change from layer to layer.
Inference-time knockouts, shuffles, and perturbations test whether high-MI channels are functionally necessary.
It introduces a dependence-aware regularizer based on the Hilbert–Schmidt Independence Criterion (HSIC).
Label-relevant MI grows reliably with layer depth in all three architectures.
Input MI depends strongly on architecture and activation function, so compression is not a universal phenomenon.
Label information concentrates in a small subset of channels, and these high-MI channels are functionally necessary for accuracy.
The proposed regularizer yields small accuracy gains and consistently faster convergence during training.
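To make the idea of a "high-MI channel" concrete, here is a minimal sketch of how label-relevant MI could be estimated per channel. It assumes a simple plug-in (histogram) estimator on binned scalar activations; the paper's actual estimator is not described on this page and may well differ. All function names, the bin count, and the toy data are hypothetical.

```python
from collections import Counter
import math


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def bin_activations(values, num_bins=4):
    """Discretize scalar activations into equal-width bins (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def rank_channels_by_label_mi(channel_activations, labels, num_bins=4):
    """Return (channel_name, MI) pairs sorted from most to least label-informative."""
    scores = {
        name: mutual_information_bits(bin_activations(values, num_bins), labels)
        for name, values in channel_activations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: one scalar summary per example per channel (hypothetical values).
labels = [0, 0, 1, 1, 0, 1, 0, 1]
channels = {
    "ch_informative": [0.1, 0.2, 0.9, 0.8, 0.0, 1.0, 0.15, 0.95],
    "ch_noise": [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
}
ranking = rank_channels_by_label_mi(channels, labels)
```

Channels at the top of such a ranking would be the natural candidates for the knockout, shuffle, and perturbation tests described above. Plug-in estimates are biased when samples are few, so a real analysis would need a more careful estimator.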

Abstract

We study how convolutional neural networks reorganize information during learning in natural image classification tasks by tracking mutual information (MI) between inputs, intermediate representations, and labels. Across VGG-16, ResNet-18, and ResNet-50, we find that label-relevant MI grows reliably with depth while input MI depends strongly on architecture and activation, indicating that “compression” is not a universal phenomenon. Within convolutional layers, label information becomes increasingly concentrated in a small subset of channels; inference-time knockouts, shuffles, and perturbations confirm that these high-MI channels are functionally necessary for accuracy. This behavior suggests a view of representation learning driven by selective concentration and decorrelation rather than global information reduction. Finally, we show that a simple dependence-aware regularizer based on the Hilbert–Schmidt Independence Criterion can encourage these same patterns during training, yielding small accuracy gains and consistently faster convergence.
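For readers unfamiliar with the Hilbert–Schmidt Independence Criterion (HSIC), the sketch below computes the standard biased empirical estimate, tr(KHLH) / (n - 1)^2, where K and L are kernel Gram matrices and H is the centering matrix. The Gaussian kernel, the bandwidths, and all function names are assumptions made for illustration; this page does not state how the paper's regularizer combines HSIC terms or which variables it compares.

```python
import math


def gaussian_gram(points, sigma=1.0):
    """Gram matrix with K[i][j] = exp(-||p_i - p_j||^2 / (2 * sigma^2))."""
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
            for q in points
        ]
        for p in points
    ]


def double_center(gram):
    """Return H K H, where H = I - (1/n) * ones, using row, column, and grand means."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_biased(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC between paired samples xs and ys (lists of vectors)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least two samples")
    k_centered = double_center(gaussian_gram(xs, sigma_x))
    l_gram = gaussian_gram(ys, sigma_y)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy usage with hypothetical data: a dependent pair versus a constant signal.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys_dependent = [[2.0 * x[0]] for x in xs]
ys_constant = [[5.0] for _ in xs]
dependent_score = hsic_biased(xs, ys_dependent)
constant_score = hsic_biased(xs, ys_constant)  # zero up to rounding error
```

In a training setup, a weighted HSIC term of this kind would typically be added to the task loss and computed per mini-batch in a differentiable framework. The pure-Python version here only illustrates the quantity being measured.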

Authors

Arianna Issitt

Alex Merino

Lamine Deen

Journals

Entropy

Institutions

Florida Institute of Technology
