What question did this study set out to answer?

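The study asks whether a lightweight adaptive activation mechanism, which scales the active width of each SwiGLU feed-forward network with measured difficulty, can concentrate FFN capacity in lower-index neurons of a compact ternary language model.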

April 23, 2026 · Open Access

Cognitive-Effort-Gated FFN Concentration in a Compact Ternary Language Model

Key points

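Introduced cognitive-effort-gated FFN concentration, a mechanism driven by per-token cross-entropy during supervised training or by output entropy in target-free settings.
Used a scalar active fraction, normalized by an initial entropy reference, to apply an ordinal prefix mask over the intermediate dimension of each SwiGLU feed-forward network.
Studied the implementation in a compact 44.8M-parameter ternary transformer with BitNet-style ternary linear layers, grouped-query attention, and SwiGLU FFNs, including heatmaps of FFN gate-projection row norms.
Reported task learning and active-width telemetry from archived repeated-token identity runs.
Designed the mask to keep lower-index FFN neurons and mask the tail, coupling measured difficulty to active model capacity without sparse experts, token-level routing, or a learned scheduler.
Found stronger lower-index row norms in the gate-projection heatmaps of archived checkpoints, motivating a broader ablation study with controlled prefix-versus-tail metrics, fixed-width baselines, multiple seeds, and matched-compute comparisons.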
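To make the ordinal prefix mask concrete, here is a minimal pure-Python sketch of one SwiGLU FFN whose intermediate dimension is cut to its first k neurons. It assumes a hypothetical rounding rule, k = ceil(f · d_ff) clamped to [1, d_ff] for an active fraction f; the helper names `prefix_width`, `matvec`, and `masked_swiglu_ffn` are illustrative, and the toy weights in {-1, 0, +1} only mimic ternary values without any BitNet-style quantization or scaling.

```python
import math


def silu(z: float) -> float:
    """SiLU (swish), z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def prefix_width(active_fraction: float, d_ff: int) -> int:
    """Number of leading FFN neurons to keep (hypothetical rounding rule).

    Assumes k = ceil(f * d_ff), clamped to [1, d_ff].
    """
    f = min(max(active_fraction, 0.0), 1.0)
    return max(1, min(d_ff, math.ceil(f * d_ff)))


def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def masked_swiglu_ffn(x, w_gate, w_up, w_down, active_fraction):
    """SwiGLU FFN restricted to an ordinal prefix of its intermediate dimension.

    w_gate, w_up: d_ff rows of length d_model.
    w_down: d_model rows of length d_ff.
    Neurons 0..k-1 are kept; neurons k..d_ff-1 (the tail) are masked out.
    Returns the FFN output and the active width k.
    """
    k = prefix_width(active_fraction, len(w_gate))
    # Only the first k gate/up rows are evaluated, so tail neurons cost nothing.
    gate = matvec(w_gate[:k], x)
    up = matvec(w_up[:k], x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    # The down projection reads only its first k columns.
    out = matvec([row[:k] for row in w_down], hidden)
    return out, k


if __name__ == "__main__":
    # Toy weights with values in {-1, 0, +1}; d_model = 2, d_ff = 4.
    x = [1.0, -0.5]
    w_gate = [[1, 0], [0, 1], [-1, 1], [1, -1]]
    w_up = [[1, 1], [0, -1], [1, 0], [-1, 0]]
    w_down = [[1, 0, -1, 1], [0, 1, 1, -1]]
    for f in (1.0, 0.5, 0.1):
        y, k = masked_swiglu_ffn(x, w_gate, w_up, w_down, f)
        print(f"active fraction {f}: k = {k}, output = {y}")
```

Because the mask is a prefix, evaluating only the first k rows of the gate and up projections gives the same result as computing the full hidden vector and zeroing its tail; the slicing form simply skips the work those tail neurons would otherwise do.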

Abstract

This preprint introduces cognitive-effort-gated FFN concentration, a lightweight adaptive activation mechanism for compact ternary language models. A scalar active fraction is derived from per-token cross-entropy during supervised training, or from output entropy in target-free settings; it is normalized by an initial entropy reference and used to apply an ordinal prefix mask over the intermediate dimension of each SwiGLU feed-forward network (FFN). The mechanism preserves lower-index FFN neurons and masks the tail, coupling measured difficulty to active model capacity without sparse experts, token-level routing, or a learned scheduler. The implementation is studied in a compact 44.8M-parameter ternary transformer that uses BitNet-style ternary linear semantics, grouped-query attention, and SwiGLU FFNs. Archived repeated-token identity runs show task learning and record active-width telemetry. The included FFN gate-projection heatmaps show stronger lower-index row norms in archived checkpoints, which motivates a broader ablation study. The work is presented as a technical preprint: the current evidence establishes the mechanism and a qualitative concentration pattern, while controlled prefix-versus-tail metrics, fixed-width baselines, multiple seeds, and matched-compute comparisons remain part of the validation plan.
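The difficulty-to-width mapping described above can be sketched as follows. The abstract does not give the exact formula, so this assumes a hypothetical linear rule: the mean per-token cross-entropy (or the mean output entropy when no targets are available) is divided by a reference entropy and clamped to [min_fraction, 1]. The default reference log V, the entropy of a uniform distribution over a vocabulary of size V, is only a stand-in for the preprint's initial entropy reference, and `min_fraction` and the function names are invented for illustration.

```python
import math


def token_cross_entropy(logits, target):
    """Per-token cross-entropy, -log softmax(logits)[target], via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def output_entropy(logits):
    """Shannon entropy (nats) of softmax(logits); needs no target token."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)


def active_fraction(batch_logits, targets=None, h_ref=None, min_fraction=0.25):
    """Map measured difficulty to a scalar active fraction in [min_fraction, 1].

    Hypothetical rule: mean per-token cross-entropy (or mean output entropy
    when targets is None), divided by the reference entropy h_ref, clamped.
    h_ref defaults to log(V), the entropy of a uniform distribution over a
    vocabulary of size V, standing in for the initial entropy reference.
    min_fraction is an illustrative floor, not a value from the preprint.
    """
    if h_ref is None:
        h_ref = math.log(len(batch_logits[0]))
    if targets is not None:
        signals = [token_cross_entropy(row, t) for row, t in zip(batch_logits, targets)]
    else:
        signals = [output_entropy(row) for row in batch_logits]
    effort = (sum(signals) / len(signals)) / h_ref
    return min(1.0, max(min_fraction, effort))


if __name__ == "__main__":
    logits = [[10.0, 0.0, 0.0, 0.0]]  # one token, vocabulary of 4
    print(active_fraction(logits, targets=[0]))  # confident and correct -> 0.25
    print(active_fraction(logits, targets=[1]))  # confident and wrong -> 1.0
    print(active_fraction([[0.0, 0.0, 0.0, 0.0]]))  # uniform output, about 1.0
```

Under this rule, higher measured difficulty yields a wider active prefix: a confidently correct token drives the fraction to the floor, while a confidently wrong one saturates it at 1. The sketch only computes the fraction; how the signal is timed relative to the masked forward pass is left open, since the abstract does not specify it.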
