October 9, 2025 · Open Access

Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs

Key Points

Performance gains in large language models decelerate as model size and training data grow, a phenomenon known as sub-scaling.
An empirical analysis of over 400 models shows that high data density yields diminishing returns because of redundant information.
The key factors contributing to sub-scaling are high data density and non-optimal allocation of training resources.
A proposed sub-optimal scaling law better predicts performance in sub-scaling regimes.

Abstract

Traditional scaling laws in natural language processing suggest that increasing model size and training data enhances performance. However, recent studies reveal deviations, particularly in large language models, where performance improvements decelerate, a phenomenon known as sub-scaling. This paper revisits these scaling laws by examining the impact of data quality and training strategies on model performance. Through extensive empirical analysis of over 400 models, we identify high data density and non-optimal resource allocation as key factors contributing to sub-scaling. High data density leads to diminishing returns because of redundant information, while optimal resource allocation is crucial for sustained performance improvements. We propose a sub-optimal scaling law that better predicts performance in sub-scaling regimes, highlighting the importance of data quality and diversity.

Authors

Zhengyu Chen

Siqi Wang

Teng Xiao
