What does this research mean for the field?

The authors propose a hybrid data generation workflow that combines structure-based and interaction-based sampling strategies as a way to produce the high-quality training data that reliable machine-learning potentials require in complex heterogeneous catalytic systems. Novelty: synthesis. Consensus alignment: neutral.

May 27, 2026 · Open Access

Smarter Data: Rethinking Data Generation for Machine Learning Potentials in Heterogeneous Catalysis

Abstract

Machine-learning potentials (MLPs) promise near-first-principles fidelity at scales relevant to heterogeneous catalysis, yet the key determinant of their reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture both the system-specific structural domains and the broader chemical variability arising from the complex, dynamic nature of heterogeneous catalytic systems. In this perspective, we argue that the key question is no longer simply how to generate more data, but how to generate the right data. We reinterpret the main strategies used to construct MLP training sets for heterogeneous catalysis through the lens of scope, relevance, and coverage. We then examine the challenges of data generation in catalytic systems and how they have shaped prevailing structure-based sampling practices, which encode system relevance and targeted coverage directly into the data set. We also discuss emerging interaction-based sampling strategies that aim to broaden local-interaction support beyond narrowly predefined systems. We conclude by consolidating a possible hybrid data generation workflow that combines the strengths of both approaches, thereby bringing MLPs closer to the simulation of complex heterogeneous catalytic systems.
