Summary
Data quality, feature engineering, and model generalization are key challenges in applying machine learning to high-performance materials design. Here, we report a framework addressing these challenges using thermoelectric materials as a case study. Data are collected and curated from the Starrydata2 database, followed by multi-step feature engineering, including construction, selection, and optimization, to obtain a physically meaningful subset. By benchmarking against mainstream regression models, with independent external dataset testing and model interpretability analysis, a modified tabular prior-data fitted network (TabPFN) model (model I) demonstrates superior accuracy and generalization in predicting the thermoelectric figure of merit (ZT). Model I is applied to halide double perovskites from the Materials Project database, identifying candidates including Rb2CuSbCl6, Cs2AgAuCl6, and Rb2CuBiCl6. First-principles calculations validate their thermoelectric properties, with n-type Cs2AgAuCl6 achieving ZTmax = 1.64 at 800 K. These results highlight the potential of a data-driven and computationally synergistic approach for discovering high-performance thermoelectric materials.
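For readers unfamiliar with the prediction target, the figure of merit follows the standard definition ZT = S^2 * sigma * T / kappa, where S is the Seebeck coefficient, sigma the electrical conductivity, kappa the total thermal conductivity, and T the absolute temperature. The sketch below only illustrates that textbook formula. The function name and the input values are hypothetical and are not taken from the paper or from Starrydata2.

```python
def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_mk, temperature_k):
    """Return the dimensionless figure of merit ZT = S^2 * sigma * T / kappa.

    All inputs are in SI units: S in V/K, sigma in S/m,
    kappa (total thermal conductivity) in W/(m K), T in K.
    """
    if kappa_w_per_mk <= 0 or temperature_k <= 0:
        raise ValueError("kappa and temperature must be positive")
    return seebeck_v_per_k ** 2 * sigma_s_per_m * temperature_k / kappa_w_per_mk


# Hypothetical transport values chosen for illustration only
# (not results reported in the paper).
zt = figure_of_merit(
    seebeck_v_per_k=200e-6,   # 200 microvolts per kelvin
    sigma_s_per_m=1.0e5,
    kappa_w_per_mk=1.5,
    temperature_k=800.0,
)
print(f"ZT = {zt:.2f}")
```

Because ZT rises with S and sigma but falls with kappa, a regression model that predicts ZT is implicitly learning the trade-off among these coupled transport properties.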
Yuqing Sun
Xiaorui Chen
Jianzhi Gao
Cell Reports Physical Science
Shaanxi Normal University
Xi’an University
Sun et al. studied this question.
www.synapsesocial.com/papers/69a7679ebadf0bb9e87e1a73 — DOI: https://doi.org/10.1016/j.xcrp.2025.103093