Biomass-derived sulfur-containing hard carbons are promising anode candidates for sodium-ion batteries, but cross-study optimization remains difficult because reported electrochemical performance reflects both synthesis history and incomplete or non-uniform structural characterization. Here, we assembled a focused literature-derived dataset of 101 records from 16 journal articles and compared the predictive value of three information sources: precursor descriptors, process variables, and measured structural descriptors. We further introduced domain-knowledge-guided precursor descriptors to encode interpretable aspects of precursor chemistry and architecture, including lignin-related richness, polysaccharide contribution, volatile tendency, precursor-component coupling, and post-treatment category. In controlled feature-set comparisons, the model combining precursor and process descriptors achieved an R² of 0.59, outperforming the conventional combination of process and structural descriptors (R² = 0.57) and remaining close to the full-information setting (R² ≈ 0.61). Model interpretation further showed that, when structural descriptors were removed, predictive reliance shifted toward precursor and process variables, indicating that accessible upstream descriptors retain a meaningful fraction of the formation-pathway information relevant to sodium storage. These results should be interpreted within this curated sulfur-containing literature space rather than as a universal predictor, but they demonstrate that domain-knowledge-guided precursor encoding can support low-characterization, screening-oriented prediction and experimental prioritization.
Yu et al. (Fri,) studied this question.
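The controlled feature-set comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the model or the validation scheme, so a small ridge regression and 5-fold cross-validation stand in for them; the column names (`lignin_richness`, `d002`, and so on) are hypothetical; and the 101 records are synthetic, pre-scaled values rather than the curated dataset. The scores it prints therefore do not reproduce the reported R² values. The one point it is meant to show is that every feature set is scored on the same folds, so differences in R² reflect the descriptors supplied and not the data split.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with an unpenalized intercept (stand-in model)."""
    Xb = [[1.0] + row for row in X]
    d = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(d)]
    return solve(A, b)

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def r2_score(y, yhat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def cv_r2(rows, y, cols, folds):
    """Pooled out-of-fold R^2 using only the named columns."""
    X = [[row[c] for c in cols] for row in rows]
    yhat = [0.0] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        w = fit_ridge([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test_idx, predict(w, [X[i] for i in test_idx])):
            yhat[i] = p
    return r2_score(y, yhat)

# Hypothetical descriptor groups; names are illustrative only.
PRECURSOR = ["lignin_richness", "volatile_tendency"]
PROCESS = ["carbonization_temp", "heating_rate"]
STRUCTURE = ["d002", "id_ig"]

# Synthetic, pre-scaled records (NOT the literature dataset).
rng = random.Random(0)
rows, y = [], []
for _ in range(101):
    lig, vol, temp, rate = (rng.random() for _ in range(4))
    # Structure is made partly a consequence of precursor and process.
    d002 = 0.5 * lig - 0.4 * temp + 0.1 * rng.gauss(0, 1)
    id_ig = 0.3 * vol + 0.3 * rate + 0.1 * rng.gauss(0, 1)
    rows.append({"lignin_richness": lig, "volatile_tendency": vol,
                 "carbonization_temp": temp, "heating_rate": rate,
                 "d002": d002, "id_ig": id_ig})
    y.append(lig - 0.8 * temp + 0.5 * d002 + 0.4 * id_ig + 0.2 * rng.gauss(0, 1))

# One fixed 5-fold split, shared by every feature set.
idx = list(range(len(y)))
rng.shuffle(idx)
folds = [idx[k::5] for k in range(5)]

FEATURE_SETS = {
    "precursor+process": PRECURSOR + PROCESS,
    "process+structure": PROCESS + STRUCTURE,
    "all": PRECURSOR + PROCESS + STRUCTURE,
}
scores = {name: cv_r2(rows, y, cols, folds) for name, cols in FEATURE_SETS.items()}
for name, score in scores.items():
    print(f"{name}: R2 = {score:.2f}")
```

With real literature data, records from the same article are likely correlated, so grouping folds by source article may be a more conservative choice than the random split used in this sketch.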