Purpose
This paper aims to address limitations in real estate valuation in Indonesia arising from regional heterogeneity, fragmented data sources and opaque automated models. Specifically, it seeks to develop a more accurate and transparent comparative market analysis framework by integrating multisource property data and explainable machine learning, thereby improving both valuation reliability and interpretability for heterogeneous market segments.

Design/methodology/approach
The study integrates property listings from multiple major Indonesian online platforms into a unified dataset with over 70 spatial, structural, facility and pricing attributes. After extensive preprocessing and normalization, properties are grouped into low, medium and high price classes. Multiple regression-based machine learning models are evaluated, with hyperparameter optimization applied to tree-based models. Explainable analysis is used to examine feature contributions across price classes.

Findings
Extreme Gradient Boosting demonstrates the strongest overall performance, achieving Mean Absolute Percentage Errors of 26.53% for medium-priced properties and 14.60% for high-priced properties. Explainable analysis indicates that spatial attributes consistently dominate price formation, while the influence of facility-related features varies by price segment, highlighting heterogeneous valuation drivers across the market.

Originality/value
This paper contributes a multisource, large-scale Indonesian real estate dataset and an explainable automated valuation framework that moves beyond single-platform, black-box approaches. The study provides both predictive accuracy and interpretable insights into price formation, offering value to researchers, practitioners and policymakers seeking transparent and data-driven comparative market analysis in emerging real estate markets.
Ganesen et al. (Wed,) studied this question.
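
The Findings report Mean Absolute Percentage Error (MAPE) separately for each price class. As a minimal sketch of how such a per-class figure could be computed, the following assumes that each property already carries a class label and a model prediction. The function names `mape` and `mape_by_class`, the toy prices and the class labels are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - predicted| / |actual| over all samples.
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mape_by_class(actual, predicted, classes):
    """Group samples by price class and compute MAPE within each group."""
    groups = defaultdict(lambda: ([], []))
    for a, p, c in zip(actual, predicted, classes):
        groups[c][0].append(a)
        groups[c][1].append(p)
    return {c: mape(a, p) for c, (a, p) in groups.items()}


# Hypothetical toy data: prices in arbitrary units, labels assigned by hand.
actual = [100.0, 200.0, 1000.0, 2000.0]
predicted = [110.0, 180.0, 800.0, 2200.0]
classes = ["medium", "medium", "high", "high"]

result = mape_by_class(actual, predicted, classes)
for price_class, error in result.items():
    print(f"{price_class}: MAPE = {error:.2f}%")
```

Because MAPE divides each error by the actual price, it is undefined for a zero price and weights errors on cheap properties more heavily than equal absolute errors on expensive ones, which is one reason per-class reporting can be more informative than a single pooled figure.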