What question did this study set out to answer?

The study asks how to build a transparent and accurate framework for real estate valuation in Indonesia that draws on diverse data sources.

April 10, 2026

Integrating multi-source data and explainable AI for housing market analysis in Indonesia

Key Points

Developed a unified dataset of listings from several major Indonesian online property platforms, covering more than 70 spatial, structural, facility, and pricing attributes.
Grouped properties into low, medium, and high price classes based on attributes.
Evaluated multiple regression-based machine learning models, applying hyperparameter optimization to the tree-based models.
Used explainable analysis to examine how feature contributions differ across price classes.
Extreme Gradient Boosting performed best overall, with a Mean Absolute Percentage Error of 26.53% for medium-priced properties and 14.60% for high-priced properties.
Spatial attributes were found to be the primary drivers of price formation.
The influence of facility-related features varied depending on the price segment.
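
The accuracy figures above are Mean Absolute Percentage Errors (MAPE) reported separately for each price class. MAPE averages |true price - predicted price| / |true price| and expresses it as a percentage. A MAPE of 26.53% therefore means predictions miss the listing price by about a quarter of that price on average. The sketch below shows how such a per-class figure can be computed once predictions exist. The function names `mape` and `mape_by_class` and the toy prices are hypothetical illustrations, not the study's code or data.

```python
import math
from collections import defaultdict


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent: (100 / n) * sum(|y - y_hat| / |y|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        # A zero true price makes the ratio undefined, so refuse instead of returning inf.
        raise ValueError("MAPE is undefined when a true value is zero")
    # math.fsum limits floating-point rounding error when summing many ratios.
    total = math.fsum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def mape_by_class(records):
    """Return {price_class: MAPE} from (price_class, true_price, predicted_price) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for price_class, true_price, predicted_price in records:
        grouped[price_class][0].append(true_price)
        grouped[price_class][1].append(predicted_price)
    return {c: mape(trues, preds) for c, (trues, preds) in grouped.items()}


# Hypothetical toy prices (for example, in millions of rupiah), not study data.
records = [
    ("medium", 1000, 800),
    ("medium", 500, 600),
    ("high", 4000, 3600),
    ("high", 5000, 5500),
]
print(mape_by_class(records))  # → {'medium': 20.0, 'high': 10.0}
```

Because MAPE is relative to each true price, it lets error levels be compared across segments whose absolute prices differ by orders of magnitude.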

Abstract

Purpose: This paper aims to address limitations in real estate valuation in Indonesia arising from regional heterogeneity, fragmented data sources and opaque automated models. Specifically, it seeks to develop a more accurate and transparent comparative market analysis framework by integrating multisource property data and explainable machine learning, thereby improving both valuation reliability and interpretability for heterogeneous market segments.

Design/methodology/approach: The study integrates property listings from multiple major Indonesian online platforms into a unified dataset with over 70 spatial, structural, facility and pricing attributes. After extensive preprocessing and normalization, properties are grouped into low, medium and high price classes. Multiple regression-based machine learning models are evaluated, with hyperparameter optimization applied to tree-based models. Explainable analysis is used to examine feature contributions across price classes.

Findings: Extreme Gradient Boosting demonstrates the strongest overall performance, achieving Mean Absolute Percentage Errors of 26.53% for medium-priced properties and 14.60% for high-priced properties. Explainable analysis indicates that spatial attributes consistently dominate price formation, while the influence of facility-related features varies by price segment, highlighting heterogeneous valuation drivers across the market.

Originality/value: This paper contributes a multisource, large-scale Indonesian real estate dataset and an explainable automated valuation framework that moves beyond single-platform, black-box approaches. The study provides both predictive accuracy and interpretable insights into price formation, offering value to researchers, practitioners and policymakers seeking transparent and data-driven comparative market analysis in emerging real estate markets.
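
The abstract states that properties are grouped into low, medium and high price classes but does not say where the boundaries lie. The sketch below assumes equal-frequency tertiles of listing price, cut at the 1/3 and 2/3 quantiles. It is one possible reading, not the study's procedure. The helper names `tertile_cutoffs`, `assign_price_class` and `label_price_classes` and the prices are hypothetical.

```python
import statistics


def tertile_cutoffs(prices):
    """Cut points at the 1/3 and 2/3 quantiles of the listing prices.

    Assumption: equal-frequency tertiles; the study's actual thresholds are not stated.
    """
    low_max, medium_max = statistics.quantiles(prices, n=3, method="inclusive")
    return low_max, medium_max


def assign_price_class(price, cutoffs):
    """Map one price to 'low', 'medium' or 'high' given (low_max, medium_max)."""
    low_max, medium_max = cutoffs
    if price <= low_max:
        return "low"
    if price <= medium_max:
        return "medium"
    return "high"


def label_price_classes(prices):
    """Label every listing, using cut points computed from the same set of prices."""
    cutoffs = tertile_cutoffs(prices)
    return [assign_price_class(p, cutoffs) for p in prices]


prices = [100, 200, 300, 400, 500, 600, 700, 800, 900]  # hypothetical toy prices
print(label_price_classes(prices))
# → ['low', 'low', 'low', 'medium', 'medium', 'medium', 'high', 'high', 'high']
```

Equal-frequency bins keep the three classes roughly the same size, which gives each per-class model a comparable amount of training data. Fixed price thresholds chosen from market knowledge would be an equally plausible alternative.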
