Geospatial Artificial Intelligence (GeoAI) enables the automated generation of built-environment map features, such as building footprints (outlines), on a global scale. However, integrating these AI-generated datasets into Volunteered Geographic Information (VGI) platforms such as OpenStreetMap (OSM) risks introducing “AI slop”, that is, geometrically inconsistent or unreliable data, into the online map. While the OSM “Code of Conduct for Automated Edits” provides a policy framework for data ingestion, it lacks a machine-enforceable mechanism for real-time quality gating. This paper proposes a GeoAI-Gatekeeper to perform this task: an automated process that applies empirical Acceptable Quality Thresholds (AQT) to address the GeoAI data governance problem. Because the Gatekeeper uses an intrinsic, no-reference evaluation of geometric fidelity, it can assess incoming AI-generated data streams in real time without requiring ground-truth benchmarks. It focuses exclusively on the geometric validation of building footprints, acknowledging that, for now, semantic enrichment such as tagging remains a human-centric task. The presented GeoAI-Gatekeeper is a working prototype, developed for a specific urban area, that systematically triages incoming AI-generated data into three tiers: Auto-Accept, Manual Review, and Reject. It provides a Web-GIS interface with Human-in-the-Loop (HITL) functionality to ensure that the OSM community remains the final arbiter of acceptable data quality. Testing the Gatekeeper in Dublin (Ireland) demonstrates that our solution can auto-ingest 93.6% of features with a 14-fold reduction in human review effort while still adhering to OSM’s cartographic integrity standards. By translating qualitative community guidelines into machine-enforceable thresholds, our approach introduces a viable methodology for next-generation hybrid VGI systems.
Importantly, it ensures that the transition towards automated data ingestion reinforces, rather than undermines, the reliability of global crowdsourced mapping datasets.
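The three-tier triage summarized above can be illustrated with a minimal sketch. It is not the Gatekeeper's implementation. The single no-reference metric (a corner-orthogonality score, chosen because building footprints are typically rectilinear), the function names, and the threshold values 0.9 and 0.6 are all hypothetical stand-ins for the paper's empirically derived AQT.

```python
import math
from enum import Enum


class Tier(Enum):
    AUTO_ACCEPT = "Auto-Accept"
    MANUAL_REVIEW = "Manual Review"
    REJECT = "Reject"


def corner_angles(ring):
    """Unsigned angle (degrees) between the two edges meeting at each vertex.

    `ring` is a list of (x, y) tuples; the first vertex is NOT repeated at the end.
    Both convex (90) and reflex (270) right-angle corners yield 90 here.
    """
    n = len(ring)
    if n < 3:
        raise ValueError("a polygon ring needs at least 3 vertices")
    angles = []
    for i in range(n):
        ax, ay = ring[i - 1]          # previous vertex (wraps around for i == 0)
        bx, by = ring[i]              # current vertex
        cx, cy = ring[(i + 1) % n]    # next vertex
        v1 = (ax - bx, ay - by)
        v2 = (cx - bx, cy - by)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            raise ValueError("duplicate consecutive vertices")
        cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles


def orthogonality_score(ring):
    """Hypothetical intrinsic quality score in [0, 1]; 1.0 means all corners are 90 or 180 degrees."""
    # Deviation of each corner from the nearest "building-like" angle (right angle or straight line).
    devs = [min(abs(a - 90.0), abs(a - 180.0)) for a in corner_angles(ring)]
    mean_dev = sum(devs) / len(devs)
    return max(0.0, 1.0 - mean_dev / 45.0)  # a 45-degree mean deviation or worse scores 0


def triage(ring, accept_at=0.9, review_at=0.6):
    """Assign a footprint to a tier; both thresholds are illustrative placeholders, not the paper's AQT."""
    score = orthogonality_score(ring)
    if score >= accept_at:
        return Tier.AUTO_ACCEPT, score
    if score >= review_at:
        return Tier.MANUAL_REVIEW, score
    return Tier.REJECT, score
```

For example, a clean 10 × 10 square scores 1.0 and is auto-accepted, a parallelogram with 85° and 95° corners falls into Manual Review, and a 60°/120° rhombus is rejected. A real gate would combine several geometric metrics and calibrate the thresholds against community-reviewed samples rather than fixing them by hand.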
Niroshan et al. studied this question.