Cloud data lakes require continuous optimization across multiple dimensions: physical design (partitioning, compression), query execution, and data quality assurance. This paper presents AIDALOS (AI-Driven Autonomous Data Lake Optimization System), a framework that integrates quality monitoring with physical optimization decisions. The system uses reinforcement learning to adapt monitoring intensity and to trigger physical design changes in response to detected anomalies, drift patterns, and workload shifts. Deep Q-networks learn when to repartition tables, ensemble models select compression codecs based on data characteristics and access patterns, and neural cost estimators improve query plan selection. Our evaluation across five machine learning pipelines shows that this integrated approach reduces storage costs by 47% and improves query performance by 62% compared to static configurations, while detecting quality issues with an F1-score of 89.9%. The key insight is that quality signals (drift detection, anomaly patterns, and workload changes) should directly inform physical optimization decisions rather than being treated as a separate concern.
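The idea of letting quality signals drive a learned repartitioning decision can be illustrated with a small sketch. It is not the AIDALOS implementation: tabular Q-learning stands in for the Deep Q-network, and the state encoding (a drift level and a workload-shift level, each discretized to 0..2), the two actions, the reward function `simulated_reward`, and the helpers `train` and `policy` are all hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical action set: leave the table layout alone or rewrite its partitions.
ACTIONS = ("keep", "repartition")


def simulated_reward(state, action):
    """Hypothetical reward: repartitioning only pays off under high drift/workload pressure."""
    drift, shift = state
    pressure = drift + shift  # ranges over 0..4 in this toy model
    if action == "repartition":
        # Rewrite cost outweighs the benefit when pressure is low.
        return 2.0 if pressure >= 2 else -1.0
    # A stale layout is penalised when pressure is high.
    return -1.5 if pressure >= 2 else 0.5


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning (a simplified stand-in for a DQN) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) -> estimated return
    state = (rng.randint(0, 2), rng.randint(0, 2))
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward = simulated_reward(state, action)
        # Toy assumption: drift and workload signals evolve randomly, independent of the action.
        next_state = (rng.randint(0, 2), rng.randint(0, 2))
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning temporal-difference update.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def policy(q, state):
    """Greedy action for a (drift, workload_shift) state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train()
    for drift in range(3):
        for shift in range(3):
            print((drift, shift), policy(q, (drift, shift)))
```

In this toy setting the learned policy keeps the layout while drift and workload pressure are low and switches to repartitioning once their combined level reaches the (hypothetical) threshold. A DQN replaces the lookup table with a neural network so that continuous, high-dimensional quality and workload signals can be used directly instead of a coarse discretization.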
Sowjanya Deva
Surya Narayana Reddy
www.synapsesocial.com/papers/69d895046c1944d70ce05f28 — DOI: https://doi.org/10.13016/m26ij6-3hyq