What question did this study set out to answer?

To create a predictive model that identifies high-level clouds using machine learning and meteorological data.

April 1, 2026
Open Access

Machine Learning-Based Prediction of High-Level Clouds: Integrating Meteorological Observations with Independent Lidar Validation

Key Points

Set out to create a predictive model that identifies high-level clouds (HLCs) from meteorological data using machine learning.
Developed a machine learning model that takes meteorological parameters as input and is trained against human-recorded cloud observations.
Conducted a statistical analysis comparing lidar and meteorological observations of high-level clouds.
Determined optimal thresholds of total cloud cover at which meteorological and lidar observations agree.
The models achieved ROC AUC values of 0.87–0.88 for the presence of HLCs and 0.77–0.78 for their absence.
XGBoost proved the most robust model, effectively integrating diverse data types for reliable predictions.
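The text does not say how the optimal cloud-cover threshold was chosen. One plausible reading, sketched here purely as an illustration, is a sweep over candidate thresholds: keep only cases whose total cloud cover is at or below the threshold (assuming the rationale is that lower cloud layers can hide HLCs from a surface observer) and measure how often the observer and the lidar agree. The `Record` schema, the simple agreement criterion, the `min_samples` guard and the tie-break toward larger samples are all hypothetical choices, not the study's method.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One paired observation (hypothetical schema, not the study's data format)."""
    total_cover: int  # total cloud cover reported by the observer (e.g. tenths or oktas)
    met_hlc: bool     # observer reported high-level clouds
    lidar_hlc: bool   # lidar detected high-level clouds


def agreement(records):
    """Fraction of records where observer and lidar agree on HLC presence."""
    if not records:
        return 0.0
    return sum(r.met_hlc == r.lidar_hlc for r in records) / len(records)


def best_cover_threshold(records, thresholds, min_samples=1):
    """Hypothetical sweep: return (threshold, agreement, n) that maximizes
    agreement on records with total_cover <= threshold.

    Thresholds leaving fewer than min_samples cases are skipped; ties in
    agreement are broken in favour of the larger sample.
    """
    best = None
    for t in thresholds:
        subset = [r for r in records if r.total_cover <= t]
        if len(subset) < min_samples:
            continue
        score = agreement(subset)
        if (best is None or score > best[1]
                or (score == best[1] and len(subset) > best[2])):
            best = (t, score, len(subset))
    return best


if __name__ == "__main__":
    # Toy, made-up records: agreement degrades as total cover increases.
    records = [
        Record(2, True, True), Record(3, False, False), Record(5, True, True),
        Record(8, False, True), Record(9, False, True), Record(10, False, True),
    ]
    print(best_cover_threshold(records, range(11)))                 # → (5, 1.0, 3)
    print(best_cover_threshold(records, range(11), min_samples=4))  # → (8, 0.75, 4)
```

Maximizing agreement alone tends to favour very small subsets, which is why the sketch exposes a minimum sample size; a real analysis might instead use a chance-corrected statistic such as Cohen's kappa.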

Abstract

This study develops a machine learning (ML)-based predictive model for identifying high-level clouds (HLCs). The model uses meteorological parameters as input features and is trained against human-recorded meteorological observations. A statistical analysis of the relationship between two independent methods of registering HLCs, lidar and meteorological observations, has been performed, and optimal thresholds of total cloud cover at which meteorological data are consistent with lidar data have been determined. The results demonstrate promising performance of ML models in identifying the links between weather conditions and the probability of HLC detection, as confirmed by ROC AUC (area under the receiver operating characteristic curve) values of 0.87–0.88 for the presence and 0.77–0.78 for the absence of clouds, together with balanced precision, recall, and F1 scores. The XGBoost (eXtreme Gradient Boosting) model proved the most robust, effectively integrating data of different types to produce reliable predictions under varied conditions.
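For readers less familiar with the metrics cited in the abstract, the following minimal Python sketch computes them from their standard definitions: ROC AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half), and precision, recall and F1 from true-positive, false-positive and false-negative counts. It is not the study's evaluation code; treating one class (e.g. "HLC present") as the positive label is an assumption for illustration, and in practice scikit-learn's `roc_auc_score` and `precision_recall_fscore_support` do the same job.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative),
    with ties counted as 1/2. O(P*N) pairwise version for small inputs."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, preds):
    """Precision, recall and F1 for the positive class of binary labels/predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    labels = [1, 1, 0, 0]          # toy data: 1 = positive class
    scores = [0.9, 0.4, 0.6, 0.1]  # toy model probabilities
    preds = [int(s >= 0.5) for s in scores]
    print(roc_auc(labels, scores))             # → 0.75
    print(precision_recall_f1(labels, preds))  # → (0.5, 0.5, 0.5)
```

In the toy example three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75, while thresholding at 0.5 yields one true positive, one false positive and one false negative.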
