What question did this study set out to answer?

This work aims to improve the efficiency of data selection in materials science using a combined PPO and GPR framework.

April 10, 2026 · Open Access

PPO-GPR: A Custom Proximal Policy Optimization Tool for Active Reinforcement Learning

Key Points

Integrated proximal policy optimization with Gaussian process regression.
Developed a custom Gymnasium environment for dynamic data selection.
Utilized an action masking mechanism to prevent redundancy in data acquisition.
Evaluated the framework's performance through metrics like R², MAE, and RMSE.
Achieved 77–86% data savings compared to full GCMC grids.
Queried only ∼14–23% of the candidate data pool.
PPO policy converged stably, focusing on regions with rapid selectivity changes.
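
As a rough illustration of the metrics and the savings figures listed above, the following sketch computes R², MAE, and RMSE from paired true and predicted values, and expresses data savings as the fraction of the candidate pool that was never queried. The function names and the pool size of 100 are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


def data_savings(n_queried, n_pool):
    """Fraction of the candidate pool that never had to be simulated."""
    return 1.0 - n_queried / n_pool


# Hypothetical pool of 100 candidates: querying 14 or 23 of them
# corresponds to the two ends of the reported savings range.
savings_high = data_savings(14, 100)
savings_low = data_savings(23, 100)
```

Under this definition, querying about 14–23% of the pool and saving 77–86% of the data are two views of the same quantity.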

Abstract

Efficient data selection is critical in domains where data acquisition is expensive and time-consuming, such as materials science. In this work, we introduce a novel active learning framework that integrates proximal policy optimization (PPO) with Gaussian process regression (GPR) to strategically select informative data points and thereby enhance predictive modeling. Leveraging the stability and sample efficiency that PPO achieves through its clipped surrogate objective, the framework guides data acquisition via a custom-designed Gymnasium environment tailored for GPR. In this environment, the PPO agent dynamically chooses data points based on their potential to improve the GPR's performance, as measured by the R² score, while an action masking mechanism prevents redundant selections. We apply the proposed methodology to predict the selectivity of methane (CH₄) over higher alkanes in metal–organic frameworks (MOFs), focusing on CuBTC and IRMOF-1. The framework is evaluated using both ternary and quaternary gas mixtures, with the performance of the GPR assessed through R², mean absolute error (MAE), and root mean squared error (RMSE). Across CuBTC and IRMOF-1 in ternary and quaternary hydrocarbon mixtures, PPO-guided acquisition achieves 77–86% data savings relative to full grand canonical Monte Carlo (GCMC) grids, typically querying only ∼14–23% of the candidate pool. The clipped-update PPO policy converges stably, concentrating its selections in the pressure–temperature–composition regions where selectivity changes most rapidly. This work shows the potential of combining advanced reinforcement learning techniques with regression models to accelerate materials discovery and optimize gas separation processes.
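
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `PoolSelectionEnv` and its helpers are hypothetical names, the class only mimics the Gymnasium `reset`/`step` return signatures rather than subclassing `gymnasium.Env`, a small NumPy RBF-kernel posterior mean stands in for the GPR, the reward is assumed to be the change in R² over the whole pool (which presumes all labels are available for benchmarking), and a PPO agent would replace the random masked choice used in the test loop.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_mean(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean of a zero-mean GP (assumed stand-in for the GPR)."""
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(k_train, y_train)
    return rbf_kernel(x_query, x_train) @ weights


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


class PoolSelectionEnv:
    """Hypothetical Gymnasium-style pool selection with an action mask."""

    def __init__(self, x_pool, y_pool, budget):
        self.x_pool, self.y_pool, self.budget = x_pool, y_pool, budget

    def action_mask(self):
        # True marks candidates that may still be queried.
        return ~self.selected

    def _score(self):
        pred = gp_mean(self.x_pool[self.selected],
                       self.y_pool[self.selected], self.x_pool)
        return r2(self.y_pool, pred)

    def reset(self, seed=None):
        rng = np.random.default_rng(seed)
        self.selected = np.zeros(len(self.x_pool), dtype=bool)
        self.selected[rng.integers(len(self.x_pool))] = True  # one seed point
        self.score = self._score()
        return self.selected.astype(np.float32), {"action_mask": self.action_mask()}

    def step(self, action):
        if self.selected[action]:
            raise ValueError("masked action: point already queried")
        self.selected[action] = True
        new_score = self._score()
        reward = new_score - self.score  # assumed reward: gain in R^2
        self.score = new_score
        terminated = bool(self.selected.sum() >= self.budget)
        info = {"action_mask": self.action_mask()}
        return self.selected.astype(np.float32), reward, terminated, False, info
```

The mask is returned in `info` so that a masking-aware policy can rule out already-queried points before sampling an action; raising on a masked action is one possible design choice, and penalizing it with a negative reward would be another.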

Cite this study

Osaro et al. studied this question.

www.synapsesocial.com/papers/69d8940c6c1944d70ce04f5b — DOI: https://doi.org/10.1021/acsengineeringau.5c00122

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

Intelligent screening of porous materials: A review of active-learning approaches in MOF research · 2025 · 2 citations
Machine learning of molecular properties: Locality and active learning · 2018 · 160 citations
Untitled · 2022 · 19,579 citations
Deep deterministic policy gradient algorithm: A systematic review · 2024 · 159 citations
Reinforcement Learning with Deep Deterministic Policy Gradient · 2021 · 75 citations

Authors

Etinosa Osaro

Yamil J. Colón

Journals

ACS Engineering Au

Institutions

University of Notre Dame
