What question did this study set out to answer?

The aim is to evaluate and compare the performance of different reinforcement learning algorithms in the context of stock market volatility.

April 16, 2026 · Open Access

Performance-Based Assessment: Reinforcement Learning Models and Stock Market Volatility

Key Points

The aim is to evaluate and compare the performance of different reinforcement learning algorithms in the context of stock market volatility.
Utilized stock price data for AAPL, MSFT, and SPY from January 2012 to June 2014.
Employed and trained Q-learning, Deep Q-Networks (DQN), and Proximal Policy Optimization (PPO) algorithms.
Evaluated performance using risk-adjusted metrics such as Sharpe and Sortino ratios, maximum drawdown, total return, and number of trades.
The Proximal Policy Optimization (PPO) agent consistently outperformed Q-learning and DQN in overall profitability.
PPO performed best across a majority of the evaluated metrics and stocks.
The narrow range of market structures and stocks studied may limit the generalizability of the results.

Abstract

This study presents a comparative evaluation, based on risk-adjusted metrics, of three reinforcement learning algorithms: Q-learning, Deep Q-Networks (DQN), and Proximal Policy Optimization (PPO) (Schulman et al.). Stock price data for AAPL, MSFT, and SPY, covering the period from January 1, 2012 to June 1, 2014, were used in the study. Each algorithm was trained and then assessed on several evaluation metrics: Sharpe and Sortino ratios, maximum drawdown, total return, and number of trades. The results indicate that the PPO agent outperformed both DQN and Q-learning in overall profitability, and did so consistently across a majority of the evaluated metrics and stocks. However, the study covers a narrow range of market structures and stocks, which may limit how conclusive the results are.
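The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions, which may differ from the ones used in the paper. The 252-day annualization factor, the zero risk-free rate, and all function names are assumptions made for illustration. Number of trades is left out because it depends on how a trade is defined in the study.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily data


def simple_returns(equity):
    """Period-over-period returns of an equity curve (list of portfolio values)."""
    return [later / earlier - 1.0 for earlier, later in zip(equity, equity[1:])]


def total_return(equity):
    """Fractional gain from the first to the last portfolio value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns, daily_risk_free=0.0):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - daily_risk_free for r in returns]
    spread = stdev(excess)
    if spread == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * mean(excess) / spread


def sortino_ratio(returns, daily_target=0.0):
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - daily_target for r in returns]
    downside = math.sqrt(mean([min(e, 0.0) ** 2 for e in excess]))
    if downside == 0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * mean(excess) / downside


# Hypothetical equity curve, not data from the study.
equity_curve = [100.0, 110.0, 99.0, 120.0]
daily = simple_returns(equity_curve)
print(total_return(equity_curve), max_drawdown(equity_curve))
print(sharpe_ratio(daily), sortino_ratio(daily))
```

Sortino differs from Sharpe only in its denominator, so a strategy whose volatility comes mostly from upside moves scores higher on Sortino than on Sharpe.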

Authors

Ayaan Prasad
