This paper investigates the application of model-free reinforcement learning algorithms to train an AI agent for the card game 'Getaway'. We focus on two temporal difference learning methods: Q-learning, which is off-policy, and SARSA, which is on-policy. We compare the policies derived from both algorithms and show that Q-learning is better suited to the strategic requirements of 'Getaway'. We also introduce a novel adaptation of Q-learning that seeds the initial Q-value of each state-action pair after a phase of extensive exploration. Our analysis details how this modification improves the model's performance and offers insight into its potential for broader application in complex game environments.
Saravanan Gowthaman studied this question.
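To make the distinction between the two temporal difference methods concrete, the following is a minimal sketch of the standard textbook update rules for Q-learning and SARSA, together with one possible way to seed a Q-table. It is not the paper's implementation: the function names, the dictionary-based Q-table, the state and action labels, the learning rate `alpha`, the discount factor `gamma`, and the `seeded_q_table` helper are all assumptions made for illustration, and the abstract does not specify how the seed values are actually computed.

```python
from collections import defaultdict


def seeded_q_table(seed_values, default=0.0):
    """Build a Q-table whose entries start from given seed values.

    `seed_values` maps (state, action) pairs to initial Q-values, for
    example estimates gathered during an earlier exploration phase.
    Hypothetical helper: unseen pairs fall back to `default`.
    """
    q = defaultdict(lambda: default)
    q.update(seed_values)
    return q


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best available next action."""
    # An empty `next_actions` marks a terminal state with value 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    # `next_action` is None when `next_state` is terminal.
    next_value = 0.0 if next_action is None else q[(next_state, next_action)]
    td_target = reward + gamma * next_value
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum Q-value over the next state's actions regardless of what the agent does next, while SARSA uses the Q-value of the action the current policy actually selects.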