March 3, 2026 · Open Access

Policy learning under constraint: Maximizing a primary outcome while controlling an adverse event

Key Points

PLUC balances maximizing the treatment effect on a primary outcome against controlling the probability of an adverse event.
Numerical experiments illustrate the method's performance across a range of simulated treatment scenarios.
PLUC alternates between minimizing a strongly convex Lagrangian criterion with the Frank-Wolfe algorithm and a targeting step that updates that criterion, yielding treatment rules that control the probability of an adverse event.
This approach highlights the importance of considering multiple outcomes in treatment recommendations, potentially advancing personalized medicine.
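
The Frank-Wolfe step mentioned above relies on a simple fact: minimizing a linear function over a convex hull only requires scanning the points that generate the hull. The Python sketch below shows that generic mechanism for functions represented by their values at a finite set of landmark points. The quadratic criterion, the random dictionary of candidate functions, and the names `frank_wolfe_hull` and `grad_fn` are hypothetical; this is not PLUC's Lagrangian criterion, and the targeting step is omitted.

```python
import numpy as np


def frank_wolfe_hull(dictionary, grad_fn, n_iter=200):
    """Minimize a smooth convex criterion over the convex hull of candidate functions.

    dictionary: (k, n) array; row j holds candidate function j evaluated at
        n landmark points (hypothetical representation).
    grad_fn: callable returning the criterion's gradient at the current
        evaluations f (shape (n,)).
    Returns the final evaluations f and the convex-combination weights.
    """
    k = dictionary.shape[0]
    weights = np.zeros(k)
    weights[0] = 1.0                 # start at a vertex of the hull
    f = dictionary[0].copy()
    for t in range(n_iter):
        g = grad_fn(f)
        # Linear minimization oracle: a linear function is minimized over a
        # convex hull at one of its generating points, so scan the rows.
        j = int(np.argmin(dictionary @ g))
        gamma = 2.0 / (t + 2.0)      # classical open-loop step size
        f = (1.0 - gamma) * f + gamma * dictionary[j]
        weights *= 1.0 - gamma       # shrink old weights, keep their sum at 1 - gamma
        weights[j] += gamma
    return f, weights


# Hypothetical example: 20 random candidate functions on 50 landmarks and a
# target lying inside their hull, so the minimal criterion value is 0.
rng = np.random.default_rng(0)
dictionary = rng.normal(size=(20, 50))
target = rng.dirichlet(np.ones(20)) @ dictionary


def grad(f):
    # Gradient of the strongly convex criterion 0.5 * mean((f - target) ** 2).
    return (f - target) / f.size


initial_loss = 0.5 * np.mean((dictionary[0] - target) ** 2)
f_hat, w_hat = frank_wolfe_hull(dictionary, grad, n_iter=500)
loss = 0.5 * np.mean((f_hat - target) ** 2)
print(f"criterion: {initial_loss:.4f} -> {loss:.4f}")
```

Each iterate remains an explicit convex combination of the candidates, so the learned function never leaves the hull, and with the step size 2/(t+2) the criterion gap shrinks at rate O(1/t) for smooth convex criteria.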

Abstract

A medical policy aims to support decision-making by mapping patient characteristics to individualized treatment recommendations. Standard approaches typically optimize a single outcome criterion. For example, recommending treatment according to the sign of the Conditional Average Treatment Effect (CATE) maximizes the policy "value" by exploiting treatment effect heterogeneity. This point of view reduces policy learning to the challenge of learning a reliable CATE estimator. However, in multi-outcome settings, such strategies ignore the risk of adverse events, despite their relevance. PLUC (Policy Learning Under Constraint) addresses this challenge by learning an estimator of the CATE that yields smoothed policies controlling the probability of an adverse event in observational settings. Inspired by insights from EP-learning, PLUC optimizes strongly convex Lagrangian criteria over a convex hull of functions. Its alternating procedure iteratively applies the Frank-Wolfe algorithm to minimize the current criterion, then performs a targeting step that updates the criterion so that its evaluations at previously visited landmarks become targeted estimators of the corresponding theoretical quantities. The R package PLUC-R provides a practical implementation. We illustrate PLUC's performance through a series of numerical experiments.
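
The contrast drawn in the abstract, between treating according to the sign of the CATE and a smoothed policy that keeps adverse events in check, can be illustrated with a toy Python sketch. It assumes both CATEs (on the primary outcome, `delta_y`, and on the adverse-event probability, `delta_z`) are known on a simulated sample, takes the constraint to be an average treatment-induced increase in adverse-event probability of at most `alpha`, uses logistic smoothing with temperature `beta`, and scans a grid for the multiplier `lambda` of the Lagrangian score `delta_y - lambda * delta_z`. All names, functional forms, and the grid search are hypothetical; PLUC instead learns these quantities from observational data with Frank-Wolfe and targeting steps, which the sketch does not reproduce.

```python
import numpy as np


def smoothed_policy(psi, beta=0.05):
    """Map a score psi(x) to a treatment probability in (0, 1).

    Logistic smoothing with temperature beta (an assumed choice), written with
    tanh to avoid overflow in exp for large |psi / beta|.
    """
    return 0.5 * (1.0 + np.tanh(psi / (2.0 * beta)))


def evaluate(pi, delta_y, delta_z):
    """Sample averages of pi(X) * CATE: (value gain, adverse-event excess)."""
    return float(np.mean(pi * delta_y)), float(np.mean(pi * delta_z))


def constrained_policy(delta_y, delta_z, alpha, beta=0.05, lambdas=None):
    """Return the smallest grid multiplier whose Lagrangian score
    delta_y - lambda * delta_z yields an adverse-event excess <= alpha.

    With delta_z >= 0 the excess decreases as lambda grows, so the first
    feasible grid point is the least restrictive one on the grid.
    """
    if lambdas is None:
        lambdas = np.linspace(0.0, 20.0, 2001)
    for lam in lambdas:
        pi = smoothed_policy(delta_y - lam * delta_z, beta)
        if evaluate(pi, delta_y, delta_z)[1] <= alpha:
            return float(lam), pi
    raise ValueError("no multiplier on the grid satisfies the constraint")


# Hypothetical simulated sample with known CATEs.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(5000, 2))
delta_y = x[:, 0]                 # CATE on the primary outcome
delta_z = 0.1 * (1.0 + x[:, 1])   # increase in adverse-event probability, in [0, 0.2]
alpha = 0.02                      # tolerated average increase

# Unconstrained rule: treat if and only if the primary-outcome CATE is positive.
pi_sign = (delta_y > 0).astype(float)
value_sign, excess_sign = evaluate(pi_sign, delta_y, delta_z)

lam, pi_c = constrained_policy(delta_y, delta_z, alpha)
value_c, excess_c = evaluate(pi_c, delta_y, delta_z)
print(f"sign rule:   value={value_sign:.3f}, excess={excess_sign:.3f}")
print(f"constrained: lambda={lam:.2f}, value={value_c:.3f}, excess={excess_c:.3f}")
```

The multiplier penalizes treating patients whose adverse-event risk rises most, so the constrained rule gives up some primary-outcome value in exchange for meeting the tolerance `alpha`; the sign rule corresponds to `lambda = 0` without smoothing.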
