We present a novel approach to learning from label proportions based on a two-stage strategy: (i) estimating the individual labels of the samples within each bag; (ii) training a classifier on the estimated labels in either a supervised or semi-supervised manner. We recast the first stage in the optimal transport framework and leverage particle flow techniques to estimate the labels of the samples in the bags. More precisely, we define a proxy empirical distribution with known labels and a learnable support structure. We adjust the support of this distribution by minimising its Wasserstein distance to the empirical distributions of the bags, which facilitates accurate label estimation for the samples. We provide a theoretical analysis of the particle flow update and outline the necessary conditions for our method to estimate the labels accurately. Our findings indicate that a critical factor influencing the learnability of this problem is the gap and diversity in label proportions between classes within the bags. We conduct numerical experiments on both tabular and image datasets, demonstrating the effectiveness of the proposed methodology.
Flamary et al. (Mon,) studied this question.
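To make the first stage concrete, the following is a minimal one-dimensional sketch of the idea, not the implementation studied in this work. It rests on several simplifying assumptions of ours: two classes, two bags with known proportions 0.2 and 0.8, a proxy distribution whose particles are parametrised as a shared class centre plus a fixed random offset, and plain gradient descent on the squared 2-Wasserstein distance. In one dimension, the optimal coupling between two uniform empirical distributions of equal size is the sorted matching, so no optimal transport solver is needed. All names (`make_bag`, `sorted_matching`, `centres`, `offsets`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bag(n, prop_pos):
    """Draw a 1-D bag with a fraction prop_pos of class-1 samples (toy data)."""
    n_pos = int(round(n * prop_pos))
    y = np.repeat([0, 1], [n - n_pos, n_pos])
    x = np.where(y == 1, 2.0, -2.0) + 0.5 * rng.standard_normal(n)
    perm = rng.permutation(n)
    return x[perm], y[perm]

def sorted_matching(particles, samples):
    """Optimal 1-D coupling: match[i] is the sample index assigned to particle i."""
    match = np.empty(len(particles), dtype=int)
    match[np.argsort(particles)] = np.argsort(samples)
    return match

n, props = 50, [0.2, 0.8]
bags = [make_bag(n, p) for p in props]
bag_x = [x for x, _ in bags]
bag_y = [y for _, y in bags]  # true labels, used for evaluation only

# Proxy distribution: labels are fixed by the known proportions,
# the support (shared class centres) is learned.
proxy_labels = [np.repeat([0, 1], [n - int(round(n * p)), int(round(n * p))])
                for p in props]
offsets = [0.5 * rng.standard_normal(n) for _ in props]
centres = np.zeros(2)

lr = 0.25  # assumed step size
for _ in range(100):
    grad = np.zeros(2)
    for x, lab, off in zip(bag_x, proxy_labels, offsets):
        particles = centres[lab] + off
        match = sorted_matching(particles, x)
        # Gradient of (1/n) * sum_i (particle_i - x_match(i))^2 w.r.t. each particle
        residual = 2.0 / n * (particles - x[match])
        # Accumulate per-class gradients with respect to the shared centres
        grad += np.bincount(lab, weights=residual, minlength=2)
    centres -= lr * grad

# Label estimation: each sample inherits the label of its matched particle.
correct = 0
for x, y, lab, off in zip(bag_x, bag_y, proxy_labels, offsets):
    match = sorted_matching(centres[lab] + off, x)
    est = np.empty(n, dtype=int)
    est[match] = lab
    correct += int((est == y).sum())
accuracy = correct / (n * len(props))
print(centres, accuracy)
```

In this toy setting, sharing the class centres across bags with different proportions is what resolves the labelling ambiguity: with a single bag, or with identical proportions in every bag, swapping the two centres could fit the data equally well, which is consistent with the role that the diversity of label proportions plays in the analysis.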