March 3, 2026

Particle Flow for Learning from Label Proportions

Key Points

The proposed method estimates individual labels by minimizing the Wasserstein distance between a proxy distribution with known labels and the empirical distributions of the bags.
Numerical experiments on tabular and image datasets demonstrate the effectiveness of the particle flow methodology.
The method follows a two-stage strategy: individual labels are estimated first, and a classifier is then trained on the estimated labels in a supervised or semi-supervised manner.
The theoretical analysis identifies the gap and diversity in label proportions between classes within the bags as a critical factor affecting the learnability of the classification problem.
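Why proportion diversity matters can be seen in a simplified linear illustration. This is not the paper's theoretical condition: assume, as a hypothetical simplification, that every bag is a mixture of class-conditional distributions shared across bags. The bag means then satisfy M = P μ, where P is the bags-by-classes matrix of label proportions and μ stacks the class means. Least squares can recover μ only if P has full column rank (diverse proportions), and a small gap between class proportions makes P ill-conditioned, so noise in the bag means is amplified. The function `recover_class_means` and the toy numbers are hypothetical.

```python
import numpy as np


def recover_class_means(bag_means, props):
    """Least-squares class means from bag means, assuming the linear model M = P @ mu.

    Hypothetical helper: identifiable only when the (B, K) proportion matrix P
    has full column rank, i.e. the bags' label proportions are diverse enough.
    """
    P = np.asarray(props, dtype=float)
    if np.linalg.matrix_rank(P) < P.shape[1]:
        raise ValueError("label proportions are not diverse enough: P is rank-deficient")
    mu, *_ = np.linalg.lstsq(P, np.asarray(bag_means, dtype=float), rcond=None)
    return mu


true_mu = np.array([[-3.0, 0.0], [3.0, 0.0]])             # hypothetical class means
diverse = np.array([[0.9, 0.1], [0.1, 0.9], [0.6, 0.4]])  # hypothetical proportions
print(recover_class_means(diverse @ true_mu, diverse))    # true_mu, up to rounding

# A smaller gap between class proportions gives a worse-conditioned P.
print(np.linalg.cond(np.array([[0.9, 0.1], [0.1, 0.9]])))      # about 1.25
print(np.linalg.cond(np.array([[0.55, 0.45], [0.45, 0.55]])))  # about 10

# Identical proportions in every bag: P has rank one, so the classes are not identifiable.
try:
    recover_class_means(np.zeros((2, 2)), [[0.5, 0.5], [0.5, 0.5]])
except ValueError as err:
    print(err)
```

In this linear model, identical proportions across bags leave the bag means with no information that separates the classes, which mirrors the role the key points assign to proportion diversity.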

Abstract

We present a novel approach for learning from label proportions, consisting of a two-stage strategy: (i) estimating individual labels within sample bags; (ii) training a classifier using the estimated labels in either a supervised or semi-supervised manner. We recast the first stage into the optimal transport paradigm and leverage particle flow techniques to estimate the labels of the samples in the bags. More precisely, we define a proxy empirical distribution with known labels and a learnable support structure. We adjust the support of this distribution by minimising its Wasserstein distance to the empirical distributions of the bags, which facilitates accurate label estimation for the samples. We provide a theoretical analysis of the particle flow update and outline the necessary conditions for our method to accurately estimate the labels. Our findings indicate that a critical factor influencing the learnability of this problem is the gap and diversity in label proportions between classes within the bags. We conduct numerical experiments on both tabular and image datasets, demonstrating the effectiveness of our proposed methodology.
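The first stage (label estimation by particle flow) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. Assumptions: a shared set of labelled particles (`n_per_class` per class) forms the proxy support; the proxy for bag b places total mass p_bk on the class-k particles; the Wasserstein distance is replaced by its entropic approximation solved with log-domain Sinkhorn (a swapped-in choice, with the cost rescaled so that `eps` is relative to the largest squared distance); each particle takes a preconditioned gradient step towards the transport-weighted average of the samples it is coupled to; and each sample receives the class that sends it the most transported mass. The names `particle_flow_llp` and `sinkhorn_plan` and all hyperparameters are hypothetical.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_plan(a, b, cost, eps=0.05, n_iter=200):
    """Entropic OT plan between masses a (rows) and b (columns), log-domain Sinkhorn.

    Assumption: the cost is rescaled to [0, 1], so eps is relative to its largest entry.
    """
    c = cost / max(cost.max(), 1e-12)
    log_a = np.log(np.maximum(a, 1e-300))
    log_b = np.log(np.maximum(b, 1e-300))
    f, g = np.zeros_like(log_a), np.zeros_like(log_b)
    for _ in range(n_iter):
        f = eps * (log_a - _logsumexp((g[None, :] - c) / eps, axis=1))  # match row masses
        g = eps * (log_b - _logsumexp((f[:, None] - c) / eps, axis=0))  # match column masses
    return np.exp((f[:, None] + g[None, :] - c) / eps)


def particle_flow_llp(bags, props, n_per_class=10, n_steps=50, step=0.5, eps=0.05, seed=0):
    """Hypothetical sketch: move shared labelled particles so each bag's proxy fits the bag.

    bags:  list of (n_b, d) arrays of unlabelled samples.
    props: (B, K) array; props[b, k] is the proportion of class k in bag b.
    """
    rng = np.random.default_rng(seed)
    props = np.asarray(props, dtype=float)
    n_classes = props.shape[1]
    pool = np.vstack(bags)
    # Proxy support: n_per_class particles per class, initialised at random samples.
    z = pool[rng.choice(len(pool), size=n_classes * n_per_class)]
    z_label = np.repeat(np.arange(n_classes), n_per_class)

    def plans():
        # One entropic transport plan per bag, from the bag's proxy to its samples.
        for x, p in zip(bags, props):
            a = p[z_label] / n_per_class          # class-k particles carry total mass p[k]
            b = np.full(len(x), 1.0 / len(x))     # uniform mass on the bag's samples
            cost = ((z[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
            yield x, sinkhorn_plan(a, b, cost, eps)

    for _ in range(n_steps):
        target = np.zeros_like(z)
        mass = np.zeros(len(z))
        for x, plan in plans():
            target += plan @ x
            mass += plan.sum(axis=1)
        # For a fixed plan, the gradient w.r.t. z_j is 2 * (mass_j * z_j - target_j);
        # dividing by 2 * mass_j turns the step into a move towards target_j / mass_j.
        z += step * (target / mass[:, None] - z)

    # Label estimation: each sample takes the class whose particles send it the most mass.
    labels = []
    for x, plan in plans():
        scores = np.zeros((n_classes, len(x)))
        np.add.at(scores, z_label, plan)
        labels.append(scores.argmax(axis=0))
    return z, z_label, labels


# Toy usage: two well-separated Gaussian classes, three bags with known proportions.
rng = np.random.default_rng(1)
centers = np.array([[-3.0, 0.0], [3.0, 0.0]])
props = np.array([[0.9, 0.1], [0.1, 0.9], [0.6, 0.4]])
bags, truth = [], []
for p in props:
    y = np.repeat([0, 1], np.round(40 * p).astype(int))
    bags.append(centers[y] + 0.3 * rng.standard_normal((len(y), 2)))
    truth.append(y)
z, z_label, labels = particle_flow_llp(bags, props)
acc = np.mean(np.concatenate(labels) == np.concatenate(truth))
print(f"estimated-label accuracy on the toy bags: {acc:.2f}")
```

Sharing the particles across bags is what lets proportion diversity act: a class that is heavy in one bag and light in another is pulled towards the region those bags disagree on. The estimated labels would then feed the second stage, training any supervised or semi-supervised classifier, which is not shown here.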
