What question did this study set out to answer?

This research aims to enhance practical machine learning applications by improving data privacy and model reliability.

April 17, 2026 · Open Access

Towards practical and trustworthy machine learning: Federated clustering and LLM reasoning

Key Points

Introduced the Federated Centroid Aggregation (FeCA) algorithm, a one-shot method for federated clustering.
Applied federated clustering to medical data for disease subtyping.
Conducted a survey on uncertainty estimation methods for LLMs.
Developed structured reasoning frameworks for LLMs in robotic grasping.
FeCA demonstrated robust performance across various federated settings with decentralized data.
Identified four distinct Parkinson’s disease subtypes from distributed patient records.
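Survey classified LLM uncertainty estimation methods into four categories (verbalizing, latent information, consistency-based, and semantic clustering) and evaluated how well each captures aleatoric and epistemic uncertainty.
Adding a structured reasoning phase during training enabled LLMs to generate more reliable grasp poses from ambiguous human instructions.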

Abstract

As machine learning is increasingly deployed in real-world applications, challenges such as data privacy and reliability of model predictions persist. We address these concerns through two key research directions: federated clustering for privacy-preserving learning and large language model (LLM) reasoning for trustworthy responses. Our work bridges the gap between recent machine learning advances and their reliable deployment in practical applications.

The first part focuses on federated clustering, where data is decentralized across multiple client devices under strict privacy constraints. Traditional clustering algorithms struggle in federated settings due to data heterogeneity and the nonconvex nature of clustering objectives. To address these challenges, we introduce Federated Centroid Aggregation (FeCA), a one-shot algorithm that leverages the structural properties of local solutions in k-means clustering. FeCA adaptively refines local solutions on each client and then aggregates them into a global solution. Our theoretical analysis and experiments, spanning synthetic datasets to real-world image datasets, demonstrate that our algorithm maintains robust performance in various federated settings. Furthermore, we extend this approach to representation learning with DeepFeCA, integrating clustering-based feature learning into our federated framework, enabling effective unsupervised learning from decentralized data.

Building on these advances, we apply federated clustering to medical data analysis, where patient records are distributed across healthcare institutions under strict privacy regulations. We propose a federated disease subtyping framework that enables collaborative multi-cohort analysis without sharing patient-level data. Applying this framework to clinical data from three AMP-PD cohorts, we identify four clinically interpretable Parkinson’s disease subtypes across distributed cohorts. The identified subtypes exhibit distinct profiles spanning cognitive function, motor severity, non-motor symptoms, and functional disability, validating that federated clustering can recover meaningful disease subtypes from decentralized data and supporting privacy-preserving multi-institutional disease subtype discovery.

The second part focuses on trustworthiness in LLMs, particularly when models generate responses that appear confident despite uncertainty or ambiguity. To tackle this challenge, we first conduct a survey of LLM uncertainty estimation methods, which quantify confidence in model outputs. Our survey classifies existing approaches into four categories: (1) verbalizing methods, which prompt LLMs to self-report confidence scores but often suffer from overconfidence; (2) latent information methods, which extract uncertainty from internal probability distributions, such as predicted token probabilities; (3) consistency-based methods, which assess uncertainty by measuring response stability under input perturbations like paraphrasing; and (4) semantic clustering methods, which evaluate uncertainty by grouping semantically similar responses. We evaluate these methods on multiple datasets, distinguishing their effectiveness in capturing aleatoric uncertainty (arising from ambiguous or incomplete information) and epistemic uncertainty (arising from the model’s knowledge limitations). These insights improve understanding of LLM uncertainty and guide more trustworthy applications in practice.

To further enhance LLM trustworthiness, we explore their reasoning capabilities in practical applications beyond language tasks, particularly in robotic grasping. We develop frameworks that incorporate a structured reasoning phase during training, enabling LLMs to generate more reliable grasp poses from ambiguous human instructions. These approaches demonstrate practical settings in which LLM reasoning can be effectively harnessed for real-world applications and can improve downstream reliability through structured reasoning processes.
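
To make the one-shot federated clustering setting in the abstract concrete, here is a minimal sketch of a generic baseline, not FeCA itself: each client runs k-means locally, uploads only its centroids and cluster sizes once, and the server clusters the pooled centroids. The adaptive refinement of local solutions that the abstract attributes to FeCA is not reproduced here. All function names (`weighted_kmeans`, `client_summary`, `server_aggregate`, `blob`), the farthest-point initialization, and the synthetic data are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Weighted Lloyd's algorithm. Returns (centroids, mass per centroid)."""
    rng = random.Random(seed)
    # Farthest-point initialization (an illustrative choice): one random
    # point, then repeatedly the point farthest from the chosen centroids.
    centroids = [points[rng.randrange(len(points))]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each centroid to the weighted mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(v / mass[j] for v in sums[j]) if mass[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return centroids, mass

def client_summary(data, k):
    """Runs on a client: local k-means; only centroids and counts leave the device."""
    return weighted_kmeans(data, [1.0] * len(data), k)

def server_aggregate(summaries, k):
    """Runs on the server: cluster the pooled local centroids, weighted by count."""
    pooled, weights = [], []
    for centroids, mass in summaries:
        for c, m in zip(centroids, mass):
            if m > 0:  # skip empty local clusters
                pooled.append(c)
                weights.append(m)
    return weighted_kmeans(pooled, weights, k)[0]

def blob(center, n, rng, sigma=0.5):
    """Synthetic Gaussian cluster around `center` (toy data, not from the study)."""
    return [tuple(rng.gauss(mu, sigma) for mu in center) for _ in range(n)]

rng = random.Random(42)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# Heterogeneous clients: each one only sees two of the three clusters.
clients = [
    blob(true_centers[0], 100, rng) + blob(true_centers[1], 100, rng),
    blob(true_centers[1], 100, rng) + blob(true_centers[2], 100, rng),
    blob(true_centers[0], 100, rng) + blob(true_centers[2], 100, rng),
]
summaries = [client_summary(data, k=3) for data in clients]  # one upload per client
global_centroids = server_aggregate(summaries, k=3)
for c in sorted(global_centroids):
    print(tuple(round(v, 2) for v in c))
```

In this toy setup no client holds all three clusters, yet the server recovers one centroid near each true center from a single round of communication. With less separated clusters or poorer local solutions this naive pooling can fail, which is the kind of difficulty the abstract says FeCA is designed to handle.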
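
The semantic clustering category from the survey can be illustrated with a small sketch: sample several responses to the same prompt, group the ones that mean the same thing, and treat the entropy of the group sizes as an uncertainty score. This is a simplified illustration, not the procedure evaluated in the study. In particular, `same_meaning` is a hypothetical stand-in that compares normalized strings, whereas practical methods typically rely on a learned model to judge equivalence, and the example responses are made up.

```python
import math

def normalize(text):
    """Crude normalization: lowercase, drop punctuation, split into words."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()

def same_meaning(a, b):
    """Hypothetical equivalence check; a real system would use a semantic model."""
    return normalize(a) == normalize(b)

def cluster_responses(responses, equivalent=same_meaning):
    """Greedy grouping: each response joins the first group whose representative matches."""
    clusters = []
    for r in responses:
        for group in clusters:
            if equivalent(r, group[0]):
                group.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def cluster_entropy(responses, equivalent=same_meaning):
    """Entropy (in nats) of the distribution of responses over meaning groups."""
    clusters = cluster_responses(responses, equivalent)
    probs = [len(g) / len(responses) for g in clusters]
    return sum(-p * math.log(p) for p in probs)

# Made-up samples standing in for repeated model outputs to one question.
consistent = ["Paris", "paris.", "Paris"]
mixed = ["Paris", "Lyon", "Marseille", "paris"]
print(cluster_entropy(consistent))
print(cluster_entropy(mixed))
```

A score of zero means every sampled response fell into one meaning group; larger values mean the samples disagree more. The same skeleton covers the consistency-based category if the responses come from paraphrased versions of the prompt rather than repeated sampling.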

Authors

Jinxuan Xu
