What question did this study set out to answer?

The research aims to enhance federated learning frameworks to better handle heterogeneous data and model capacities in edge environments.

March 13, 2026 | Open Access

MFEL-H2B: Multimodal Federated Edge Learning with heterogeneity-aware balancing across modalities and models

Key Points

Developed the MFEL framework for multimodal federated learning.
Introduced prototype networks for cross-client modality alignment.
Implemented rebalanced modality gradient modulation to address intra-client modality imbalance.
Utilized ensemble momentum-based knowledge distillation for efficient knowledge transfer among clients.
MFEL-H2B outperformed state-of-the-art methods in accuracy across various datasets.
Converged faster than existing federated learning frameworks.
Demonstrated improved training stability and generalization across diverse client architectures.
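
To make the ensemble momentum-based knowledge distillation point more concrete, here is a minimal sketch of one way such a teacher could be built. It is not the authors' implementation: the uniform averaging of client predictions, the exponential-moving-average form of the momentum step, the KL-divergence loss, and every function and parameter name (`ensemble_teacher`, `momentum_update`, `momentum`, `temperature`) are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_teacher(client_logits, temperature=1.0):
    """Average the clients' class probabilities (assumed: uniform client weights)."""
    probs = [softmax(logits, temperature) for logits in client_logits]
    n_clients = len(probs)
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n_clients for k in range(n_classes)]

def momentum_update(prev_teacher, new_teacher, momentum=0.9):
    """Blend the previous teacher with the new ensemble (assumed: EMA momentum)."""
    return [momentum * p + (1.0 - momentum) * q
            for p, q in zip(prev_teacher, new_teacher)]

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student), used here as the distillation loss (an assumption)."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher, student))

# Hypothetical logits from two clients over three classes, for two rounds.
round1 = ensemble_teacher([[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]])
round2 = ensemble_teacher([[2.2, 0.4, 0.1], [1.0, 1.4, 0.3]])
teacher = momentum_update(round1, round2, momentum=0.9)

# A small-capacity student distills from the smoothed teacher.
student = softmax([0.3, 0.2, 0.1])
loss = kl_divergence(teacher, student)
```

Because the teacher is a convex combination of probability vectors, it remains a valid distribution after each momentum step, which is one reason an averaging scheme like this is convenient when clients have different model capacities but share a label space.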

Abstract

The rapid growth of Edge Intelligence (EI) and heterogeneous user demands has led to the widespread generation of multimodal data at the network edge. Multimodal Federated Learning (MFL) provides a promising solution for collaborative, privacy-preserving model training across distributed clients. However, existing MFL frameworks often assume homogeneous environments and fail to account for disparities in client data distributions, modality characteristics, and computational resources, limiting their effectiveness in real-world edge deployments. To address these challenges, we propose Multimodal Federated Edge Learning (MFEL), a flexible framework that supports resource-adaptive deployment through variable-capacity submodels. Building upon MFEL, we introduce MFEL-H2B, a heterogeneity-aware approach that integrates three core mechanisms: (1) Prototype Networks for cross-client modality alignment, mitigating representation divergences caused by non-IID data and heterogeneous sensing conditions; (2) Rebalanced Modality Gradient Modulation (R-MGM), which adaptively amplifies gradients of underrepresented modalities while suppressing dominant ones to alleviate intra-client modality imbalance; and (3) Ensemble Momentum-based Knowledge Distillation (E-MKD), which constructs a dynamic ensemble teacher from client predictions and leverages a momentum mechanism to facilitate efficient and robust knowledge transfer among clients with heterogeneous model capacities. Extensive experiments on heterogeneous multimodal datasets demonstrate that MFEL-H2B consistently outperforms state-of-the-art baselines in accuracy, convergence speed, and training stability, while maintaining strong generalization across diverse client architectures and resource profiles.
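
The gradient rebalancing idea in R-MGM (amplify underrepresented modalities, suppress dominant ones) can be illustrated with a small sketch. The abstract does not give the actual formula, so everything below is an assumption: the per-modality `scores` (for example, each modality branch's average confidence on the true class), the tanh-based coefficient, the `alpha` strength parameter, and the function names are all hypothetical.

```python
import math

def modulation_coefficients(scores, alpha=1.0):
    """Map per-modality contribution scores to gradient scaling factors.

    Assumed rule (not taken from the paper): compare each score with the
    mean score. Above-average modalities get a factor below 1, and
    below-average modalities get a factor above 1. Factors lie in (0, 2).
    """
    mean_score = sum(scores.values()) / len(scores)
    if mean_score <= 0:
        raise ValueError("scores must have a positive mean")
    return {
        name: 1.0 - math.tanh(alpha * (score / mean_score - 1.0))
        for name, score in scores.items()
    }

def modulate(gradients, coefficients):
    """Scale each modality encoder's gradient vector by its coefficient."""
    return {
        name: [coefficients[name] * g for g in grads]
        for name, grads in gradients.items()
    }

# Hypothetical client where audio dominates and video is underrepresented.
scores = {"audio": 0.8, "video": 0.4}
coeffs = modulation_coefficients(scores, alpha=1.0)

gradients = {"audio": [0.5, -0.2], "video": [0.1, 0.05]}
rebalanced = modulate(gradients, coeffs)
```

With these inputs the audio gradients shrink and the video gradients grow, while a client whose modalities contribute equally is left unchanged because every coefficient equals 1.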
