Federated Learning (FL) offers a privacy-preserving alternative to traditional Machine Learning setups, in which training data is shared with a central instance. In Centralized Federated Learning (CFL), participants share their model parameters or gradients with a central aggregation server, which redistributes the newly aggregated global model to the participants. Because CFL relies on a central server for model aggregation, it suffers from communication bottlenecks and a single point of failure. Decentralized Federated Learning (DFL) addresses these issues by enabling direct peer-to-peer communication between participants. However, the practical adoption of DFL is often hindered by the complexity, high resource requirements, and poor developer experience associated with existing DFL platforms. Many solutions are difficult to deploy, extend, and debug, creating a significant barrier for researchers and practitioners alike. This thesis presents the design, implementation, and evaluation of a lightweight DFL platform created to address these shortcomings. The core contribution is a hybrid DFL architecture that logically separates scenario orchestration from training, parameter exchange, and aggregation. A central coordinator is used exclusively for lightweight tasks such as scenario orchestration, monitoring (collecting statistics, metrics, and logs), and participant synchronization, while the computationally intensive processes of model training and aggregation remain fully decentralized, occurring directly between peers. This design combines the scalability and fault tolerance of DFL with the manageability of a centralized system. The platform was engineered with a strong focus on modularity, extensibility, and usability.
Key features include simplified deployment procedures that avoid complex containerization, an architecture that facilitates the use of standard debugging tools, and a basic web interface for real-time monitoring. The evaluation, which considers both qualitative aspects of the developer and user experience and quantitative measurements of orchestration overhead, confirms the effectiveness of the proposed approach. The results show that the platform significantly lowers deployment complexity while providing a user-friendly and highly extensible environment for conducting DFL experiments. Ultimately, this work delivers a practical solution that aims to make DFL more accessible and to enable rapid, reliable, and reproducible research.
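The separation of concerns in the hybrid architecture can be illustrated with a minimal Python sketch. It is not the platform's implementation: the names `Coordinator`, `Peer`, and `run_round`, the toy one-step "training", and the plain neighbor averaging are all assumptions chosen for illustration. The point it shows is that the coordinator only synchronizes participants and collects metrics, while model parameters travel exclusively between peers.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Coordinator:
    """Hypothetical coordinator: round synchronization and metrics only."""
    expected: int
    ready: set = field(default_factory=set)
    metrics: list = field(default_factory=list)

    def report_ready(self, peer_id: str) -> None:
        self.ready.add(peer_id)

    def all_ready(self) -> bool:
        return len(self.ready) == self.expected

    def log_metric(self, peer_id: str, round_no: int, loss: float) -> None:
        # Only scalar metrics reach the coordinator, never model parameters.
        self.metrics.append((peer_id, round_no, loss))


@dataclass
class Peer:
    """Hypothetical participant that trains and aggregates locally."""
    peer_id: str
    params: list
    target: list      # stands in for the peer's private training data
    neighbors: list   # ids of the peers this node exchanges parameters with

    def local_step(self, lr: float = 0.5) -> float:
        # Toy training: move each parameter toward the local target.
        self.params = [p + lr * (t - p) for p, t in zip(self.params, self.target)]
        return fmean([(t - p) ** 2 for p, t in zip(self.params, self.target)])

    def aggregate(self, received: dict) -> None:
        # Assumed rule: unweighted average of own and neighbor parameters.
        stacks = [self.params] + [received[n] for n in self.neighbors]
        self.params = [fmean(column) for column in zip(*stacks)]


def run_round(coordinator: Coordinator, peers: list, round_no: int) -> None:
    # 1. Synchronization: the coordinator acts as a barrier.
    coordinator.ready.clear()
    for peer in peers:
        coordinator.report_ready(peer.peer_id)
    if not coordinator.all_ready():
        raise RuntimeError("round started before every peer was ready")

    # 2. Decentralized training; only the loss is reported centrally.
    for peer in peers:
        coordinator.log_metric(peer.peer_id, round_no, peer.local_step())

    # 3. Peer-to-peer exchange, simulated here by a snapshot so that all
    #    peers aggregate the same pre-aggregation parameters.
    snapshot = {peer.peer_id: list(peer.params) for peer in peers}
    for peer in peers:
        peer.aggregate(snapshot)
```

In this sketch the coordinator could be removed after step 1 without affecting training or aggregation, which is the property the hybrid design relies on; a real deployment would replace the in-memory snapshot with network transfers between peers.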
Timothy-Till Näscher studied this question.