March 25, 2026 | Open Access

Federated Learning for Malicious Domain Detection via Privacy-Preserving DNS Traffic Analysis

Key Points

Aims to develop a privacy-preserving method for detecting malicious domains from DNS data.
Develops a federated learning pipeline that trains the model locally at each client.
Uses secure aggregation so that the server observes only an aggregate of masked model updates.
Combines DNS-specific lexical features with lightweight behavioral signals for detection.
Benchmarks FedAvg, FedProx, and FedNova under controlled non-IID client partitions.
Federated models approach the effectiveness of centralized training.
Federated models outperform local-only models in accuracy and convergence speed.
FedProx reaches accuracy ≥0.995 in fewer communication rounds than FedAvg under medium heterogeneity.
Results are assessed with ROC-AUC, PR-AUC, and F1, supported by 95% bootstrap confidence intervals and paired significance tests.
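
The secure aggregation idea in the key points can be illustrated with a toy example. The sketch below is not the authors' protocol: it assumes a simplified pairwise-masking scheme in which every pair of clients shares a random vector that one adds and the other subtracts, so the masks cancel in the sum. The function names (`pairwise_masks`, `mask_update`, `aggregate`) and the single shared seed are hypothetical, and real deployments use key agreement and finite-field arithmetic rather than floats.

```python
import random


def pairwise_masks(num_clients, dim, seed=0):
    """Return one mask vector per client; all masks sum to (almost) zero.

    Assumption: a single seeded RNG stands in for the pairwise shared
    secrets that clients would normally derive via key agreement.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]  # client i adds the shared vector
                masks[j][k] -= shared[k]  # client j subtracts the same vector
    return masks


def mask_update(update, mask):
    """Hide a client's model update behind its mask before upload."""
    return [u + m for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Average the masked updates; the pairwise masks cancel in the sum."""
    n = len(masked_updates)
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) / n for k in range(dim)]


if __name__ == "__main__":
    updates = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]  # toy client updates
    masks = pairwise_masks(len(updates), dim=2, seed=42)
    masked = [mask_update(u, m) for u, m in zip(updates, masks)]
    print(aggregate(masked))  # close to the plain average of the updates
```

In this toy setting the server only ever handles `masked`, yet the average it computes matches the average of the unmasked updates up to floating-point rounding.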

Abstract

Malicious domain detection (MDD) from DNS telemetry enables early threat hunting but is constrained by privacy and data-sharing barriers across organizations. We present a deployable federated learning (FL) pipeline that trains a compact deep neural network (DNN; 64-32-16 with ReLU and dropout 0.3) locally at each client and exchanges only masked model updates. Privacy is enforced via secure aggregation (the server observes only an aggregate of masked updates) and optional server-side differential privacy (DP) via clipping and Gaussian noise. Our feature schema combines DNS-specific lexical cues (character n-grams, entropy, TLD indicators) with lightweight behavioral signals (TTL dispersion, query cadence) without exporting raw logs or identifiers. We benchmark FedAvg, FedProx, and FedNova under controlled non-IID client partitions and report ROC-AUC, precision-recall area under the curve (PR-AUC), F1, convergence speed, and communication cost. Federated models approach centralized training while outperforming local-only baselines; FedProx reaches the target Accuracy ≥0.995 in fewer rounds than FedAvg under medium heterogeneity. We report 95% bootstrap confidence intervals and paired significance tests (DeLong for ROC-AUC; McNemar for Accuracy). Overall, privacy-preserving FL for DNS-based MDD is practical, providing near-centralized utility while keeping DNS data local.
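
To make the lexical part of the feature schema concrete, the following is a minimal sketch of how entropy, character n-grams, and a TLD indicator might be computed from a domain name. It is an illustration only, not the paper's feature extractor: the function names, the choice of bigrams, and the `flagged_tlds` tuple are hypothetical placeholders, and the behavioral signals (TTL dispersion, query cadence) are not covered here.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lexical_features(domain, flagged_tlds=("xyz", "top")):
    """Build a small lexical feature dict for one domain name.

    Assumption: `flagged_tlds` is an illustrative placeholder list,
    not a set of TLDs taken from the paper.
    """
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    tld = labels[-1]
    body = ".".join(labels[:-1]) or name  # everything left of the TLD
    return {
        "length": len(body),
        "entropy": shannon_entropy(body),
        "bigrams": char_ngrams(body, 2),
        "tld_flag": int(tld in flagged_tlds),
    }


if __name__ == "__main__":
    print(lexical_features("example.com"))
```

Features of this kind are derived from the domain string alone, which is consistent with the abstract's point that raw logs and identifiers need not leave the client.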
