June 3, 2026 · Open Access

Detecting Fraudulent Transactions of Electronic Payment Cards Based on Stream Processing in Big Data

Key Points

Set out to develop a real-time fraud detection architecture for electronic payment systems that addresses severe class imbalance and high transaction velocity.
Proposed a three-layer streaming architecture for fraud detection that combines unsupervised and supervised learning.
Used Hidden Markov Models (HMM) and Self-Organizing Maps (SOM) for fast behavioral screening of previously unseen fraud patterns, and Neural Networks (NN) and Logistic Regression (LR) to classify known fraud patterns.
Implemented the pipeline with Apache Spark Streaming and HDFS for low-latency processing, and evaluated it on a dataset of 30,000 transactions.
Achieved an F1-score of 0.912 and an AUC-ROC of 0.941 on the UCI Credit Card dataset, indicating strong detection performance.
Reported an average inference latency of 2.1 ms per transaction and outperformed strong single-model baselines.
Ablation studies confirmed that evidence fusion, numericalization, threshold tuning, and Spark-based streaming are all essential to performance.
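
The paper summary does not say how the HMM layer turns a cardholder's history into a screening signal. One common approach, shown here only as a minimal sketch, assumes that transaction amounts are discretized into three symbols (low, medium, high) and that an HMM has already been trained offline. The scaled forward algorithm then measures how much less likely the cardholder's recent window becomes once the new transaction slides into it. The cut-offs, the parameters `START`, `TRANS`, and `EMIT`, and the helper names `to_symbol` and `screening_score` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]

# Hypothetical pre-trained HMM: 3 hidden spending profiles, 3 amount symbols.
START = [0.6, 0.3, 0.1]
TRANS = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
EMIT = [[0.8, 0.15, 0.05], [0.2, 0.6, 0.2], [0.05, 0.25, 0.7]]


def to_symbol(amount: float, cuts: Tuple[float, float] = (100.0, 500.0)) -> int:
    """Map an amount to 0 = low, 1 = medium, 2 = high (hypothetical cut-offs)."""
    low, high = cuts
    if amount < low:
        return 0
    return 1 if amount < high else 2


def log_likelihood(obs: Sequence[int], start: Sequence[float],
                   trans: Matrix, emit: Matrix) -> float:
    """Scaled forward algorithm: returns log P(obs | HMM)."""
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission of obs[t].
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)             # log P(obs) is the sum of log scale factors
        alpha = [a / scale for a in alpha]   # normalize to avoid underflow
    return log_p


def screening_score(history: Sequence[int], new_symbol: int,
                    start: Sequence[float], trans: Matrix, emit: Matrix) -> float:
    """Drop in log-likelihood when the new symbol replaces the oldest one.

    Near 0: the new transaction fits the learned behaviour. Larger positive
    values: the window became less likely under the model. Assumes strictly
    positive START/EMIT entries so both log-likelihoods stay finite.
    """
    before = log_likelihood(history, start, trans, emit)
    after = log_likelihood(list(history[1:]) + [new_symbol], start, trans, emit)
    return before - after


if __name__ == "__main__":
    history = [to_symbol(a) for a in (42.0, 18.5, 77.0, 63.2, 25.0)]
    for amount in (30.0, 1200.0):
        score = screening_score(history, to_symbol(amount), START, TRANS, EMIT)
        print(f"amount={amount:>7.2f}  screening score={score:.3f}")
```

Under these assumptions a low-value purchase after a run of low-value purchases scores about zero, while a sudden high-value purchase produces a clearly positive score; a score of this kind is what a screening layer could pass on to the fusion stage.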

Abstract

Fraud detection in electronic payment card systems remains a critical challenge because of high transaction velocity, severe class imbalance, and continually evolving fraud behaviors. Although existing studies have improved detection performance with supervised or unsupervised techniques, many rely on single-paradigm models and offline evaluation, which limits their effectiveness in real-time payment environments. To address these limitations, this paper proposes a hybrid, real-time fraud detection architecture organized as a three-layer streaming pipeline that integrates unsupervised and supervised learning with evidence-based decision fusion. Hidden Markov Models (HMM) and Self-Organizing Maps (SOM) form a fast behavioral screening layer that identifies previously unseen fraud patterns, while Neural Networks (NN) and Logistic Regression (LR) form an explicit classification layer for known fraud patterns. The heterogeneous outputs of these models are aggregated with Dempster–Shafer theory, and each transaction is classified using two empirically tuned thresholds (θL and θU), which enables robust handling of uncertainty and severe class imbalance. The entire pipeline is implemented on Apache Spark Streaming and HDFS, allowing low-latency processing before a transaction is authorized. Experimental evaluation on the UCI Credit Card dataset (30,000 transactions, 22.12% fraudulent) shows that the proposed architecture achieves an F1-score of 0.912, an AUC-ROC of 0.941, and an average inference latency of 2.1 ms per transaction, outperforming strong single-model baselines. Ablation studies further confirm that evidence fusion, numericalization, threshold tuning, and Spark-based streaming are essential components of the architecture. These results indicate that the proposed model is accurate, scalable, and well suited to large-scale, real-world electronic payment fraud detection.
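
The abstract names Dempster–Shafer fusion and the dual thresholds θL and θU. It does not say how model outputs become mass functions or what happens to transactions that fall between the two thresholds. The sketch below makes three assumptions. First, each model's fraud probability is discounted by a hypothetical reliability weight. Second, the four sources are combined with Dempster's rule on the frame {fraud, legitimate}. Third, θL and θU are applied to the pignistic fraud probability, and the middle band is routed to manual review (also an assumption). All names, weights, and threshold values are illustrative, not the authors' settings.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Mapping


@dataclass(frozen=True)
class Mass:
    """Mass function over the frame {fraud, legit}.

    `uncertain` is the mass on the whole frame, i.e. ignorance.
    """
    fraud: float
    legit: float
    uncertain: float


VACUOUS = Mass(0.0, 0.0, 1.0)  # total ignorance; neutral element of Dempster's rule


def from_score(p: float, reliability: float) -> Mass:
    """Discount a model's fraud probability p by a reliability weight (assumption)."""
    if not (0.0 <= p <= 1.0 and 0.0 <= reliability <= 1.0):
        raise ValueError("p and reliability must lie in [0, 1]")
    return Mass(reliability * p, reliability * (1.0 - p), 1.0 - reliability)


def combine(a: Mass, b: Mass) -> Mass:
    """Dempster's rule of combination on the two-hypothesis frame."""
    conflict = a.fraud * b.legit + a.legit * b.fraud
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict  # renormalize after discarding conflicting mass
    return Mass(
        fraud=(a.fraud * b.fraud + a.fraud * b.uncertain + a.uncertain * b.fraud) / norm,
        legit=(a.legit * b.legit + a.legit * b.uncertain + a.uncertain * b.legit) / norm,
        uncertain=(a.uncertain * b.uncertain) / norm,
    )


def fuse(scores: Mapping[str, float], reliability: Mapping[str, float]) -> Mass:
    """Fuse every model's evidence; the rule is commutative and associative."""
    masses = [from_score(scores[name], reliability[name]) for name in scores]
    return reduce(combine, masses, VACUOUS)


def decide(m: Mass, theta_low: float, theta_high: float) -> str:
    """Dual-threshold decision on the pignistic fraud probability."""
    if not 0.0 <= theta_low <= theta_high <= 1.0:
        raise ValueError("require 0 <= theta_low <= theta_high <= 1")
    p_fraud = m.fraud + m.uncertain / 2.0  # pignistic transform splits ignorance evenly
    if p_fraud >= theta_high:
        return "fraud"
    if p_fraud < theta_low:
        return "legitimate"
    return "review"  # assumption: the uncertain band is escalated


if __name__ == "__main__":
    # Hypothetical per-model fraud scores and reliabilities for one transaction.
    scores = {"hmm": 0.70, "som": 0.60, "nn": 0.90, "lr": 0.80}
    reliability = {"hmm": 0.6, "som": 0.5, "nn": 0.9, "lr": 0.8}
    fused = fuse(scores, reliability)
    print(fused, decide(fused, theta_low=0.3, theta_high=0.7))
```

Because Dempster's rule is commutative and associative, the order in which the four model outputs are fused does not change the result. A reliability of 0 turns a model's output into the vacuous mass function, which leaves the fused result unchanged. This gives a simple way to down-weight a layer that is known to be weak on a given traffic segment.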