Abstract
Extracting Chinese Cyber Threat Intelligence (CTI) under increasingly complex advanced persistent threat scenarios is crucial, yet challenging due to domain-specific term ambiguity and frequent long, nested entities. To address polysemy, nested-label conflicts, and cross-sentence semantic discontinuity, we propose an enhanced Transformer-based entity recognition method formulated as a pointer network. On the encoder side, we build a RoBERTa model with Rotary Positional Embeddings. To handle complex positions and boundaries of heterogeneous entity types, we introduce tokenization compensation and positional-parameter compression to sharpen boundary sensitivity. In the decoder, we refine GlobalPointer and model recognition as 2D head–tail span matching, enabling direct detection of overlapping and nested entities. To mitigate long-tail bias, we introduce an entity-frequency-aware dynamic threshold and a reweighted zero-boundary log-loss to improve recall for rare entities. Experiments demonstrate an overall F1 improvement of 6.32% over baselines on Chinese CTI datasets, with absolute gains reaching 19.7% specifically on nested and long entities. These results validate the model’s effectiveness in Chinese-specific named entity recognition and its utility for high-accuracy automated CTI analysis.
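To illustrate the 2D head–tail span matching and the entity-frequency-aware threshold described in the abstract, the following is a minimal sketch and not the authors' implementation. The function names `frequency_aware_thresholds` and `decode_spans`, the linear threshold rule with its `base` and `scale` parameters, and the toy labels and scores are all assumptions made for illustration; the abstract does not state the actual threshold formula.

```python
from typing import Dict, List, Tuple

# Assumed layout: scores[label][head][tail] is the score that tokens
# head..tail (inclusive) form one entity of type `label`.
Scores = Dict[str, List[List[float]]]


def frequency_aware_thresholds(
    counts: Dict[str, int], base: float = 0.0, scale: float = 1.0
) -> Dict[str, float]:
    """Hypothetical rule: the rarer a label, the lower its decision threshold."""
    if not counts:
        return {}
    max_count = max(counts.values()) or 1  # avoid division by zero
    return {
        label: base - scale * (1.0 - count / max_count)
        for label, count in counts.items()
    }


def decode_spans(
    scores: Scores, thresholds: Dict[str, float], default_threshold: float = 0.0
) -> List[Tuple[str, int, int]]:
    """Return every (label, head, tail) whose score exceeds its label threshold."""
    spans = []
    for label, matrix in scores.items():
        threshold = thresholds.get(label, default_threshold)
        for head, row in enumerate(matrix):
            # Only the upper triangle is valid: a span cannot end before it starts.
            for tail in range(head, len(row)):
                if row[tail] > threshold:
                    spans.append((label, head, tail))
    return sorted(spans)


def empty_matrix(n: int, fill: float = -10.0) -> List[List[float]]:
    """An n x n score matrix in which every span is strongly rejected."""
    return [[fill] * n for _ in range(n)]


# Toy example with 4 tokens and two made-up labels.
tool = empty_matrix(4)
tool[0][3] = 2.0   # outer entity covering tokens 0..3
tool[2][3] = 1.5   # nested entity covering tokens 2..3
vuln = empty_matrix(4)
vuln[1][1] = -0.3  # weakly scored single-token entity of a rare type

scores = {"TOOL": tool, "VULN": vuln}
thresholds = frequency_aware_thresholds({"TOOL": 100, "VULN": 10})
print(decode_spans(scores, thresholds))
# -> [('TOOL', 0, 3), ('TOOL', 2, 3), ('VULN', 1, 1)]
```

Because each (head, tail) cell is scored independently, the outer span and the span nested inside it are both returned without any conflict resolution. With a fixed threshold of 0 for every label, the rare `VULN` span would be dropped; lowering the threshold for infrequent labels is one plausible way such a scheme could raise recall on rare entities.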
Yongwei Wang
Jipeng Tang
Hao Hu
Cybersecurity
Wang et al. studied this question.
www.synapsesocial.com/papers/69df2abce4eeef8a2a6afb72 — DOI: https://doi.org/10.1186/s42400-026-00588-1