For text representation, BERT resolves the polysemy problem that word2vec suffers from, but the single-token masking used in its pre-training task breaks the correlation between consecutive words, which makes it difficult for the model to learn word-level semantic information effectively. This paper therefore proposes a defect text classification method that uses an improved BERT model for feature representation. First, the input layer of the BERT model is modified: entities are extracted and linked, and token and entity features are fused before the pre-training task is run. Next, the improved BERT model is trained to generate dynamic word vectors that capture semantic information enriched with entity knowledge. Experimental results show that the defect text representation vectors produced by the proposed BERT model improve the accuracy of defect text classification.
Wang et al. studied this question.
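To make the input-layer change concrete, the following is a minimal sketch of one way token and entity features could be fused before pre-training. The abstract does not specify the fusion rule, so the element-wise sum over entity spans is an assumption. The function name `fuse_embeddings`, the example tokens, the span format, and all vector values are likewise hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: fuse token embeddings with linked-entity embeddings
# at the input layer. The fusion rule (element-wise sum over each entity
# span) is an assumption; the source does not state the exact operation.

def fuse_embeddings(token_vecs, entity_spans, entity_vecs):
    """Return one fused vector per token.

    token_vecs:   list of per-token vectors (lists of floats)
    entity_spans: list of (start, end, entity_id) tuples, end exclusive
    entity_vecs:  dict mapping entity_id -> vector of the same dimension
    """
    fused = [list(vec) for vec in token_vecs]  # copy so inputs stay untouched
    for start, end, entity_id in entity_spans:
        ent = entity_vecs[entity_id]
        for i in range(start, end):
            # Every token inside the span receives the same entity vector,
            # so consecutive tokens of one entity share a common signal.
            fused[i] = [t + e for t, e in zip(fused[i], ent)]
    return fused


# Made-up defect sentence: "pump seal leak found"
token_vecs = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5], [0.2, 0.2]]
entity_spans = [(1, 3, "seal_leak")]      # "seal leak" linked as one entity
entity_vecs = {"seal_leak": [1.0, -1.0]}  # illustrative values only

fused = fuse_embeddings(token_vecs, entity_spans, entity_vecs)
```

In this sketch, tokens outside any entity span keep their original vectors, while tokens inside a span carry shared entity information. This is one plausible way to restore the correlation between consecutive words that single-token masking weakens.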