November 1, 2018, Open Access

Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger

Abstract

Research on Indonesian named entity (NE) taggers has been conducted for years, but without deep learning. Most studies employed traditional machine learning algorithms such as association rules, support vector machines, random forests, and naïve Bayes. In those studies, word lists in the form of gazetteers or clue words were provided to improve accuracy. Here, we employ deep learning in our Indonesian NE tagger. We use long short-term memory (LSTM) as the topology since it is the state of the art for NE tagging. With LSTM, we do not need a word list to improve accuracy. We investigate two main things. The first is the output layer of the network: Softmax versus conditional random field (CRF). The second is the use of a part-of-speech (POS) tag embedding as an input layer. Using 8400 sentences as training data and 97 sentences as evaluation data, we found that POS tag embedding as an input layer improved the performance of our Indonesian NE tagger. As for the comparison between Softmax and CRF, we found that both architectures have a weakness in classifying an NE tag.
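To make the Softmax versus CRF distinction concrete, the following is a minimal sketch of the general difference between the two decoding styles, not the authors' implementation. A Softmax output layer picks each token's tag independently, while a CRF output layer scores whole tag sequences using tag-to-tag transition scores. The tag set, the emission scores, and the transition scores below are all hypothetical toy values chosen for illustration; in a real tagger they would come from the trained network.

```python
# Illustrative sketch only: toy scores and a hypothetical tag set,
# not the model or data described in the paper.
TAGS = ["O", "B-PER", "I-PER"]

# Hypothetical per-token scores (one row per token, one column per tag),
# standing in for what an LSTM encoder would produce.
emissions = [
    [2.0, 1.5, 0.0],  # token 1
    [0.5, 0.0, 2.0],  # token 2
]

# Hypothetical transition scores: transitions[prev][cur].
# "O" followed by "I-PER" is heavily penalized.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-PER
    [0.0, 0.0, 0.5],    # from I-PER
]


def independent_decode(emissions):
    """Softmax-style decoding: best tag per token, ignoring neighbors.

    The argmax of the raw scores equals the argmax of their softmax.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """CRF-style decoding: best tag sequence under emission + transition scores."""
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_scores, pointers = [], []
        for cur in range(n_tags):
            # Best previous tag for reaching `cur` at this position.
            best_prev = max(
                range(n_tags), key=lambda p: scores[p] + transitions[p][cur]
            )
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev] + transitions[best_prev][cur] + row[cur]
            )
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=scores.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


softmax_tags = [TAGS[i] for i in independent_decode(emissions)]
crf_tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(softmax_tags)  # ['O', 'I-PER']
print(crf_tags)      # ['B-PER', 'I-PER']
```

With these toy values, independent decoding yields an "I-PER" tag that does not follow a "B-PER" tag, whereas sequence-level decoding avoids that transition. This shows only the mechanism; it says nothing about which output layer performed better in the reported experiments.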

Authors

Devin Hoesen

Ayu Purwarianti

Institutions

Bandung Institute of Technology
