Abstract—The study for a Named Entity Recognizer for Filipino Text Using Conditional Random Field (NERF-CRF) focused on creating a system that identifies and classifies the named entities present in a given corpus. The named entities were classified into four categories: person, place, date, and org (organization). Named entities that are identified but do not fall into any of the four categories are tagged as etc. Several modules were created to achieve the study's purpose, including a tokenizer and a part-of-speech tagger. The conditional random field approach was used to classify the identified named entities. Filipino biographies served as the corpus for testing the system. Based on the F-measure, the system achieved an overall score of 83%. It performed best on the named entity date, with a 0% error rate, but was unsatisfactory in distinguishing the named entities place and org, with error rates of 42% and 33%, respectively.
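As a rough illustration of how a conditional random field assigns one of these labels to each token, the following Python sketch decodes a label sequence with the Viterbi algorithm over a linear-chain CRF and computes a per-label F-measure. It is a minimal sketch under stated assumptions, not the NERF-CRF implementation: the feature templates in `token_features`, the hand-set weights, the `O` label for non-entity tokens, and the token-level F-measure are all hypothetical. In a trained CRF the weights would be learned from annotated text by maximizing conditional log-likelihood, and tags from the part-of-speech tagger could be added as further features.

```python
"""Minimal linear-chain CRF decoding and F-measure sketch (hypothetical)."""

# Hypothetical label set: the four NE classes, ETC for other named
# entities, and O for tokens that are not named entities (an assumption).
LABELS = ["PERSON", "PLACE", "DATE", "ORG", "ETC", "O"]


def token_features(tokens, i):
    """Hypothetical observation features for the token at position i."""
    tok = tokens[i]
    feats = {
        "word=" + tok.lower(),
        "is_title=" + str(tok[:1].isupper()),
        "is_digit=" + str(tok.isdigit()),
    }
    if i > 0:
        feats.add("prev=" + tokens[i - 1].lower())
    else:
        feats.add("BOS")  # beginning of sentence
    return feats


def viterbi(tokens, emit_w, trans_w, labels=LABELS):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emit_w maps (feature, label) -> weight; trans_w maps
    (previous_label, label) -> weight. Missing entries count as 0.
    Unnormalized scores suffice: the partition function is the same for
    every label sequence, so it does not change the argmax.
    """
    if not tokens:
        return []

    def emit(i, y):
        # Sum of the weights of token i's active features paired with label y.
        return sum(emit_w.get((f, y), 0.0) for f in token_features(tokens, i))

    # score[i][y]: best score of any label prefix ending in y at position i.
    score = [{y: emit(0, y) for y in labels}]
    back = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev

    # Follow back-pointers from the best label at the final position.
    path = [max(labels, key=lambda y: score[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def f_measure(gold, pred, label):
    """Token-level F-measure (harmonic mean of precision and recall) for one label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with hand-set (hypothetical) weights on a short Filipino sentence.
emit_w = {
    ("word=si", "O"): 3.0,
    ("is_title=True", "PERSON"): 2.0,
    ("is_title=False", "O"): 1.0,
    ("is_digit=True", "DATE"): 3.0,
    ("prev=noong", "DATE"): 1.0,
}
trans_w = {("PERSON", "PERSON"): 0.5}
tokens = ["Si", "Jose", "Rizal", "ay", "ipinanganak", "noong", "1861"]
print(viterbi(tokens, emit_w, trans_w))
# ['O', 'PERSON', 'PERSON', 'O', 'O', 'O', 'DATE']
```

The error rates reported in the abstract may be computed differently from this token-level score; the sketch only shows how precision and recall combine into an F-measure for a single label.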
Index Terms—Conditional random field, extraction, named entity recognition, natural language processing
R. A. Sagum and I. V. R. Domingo are with the Faculty of the College of Computer Management and Information Technology of the Polytechnic University of the Philippines (e-mail: riasagum31@yahoo.com; dvrdomingo@yahoo.com).
A. P. T. Alfonso, M. J. F. Galope, R. B. Villar, and J. T. Villegas are with the Polytechnic University of the Philippines (e-mail: anapat1219@yahoo.com; mharyjoy_galope
Cite: Ana Patricia T. Alfonso, Illuminada Vivien R. Domingo, Mary Joy F. Galope, Ria A. Sagum, Rachelle B. Villar, and Jobert T. Villegas, "Named Entity Recognizer for Filipino Text Using Conditional Random Field," International Journal of Future Computer and Communication vol. 2, no. 5, pp. 376-379, 2013.