Abstract—We propose methods for classifying text-based chief complaints written in Thai, our native language, into symptom codes based on ICD-10. Thai sign and symptom descriptions from the ICD-10 document serve as the training data for building a Thai text corpus in the sign-and-symptom domain. The corpus is then used to tokenize Thai text-based chief complaints (ThCC) into individual words with two techniques: longest matching and our proposed two-level tokenization. The tokens produced by each technique are evaluated with five classifiers: decision tree, k-nearest neighbours, radius neighbours, random forest, and extremely randomized trees. The experimental results show 85% accuracy in assigning ICD-10 codes to Thai text-based chief complaints when our proposed technique is combined with the decision tree classifier.
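The longest matching technique named in the abstract is commonly implemented as a greedy dictionary lookup: at each position, take the longest lexicon entry that matches and move past it. The Python sketch below illustrates that general idea only. It is not the authors' implementation, and it does not attempt the two-level tokenization, which the abstract does not describe. The function name `longest_match_tokenize`, the single-character fallback, and the tiny lexicon are all assumptions made for illustration and are not taken from the paper's corpus.

```python
def longest_match_tokenize(text, lexicon):
    """Greedy longest-matching tokenization against a word lexicon.

    Hypothetical helper for illustration; not the paper's code.
    """
    # The longest lexicon entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            # Assumed fallback: emit one character when nothing matches.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Illustrative lexicon only (pain, headache, head, have, fever).
lexicon = {"ปวด", "ปวดหัว", "หัว", "มี", "ไข้"}
complaint = "ปวดหัว" + "มี" + "ไข้"
print(longest_match_tokenize(complaint, lexicon))
# ['ปวดหัว', 'มี', 'ไข้']
```

Because the lookup is greedy, "ปวดหัว" (headache) is preferred over the shorter entry "ปวด" (pain). Note that the fallback splits on single code points, so unmatched Thai text may be cut between a base character and its vowel or tone mark; a practical system would need a better rule for unknown words.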
Index Terms—Classification, Thai language processing, chief complaint identification, machine learning.
The authors are with the Department of Computer Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand (e-mail: jarunee.d@psu.ac.th, tamakosan14@gmail.com).
Cite: Jarunee Duangsuwan and Pawin Saeku, "Semi-automatic Classification Based on ICD Code for Thai Text-Based Chief Complaint by Machine Learning Techniques," International Journal of Future Computer and Communication, vol. 7, no. 2, pp. 37-41, 2018.