Abstract—Mutual information (MI) has been used extensively in natural language processing to measure the co-occurrence strength between two words. Word context entropy is likewise a useful measure: it characterizes how a word is distributed across contexts and can be used to calculate word similarity. Calculating scores for both measures usually relies on a large text corpus to obtain a reliable estimate. However, calculation based on a static corpus may not reflect the dynamic nature of language. In this paper, we treat web documents as a text corpus and develop an efficient online calculator for both mutual information and word context entropy. The major advantage of online computation is that the web corpus is not only large enough to yield a reliable estimate but also reflects the dynamic nature of language.
Index Terms—Mutual information, word context, entropy, natural language processing.
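To make the two measures named in the abstract concrete, the following is a minimal sketch that assumes the standard textbook definitions: pointwise mutual information estimated from occurrence counts, and Shannon entropy over a word's context distribution. The abstract does not give the paper's exact formulas or how counts are obtained from the web, so the function names, the count-based probability estimates, and the sample counts are all illustrative assumptions.

```python
import math


def pointwise_mutual_information(count_xy, count_x, count_y, total):
    """Return PMI in bits, log2(P(x, y) / (P(x) * P(y))).

    Assumption: probabilities are estimated as count / total, where the
    counts could be, for example, document frequencies in a corpus.
    """
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        raise ValueError("all counts must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def context_entropy(context_counts):
    """Return the Shannon entropy (bits) of a word's context distribution.

    Assumption: context_counts maps each context to how often the word
    was observed in it; zero counts contribute nothing to the sum.
    """
    total = sum(context_counts.values())
    if total <= 0:
        raise ValueError("at least one context observation is required")
    entropy = 0.0
    for count in context_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(pointwise_mutual_information(10, 100, 100, 10_000))
    print(context_entropy({"ctx_a": 5, "ctx_b": 5, "ctx_c": 5, "ctx_d": 5}))
```

Under these assumptions, a word pair that co-occurs more often than chance receives a positive score, and a word spread evenly over many contexts receives a higher entropy than one concentrated in a single context.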
The authors are with the Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, R.O.C. (corresponding author tel.: +886-3-463-8800; fax: +886-3-435-2077; e-mail: lcyu@saturn.yzu.edu.tw).
Cite: Wei-Hsuan Lin, Yi-Lun Wu, and Liang-Chih Yu, "Online Computation of Mutual Information and Word Context Entropy," International Journal of Future Computer and Communication, vol. 1, no. 2, pp. 167-169, 2012.