Abstract—In this paper, we propose a method for extracting topics that were popular over the past 18 months from a closed-caption TV corpus. Each TV program in the corpus is assigned one of the following genres: drama, informational or tabloid-style program, music, movie, culture, news, variety, welfare, or sport. In this paper, we focus on dramas and informational/tabloid-style programs. As a result, we extracted words and bigrams that formed part of a heroine's signature phrase, as well as the name of the hero, in a popular drama.
Index Terms—Topic detection, spoken language corpus, closed caption TV data, word frequency, Pearson's r.
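The index terms list word frequency and Pearson's r, but the abstract does not say how they are combined. Purely as an illustration, the sketch below assumes that a word's popularity is tracked as a monthly relative-frequency series (occurrences divided by all tokens that month) and that two such series are compared with Pearson's r, for example to check whether two words rise and fall together. The function names `relative_freq` and `pearson_r` and the toy monthly counts are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Sequence


def relative_freq(counts: Sequence[int], totals: Sequence[int]) -> list[float]:
    """Hypothetical helper: per-month occurrences of a word divided by
    the total number of tokens in the corpus for that month."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length")
    return [c / t for c, t in zip(counts, totals)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of co-deviations from the means (covariance up to a factor of n).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up data for six months (the paper's period is 18 months).
    totals = [10000, 12000, 11000, 9000, 10000, 12000]
    word_a = [1, 3, 20, 35, 12, 2]
    word_b = [0, 2, 18, 30, 10, 1]
    fa = relative_freq(word_a, totals)
    fb = relative_freq(word_b, totals)
    print(f"r = {pearson_r(fa, fb):.3f}")
```

A hand-written `pearson_r` keeps the sketch dependency-free; on Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient.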
Hajime Mochizuki is with the Institute of Global Studies, Tokyo University of Foreign Studies, 3-11-1 Asahi, Fuchu, Tokyo, 183-8534, Japan (e-mail: motizuki@tufs.ac.jp).
Kohji Shibano is with the Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies, Japan (e-mail: shibano@aa.tufs.ac.jp).
Cite: Hajime Mochizuki and Kohji Shibano, "Re-mining Topics Popular in the Recent Past from a Large-Scale Closed Caption TV Corpus," International Journal of Future Computer and Communication, vol. 4, no. 2, pp. 98-103, 2015.