Abstract—Recent research reveals that the number of cyber-attacks has doubled in the past three years. This rapid growth poses a serious business problem around the world. Existing intrusion detection systems (IDSs), intrusion prevention systems (IPSs), and anti-malware systems mainly rely on low-level network traffic features or program code signatures to detect cyber-attacks. However, since hackers can constantly change their attack tactics, it is extremely difficult for existing security solutions to detect cyber-attacks. There is increasing evidence that cybercriminals tend to exchange cybercrime knowledge and transact via online social media. Accordingly, this presents unprecedented opportunities for security intelligence experts to tap into online social media to extract vital security intelligence for cyber-attack forensics. The main contributions of this paper are the design, development, and evaluation of a Latent Dirichlet Allocation (LDA)-based latent text mining model for cyber-attack forensics. Our preliminary evaluation of the proposed latent text mining model, based on a real-world data set crawled from Twitter and blog sites, shows that it significantly outperforms the probabilistic latent semantic indexing (pLSI) method in terms of extracting more relevant and richer concepts describing real-world cyber-attack incidents.
Index Terms—Text mining, latent Dirichlet allocation, cyber-attacks, cyber forensics.
Raymond Y. K. Lau is with the City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong SAR (e-mail: email@example.com).
Yunqing Xia is with the Centre for Speech and Language Technologies, Tsinghua University, Beijing 100084, China (e-mail: firstname.lastname@example.org).
Cite: Raymond Y. K. Lau and Yunqing Xia, "Latent Text Mining for Cybercrime Forensics," International Journal of Future Computer and Communication, vol. 2, no. 4, pp. 368-371, 2013.