Journal of Jimei University (Natural Science), Vol. 25, No. 5
CHEN Deyi, ZHANG Hongyi, LIU Cailing, ZHANG Guangbin
(1. College of Optoelectronics and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China;
2. Xiamen Meiya Pico Information Co., Ltd., Xiamen 361005, China)
Abstract:
The rapid development of the internet and big data technology has greatly facilitated people's access
to various kinds of Chinese text information, but it has also greatly increased the risk that harmful information
is disseminated in Chinese text. Traditional text processing methods based on vector representation are mainly
designed for English text. To deal with these problems, a novel Chinese text classification framework was proposed.
In this framework, a word vector model based on Word2Vec was first constructed. Then the keywords with
category-distinguishing ability were selected by using word document frequency (segmentation term
frequency-document frequency, STF-DF). Meanwhile, a suitable convolutional neural network (CNN) was built for
Chinese text classification. The experimental results show that the accuracy of this framework on the THUCNews and
Fudan University Chinese text data sets is 94.51% and 95.04%, respectively, an
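The keyword selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the STF-DF formula, so the code below is not STF-DF itself: it swaps in a simple stand-in score, the gap between a term's document frequency inside a category and outside it. The function name `select_keywords`, the scoring rule, the `top_k` parameter, and the toy data are all assumptions made for illustration, and the documents are assumed to be already segmented into tokens.

```python
from collections import Counter, defaultdict


def select_keywords(docs, labels, top_k=2):
    """Hypothetical stand-in for keyword selection (NOT the paper's STF-DF).

    docs   : list of token lists (text assumed already segmented)
    labels : list of category names, one per document
    Returns {category: [top_k terms]} ranked by the gap between
    in-category and out-of-category document frequency.
    """
    n_docs = Counter(labels)          # number of documents per category
    df = defaultdict(Counter)         # df[category][term] = document frequency
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))  # count each term once per document

    total_df = Counter()              # document frequency over all categories
    for counts in df.values():
        total_df.update(counts)
    total_docs = len(docs)

    keywords = {}
    for cat, counts in df.items():
        other_docs = total_docs - n_docs[cat]
        scores = {}
        for term, in_df in counts.items():
            in_rate = in_df / n_docs[cat]
            out_rate = (total_df[term] - in_df) / other_docs if other_docs else 0.0
            scores[term] = in_rate - out_rate
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[cat] = ranked[:top_k]
    return keywords


# Toy, made-up corpus: two categories, four pre-segmented documents.
docs = [
    ["match", "team", "goal", "the"],
    ["team", "coach", "goal", "the"],
    ["stock", "market", "fund", "the"],
    ["stock", "fund", "bank", "the"],
]
labels = ["sports", "sports", "finance", "finance"]
print(select_keywords(docs, labels, top_k=2))
```

A term such as "the", which appears in every document of every category, scores zero under this rule and is never selected. That is the behaviour a category-distinguishing keyword filter is meant to have, whatever the exact weighting used in the paper.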