Abstract:
To address the large volume of advertising messages and other noise tweets, this paper proposes a user classification filtering mechanism based on the C4.5 decision tree classification algorithm, together with a scoring filter based on characteristic values. Exploiting the immediacy of micro-blog text and the timeliness of micro-blog topics, the paper also puts forward a similarity calculation method that incorporates a time parameter. Experiments show that the mechanism detects topics and filters noise with better accuracy and efficiency than the traditional approach.