Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

On strategies for imbalanced text classification using SVM: A comparative study

Sun, Aixin, Lim, Ee-Peng and Liu, Ying 2009. On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems 48 (1), pp. 191-201. doi:10.1016/j.dss.2009.07.011

Full text not available from this repository.

Abstract

Many real-world text classification tasks involve imbalanced training examples. The strategies proposed to address imbalanced classification (e.g., resampling, instance weighting), however, have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study of the effectiveness of these strategies in the context of imbalanced text classification using the Support Vector Machine (SVM) classifier. SVM is the focus of this study because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy that organizes the proposed strategies according to the training and test phases of text classification tasks. Based on the taxonomy, we survey the methods proposed to address imbalanced classification. Among them, 10 commonly used methods were evaluated in our experiments on three benchmark datasets, i.e., Reuters-21578, 20-Newsgroups, and WebKB. Using the area under the Precision–Recall Curve as the performance measure, our experimental results showed that the best decision surface was often learned by the standard SVM, not coupled with any of the proposed strategies. We believe such a negative finding will benefit both researchers and application developers in the area by directing more attention to thresholding strategies.
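The abstract names two ideas that are easy to illustrate in isolation: scoring a ranked classifier output by the area under the Precision–Recall curve, and choosing a decision threshold afterwards. The sketch below is only an illustration under stated assumptions. It uses average precision as one common step-wise estimate of that area (this record does not say how the paper computed it), it picks the threshold that maximizes F1 as one possible thresholding strategy, and the function names `average_precision` and `best_f1_threshold` as well as the toy scores are hypothetical. Scores are assumed to be distinct.

```python
from typing import List, Optional, Tuple


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Step-wise estimate of the area under the Precision-Recall curve.

    Examples are ranked by descending score; the precision observed at the
    rank of each positive example is averaged over all positives.
    Assumes distinct scores and at least one positive label (1).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / total_pos


def best_f1_threshold(scores: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, F1) for the cut-off that maximizes F1.

    An example is predicted positive when its score is >= threshold.
    This is an assumed, simple thresholding strategy, not the paper's.
    """
    best_threshold: Optional[float] = None
    best_f1 = 0.0
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 when a positive exists
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1


# Made-up toy data: 2 positives among 8 examples (an imbalanced set).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]

print("average precision:", average_precision(toy_scores, toy_labels))
print("best (threshold, F1):", best_f1_threshold(toy_scores, toy_labels))
```

In this sketch the ranking measure depends only on the order of the scores, while the F1 value depends on where the cut-off is placed. That separation is one way to read the abstract's point that thresholding deserves attention in its own right.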

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Uncontrolled Keywords: Imbalanced text classification; Support Vector Machines; SVM; Resampling; Instance weighting
Publisher: Elsevier
ISSN: 0167-9236
Last Modified: 04 Jun 2017 05:20
URI: http://orca.cf.ac.uk/id/eprint/51001

Citation Data

Cited 110 times in Google Scholar.

Cited 127 times in Scopus.
