Probability and Variance Score: an Efficient Supervised Feature Selection Method for Text Classification

Heyong Wang, Ming Hong

Abstract


This paper proposes a new supervised feature selection method, termed probability and variance score (PVS), for text classification. PVS aims to improve on variance score (VS), a simple unsupervised feature selection method, because VS evaluates only the quantity of information carried by terms and cannot evaluate relationships between terms and the classes of text documents. PVS evaluates the quantity of information of terms by measuring their variances, as VS does, and also considers relationships between terms and classes by measuring the posterior probabilities that terms occur given classes, based on Bayesian theory. Terms selected by PVS tend to be information-rich and highly class-related. Experimental results on three datasets indicate that PVS is efficient in selecting discriminative terms and outperforms VS.
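
The two ingredients named in the abstract, per-term variance and the probability that a term occurs given a class, can be illustrated with a minimal Python sketch. The abstract does not state how the two quantities are combined, so the function `combined_scores` and its rule (variance multiplied by the largest class-conditional occurrence probability) are hypothetical and are not the authors' formula; the function names and the toy term-count matrix are likewise assumptions made for illustration.

```python
def variance_scores(X):
    """Population variance of each term (column) across all documents (rows)."""
    n_docs = len(X)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean = sum(column) / n_docs
        scores.append(sum((v - mean) ** 2 for v in column) / n_docs)
    return scores


def class_conditional_probs(X, y):
    """Estimate P(term occurs | class) as the fraction of documents in that
    class that contain the term at least once."""
    probs = {}
    for c in sorted(set(y)):
        docs = [row for row, label in zip(X, y) if label == c]
        probs[c] = [
            sum(1 for row in docs if row[j] > 0) / len(docs)
            for j in range(len(X[0]))
        ]
    return probs


def combined_scores(X, y):
    """ASSUMPTION (not from the paper): score = variance * max over classes
    of P(term occurs | class). Shown only to make the idea concrete."""
    var = variance_scores(X)
    probs = class_conditional_probs(X, y)
    return [
        var[j] * max(p[j] for p in probs.values())
        for j in range(len(var))
    ]


# Toy term-count matrix: 4 documents, 3 terms, 2 classes.
X = [[3, 1, 0],
     [2, 1, 0],
     [0, 1, 1],
     [0, 1, 2]]
y = [0, 0, 1, 1]

scores = combined_scores(X, y)
# Term indices ordered from highest to lowest score.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy data the second term appears exactly once in every document, so its variance is zero and it ranks last, even though it occurs in every class. A variance-only score would treat the first and third terms identically regardless of how their occurrences line up with the class labels; the class-conditional probabilities are the part that uses the labels.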

