Research and Application of Improved K-means Algorithm in Text Clustering

Shen-yi QIAN; Hui-hui LIU; Dai-yi LI

doi:10.12783/dtcse/pcmm2018/23653

Abstract

K-means is a commonly used text clustering algorithm whose main advantages are simplicity and speed. However, because the initial cluster centers are selected at random, the algorithm easily falls into a local optimum, and both the clustering results and the number of iterations are unstable. To address this problem, this paper selects the initial cluster centers with a hierarchical agglomerative clustering algorithm to ensure high-quality starting centers, uses cosine similarity to measure the distance between texts, and reconstructs both the formula for computing cluster centers and the objective function for clustering quality. Experimental results on the Sogou Chinese text corpus show that the improved K-means algorithm achieves relatively high accuracy and stability.

Keywords

K-means clustering algorithm, Hierarchical clustering algorithm, Text distance, Objective function, F measure
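
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the reconstructed center formula or objective function, so the sketch assumes the normalized mean vector as the cluster center and centroid-linkage merging for the agglomerative step. The function names (normalize, cosine, centroid, agglomerative_seeds, kmeans_cosine) are hypothetical, and documents are assumed to be already converted to numeric term-weight vectors.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Normalized mean of a non-empty group (assumed center formula, not the paper's)."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)


def agglomerative_seeds(docs, k):
    """Merge the two most similar clusters until k remain; return their centroids."""
    clusters = [[d] for d in docs]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans_cosine(vectors, k, max_iter=100):
    """K-means with cosine similarity, seeded deterministically by agglomeration."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    docs = [normalize(v) for v in vectors]
    centers = agglomerative_seeds(docs, k)
    labels = None
    for _ in range(max_iter):
        # Assign each document to the center with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Recompute each center from its current members.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers
```

Because the seeds come from a deterministic merge procedure rather than a random draw, repeated runs on the same input give the same clustering, which is the stability property the abstract emphasizes. The pairwise search in agglomerative_seeds is cubic in the number of documents as written, so on a large corpus one would likely seed from a sample.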


COMPUTER SCIENCE and ENGINEERING
