Sentence Similarity Calculating Method Based on Word2Vec and Clustering

WAN-LI SONG

Abstract


With the rapid development and widespread adoption of the Internet, more and more data is stored as text on online platforms. This massive volume of data leads to redundant text information, so using text similarity techniques to remove duplicate data is very important. How to effectively improve the accuracy and precision of text similarity calculation is therefore an urgent problem. In this paper, we propose an improved method for calculating sentence semantic similarity. The method uses a word2vec model to obtain the semantic information of the text, clusters the results with the k-means algorithm, retrains with the word2vec model, and finally computes the sentence similarity. Experimental results indicate that our algorithm performs better than the traditional word2vec algorithm.
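The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors in `VECTORS` stand in for embeddings that a trained word2vec model would supply, the helper names (`kmeans`, `sentence_vector`, `sentence_similarity`) are hypothetical, and sentence similarity is taken here to be the cosine of averaged word vectors, which is one common choice and an assumption on our part. The abstract does not specify how the clusters feed into retraining, so that step is left out.

```python
import math
import random

# Hypothetical toy embeddings; a real pipeline would take these from a
# trained word2vec model rather than hard-coding them.
VECTORS = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "kitten": [0.95, 0.05],
    "car": [0.1, 0.9],
    "truck": [0.2, 0.8],
    "bus": [0.15, 0.85],
}


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def sentence_vector(sentence, vectors):
    """Average the vectors of known words; None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    return mean(known) if known else None


def sentence_similarity(s1, s2, vectors):
    """Cosine similarity of averaged word vectors (an assumed measure)."""
    v1 = sentence_vector(s1, vectors)
    v2 = sentence_vector(s2, vectors)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


words = list(VECTORS)
labels, _ = kmeans([VECTORS[w] for w in words], k=2)
word_cluster = dict(zip(words, labels))  # word -> cluster id

print(word_cluster)
print(sentence_similarity("cat dog", "kitten", VECTORS))
print(sentence_similarity("cat dog", "car truck", VECTORS))
```

On this toy data the animal words and the vehicle words fall into separate clusters, and the first sentence pair scores higher than the second. With real embeddings the same structure would apply, with the vectors and the number of clusters chosen from the corpus.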

Keywords


Sentence similarity, Clustering, Word2Vec


DOI
10.12783/dtetr/amee2019/33485
