Design and Implementation of Parallelized LDA Topic Model Based on MapReduce
Abstract
To address the efficiency bottleneck of the non-parallel LDA topic model when processing large-scale text datasets, a parallel LDA topic model computing framework based on MapReduce is designed and implemented. Performance tests of the parallel LDA topic model are also conducted using bibliographic sample data of articles and patents. Experiments show that the parallel LDA topic analysis process based on the MapReduce framework is feasible. Compared with the non-parallel LDA model, the parallel LDA topic model process clearly improves analysis efficiency on large-scale text datasets.
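The abstract does not spell out how the Gibbs sampling work is split across MapReduce. One common scheme is Approximate Distributed LDA (AD-LDA), and it is sketched below as an illustration only. The paper's own partitioning may differ. In this scheme each mapper runs one collapsed Gibbs sweep over its shard of documents against a stale snapshot of the global word-topic counts and emits `((word, topic), delta)` pairs. The reducer then sums those deltas back into the global counts. The function names (`lda_map`, `lda_reduce`, `run_parallel_lda`), the hyperparameters `alpha` and `beta`, and the in-process simulation of map and reduce are assumptions made for this sketch. They are not the Hadoop API.

```python
import random
from collections import defaultdict

def count_topics(shards, assignments, V, K):
    """Build global word-topic counts n_wk and topic totals n_k from assignments."""
    nwk = [[0] * K for _ in range(V)]
    nk = [0] * K
    for docs, z_docs in zip(shards, assignments):
        for doc, z in zip(docs, z_docs):
            for w, k in zip(doc, z):
                nwk[w][k] += 1
                nk[k] += 1
    return nwk, nk

def lda_map(docs, z_docs, nwk, nk, V, K, alpha, beta, rng):
    """Hypothetical mapper: one collapsed Gibbs sweep over a shard, run against a
    stale copy of the global counts (the AD-LDA approximation). Emits
    ((word, topic), delta) pairs for the reducer."""
    local_nwk = [row[:] for row in nwk]
    local_nk = nk[:]
    emitted = []
    for doc, z in zip(docs, z_docs):
        ndk = [0] * K  # per-document topic counts
        for k in z:
            ndk[k] += 1
        for i, w in enumerate(doc):
            old = z[i]
            # Remove the current token from all counts before resampling it.
            ndk[old] -= 1
            local_nwk[w][old] -= 1
            local_nk[old] -= 1
            # Unnormalized full conditional p(z_i = k | all other assignments).
            weights = [(ndk[k] + alpha) * (local_nwk[w][k] + beta) / (local_nk[k] + V * beta)
                       for k in range(K)]
            new = rng.choices(range(K), weights=weights)[0]
            z[i] = new
            ndk[new] += 1
            local_nwk[w][new] += 1
            local_nk[new] += 1
            if new != old:
                emitted.append(((w, old), -1))
                emitted.append(((w, new), +1))
    return emitted

def lda_reduce(pairs):
    """Hypothetical reducer: sum deltas per (word, topic) key, as the shuffle would group them."""
    totals = defaultdict(int)
    for key, delta in pairs:
        totals[key] += delta
    return totals

def run_parallel_lda(shards, V, K, iterations=20, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    # Random initial topic for every token in every shard.
    assignments = [[[rng.randrange(K) for _ in doc] for doc in docs] for docs in shards]
    nwk, nk = count_topics(shards, assignments, V, K)
    for _ in range(iterations):
        # Map phase: every shard samples against the same snapshot of nwk/nk.
        pairs = []
        for docs, z_docs in zip(shards, assignments):
            pairs.extend(lda_map(docs, z_docs, nwk, nk, V, K, alpha, beta, rng))
        # Reduce phase: merge all deltas into the global counts.
        for (w, k), delta in lda_reduce(pairs).items():
            nwk[w][k] += delta
            nk[k] += delta
    return nwk, nk, assignments
```

As a usage example, `run_parallel_lda([[[0, 1, 2, 0], [1, 2, 2]], [[3, 4, 3], [4, 4, 3, 0]]], V=5, K=2)` treats the two inner lists as two shards of word-id documents. After each reduce step the global counts match a fresh recount of the assignments exactly. The approximation is only that mappers in the same iteration do not see each other's updates.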
Keywords
MapReduce framework, LDA topic model, Gibbs sampling, Parallel computing
DOI
10.12783/dtcse/CCNT2018/24712