Research and Implementation of Key Technology of Distributed Big Data Collection
Abstract
With the arrival of the big data era, big data mining and analysis has become a major research topic. Data sets are the basis of big data mining and analysis, so an effective data collection scheme is of great significance to research in this area. This paper proposes an efficient distributed big data collection system. For the system's parsing module, it proposes a general and effective text extraction algorithm based on the weights of tag tree nodes. In addition, IP proxy pool technology is introduced to keep the system collecting data continuously. Experiments show that the system can quickly and efficiently obtain large amounts of network data and that it is robust, feasible, and flexible.
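To make the idea of weighting tag tree nodes concrete, here is a minimal Python sketch. The abstract does not give the paper's actual weighting formula, so this sketch swaps in a Readability-style heuristic as an assumption: each text fragment adds its length to its parent element and half its length to its grandparent, link text (inside `<a>`) counts negatively, and `<script>`/`<style>` content is ignored. The node with the highest weight is taken as the main content block. All helper names (`TreeBuilder`, `score_nodes`, `extract_main_text`) are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser

# ASSUMPTION: illustrative heuristic, not the paper's published weighting formula.
SKIP_TAGS = {"script", "style", "noscript"}               # never part of the main text
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no closing tag


class Node:
    """One element in the tag tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def score_nodes(root):
    """Weight each node by the text in its children (full) and grandchildren (half)."""
    weights = {}
    for node in iter_nodes(root):
        if node.tag in SKIP_TAGS:
            continue
        own = sum(len(c) for c in node.children if isinstance(c, str))
        if own == 0:
            continue
        # Link text counts against its containers: navigation bars are link-heavy.
        delta = -own if node.tag == "a" else own
        parent = node.parent
        if parent is not None:
            weights[parent] = weights.get(parent, 0) + delta
            if parent.parent is not None:
                weights[parent.parent] = weights.get(parent.parent, 0) + delta / 2
    return weights


def subtree_text(node):
    """Collect all text under a node in document order, skipping script/style."""
    if node.tag in SKIP_TAGS:
        return []
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.extend(subtree_text(child))
    return parts


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    weights = score_nodes(builder.root)
    if not weights:
        return ""
    best = max(weights, key=weights.get)  # highest-weight node = main content block
    return " ".join(subtree_text(best))
```

For a page with a link-only navigation bar, a content `<div>` of paragraphs, and a short footer, `extract_main_text` returns only the paragraph text, because the links pull the navigation bar's weight below zero. A production extractor would likely also normalize by tag count or text density. This sketch keeps only the parent/grandparent propagation so that the tree-weighting idea stays visible.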
Keywords
Big data, Data collection, Text extraction, IP proxy pool
DOI
10.12783/dtetr/tmcm2017/12643