The Research of a Spider Based on Crawling Algorithm
Abstract
This paper presents an in-depth study of the web spider in three areas: its workflow, key technologies, and software algorithms. It analyzes the workflow and key technologies of the URL-oriented spider in detail, and proposes managing the URL list with several queues, sorting the URLs by document correlativity so that HTML files can be downloaded at high speed. The aim of this paper is to design a well-adjusted and fully functional software model of the spider. Sun JDK, Borland JBuilder, SQL Server, IIS, and the Bot package are used as the software development environment.
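The queue-based URL management described above can be sketched as a priority frontier: pending URLs are ordered by a document-correlativity score so that the most relevant pages are fetched first, and URLs scoring below a threshold are discarded. This is an illustrative sketch only; the class and field names (`Frontier`, `UrlEntry`, `THRESHOLD`) and the fixed threshold value are assumptions, not taken from the paper.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of a relevance-ordered URL frontier.
public class Frontier {
    static final double THRESHOLD = 0.5; // assumed cutoff; URLs below it are dropped

    static class UrlEntry {
        final String url;
        final double relevance; // document correlativity of the page linking here
        UrlEntry(String url, double relevance) {
            this.url = url;
            this.relevance = relevance;
        }
    }

    // Highest-relevance URL is polled first.
    private final PriorityQueue<UrlEntry> queue = new PriorityQueue<>(
        Comparator.comparingDouble((UrlEntry e) -> e.relevance).reversed());

    /** Enqueue a URL if its relevance score meets the threshold. */
    public void offer(String url, double relevance) {
        if (relevance >= THRESHOLD) {
            queue.add(new UrlEntry(url, relevance));
        }
    }

    /** Return the next URL to download, or null if the frontier is empty. */
    public String next() {
        UrlEntry e = queue.poll();
        return e == null ? null : e.url;
    }

    public static void main(String[] args) {
        Frontier f = new Frontier();
        f.offer("http://example.com/a", 0.9);
        f.offer("http://example.com/b", 0.2); // below threshold, skipped
        f.offer("http://example.com/c", 0.7);
        System.out.println(f.next()); // prints http://example.com/a
        System.out.println(f.next()); // prints http://example.com/c
    }
}
```

A real spider would run several such queues (for example, one per site or per depth level, as the abstract suggests) and feed them from a pool of downloader threads.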
Keywords
Spider, URL Seed, Scope First, Document Correlativity, Threshold
DOI
10.12783/dtcse/aice-ncs2016/5717