Collection of Tibetan Network

Chang-zhi WANG; Guixian XU; Hui WANG

doi:10.12783/dtcse/cmsam2016/3628

Abstract

With the development of Tibetan information technology, Tibetan web-crawler technology has become extremely important. We construct a web crawler to crawl different Tibetan websites, design page pretreatment (preprocessing) rules tailored to each site, and dump the collected Tibetan web text into Tibetan documents. Experiments show that, with our self-developed software and preprocessing module, this approach can quickly and effectively build a large-scale Tibetan corpus, laying a foundation for Tibetan information processing technology.
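The abstract describes the pipeline only at a high level: crawl Tibetan sites, apply site-specific preprocessing rules, and dump the text as documents. The following minimal Python sketch, using only the standard library, illustrates one way such a pipeline could look. It is not the authors' self-made software. Assumptions: pages are served as UTF-8; Tibetan text is recognised by the Unicode Tibetan block (U+0F00 to U+0FFF); the crawl stays on the seed sites' hosts; and `SITE_RULES`, `PageParser`, `extract_tibetan`, `crawl`, and `dump` are hypothetical names, with `SITE_RULES` standing in for the per-site pretreatment rules as lists of HTML tags whose text is discarded.

```python
import os
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Tibetan characters live in the Unicode block U+0F00..U+0FFF. A "run" is a
# sequence of Tibetan characters, optionally separated by spaces or tabs.
TIBETAN_RUN = re.compile(r"[\u0F00-\u0FFF]+(?:[ \t]+[\u0F00-\u0FFF]+)*")

# Hypothetical per-site preprocessing rules: host -> tags whose text is dropped.
# Real rules would be tuned to each Tibetan website's page layout.
SITE_RULES = {
    "default": ["script", "style", "nav", "footer"],
}


class PageParser(HTMLParser):
    """Collect hyperlinks and the text outside the tags listed in skip_tags."""

    def __init__(self, skip_tags):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def extract_tibetan(text_parts):
    """Keep only the Tibetan runs of each text node, one node per output line."""
    lines = []
    for part in text_parts:
        runs = TIBETAN_RUN.findall(part)
        if runs:
            lines.append(" ".join(runs))
    return "\n".join(lines)


def fetch(url, timeout=10):
    """Download a page, assuming the site serves UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds, max_pages=100, fetch_page=fetch, rules=SITE_RULES):
    """Breadth-first crawl limited to the seed hosts; returns {url: tibetan_text}."""
    allowed = {urlparse(s).netloc for s in seeds}
    queue, seen, corpus, fetched = deque(seeds), set(seeds), {}, 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            continue
        fetched += 1
        host = urlparse(url).netloc
        parser = PageParser(rules.get(host, rules["default"]))
        parser.feed(html)
        parser.close()
        text = extract_tibetan(parser.text_parts)
        if text:
            corpus[url] = text
        for href in parser.links:
            link = urldefrag(urljoin(url, href))[0]  # absolute URL, no #fragment
            if urlparse(link).netloc in allowed and link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus


def dump(corpus, out_dir):
    """Write each page's Tibetan text to its own UTF-8 document."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(sorted(corpus)):
        path = os.path.join(out_dir, f"doc_{i:05d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(corpus[url] + "\n")
```

A call such as `dump(crawl(["https://example.org/"], max_pages=50), "tibetan_corpus")` (placeholder URL) would write one `.txt` file per crawled page that contains Tibetan text. Passing `fetch_page` as a parameter keeps the crawler testable offline and leaves room for politeness delays or `robots.txt` checks (for example with `urllib.robotparser`), which a production crawler would need.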

Keywords

Web crawler, Pretreatment, Tibetan corpus

Publication Date

2016-11-17

COMPUTER SCIENCE and ENGINEERING
