Research on Domain Term Dictionary Construction Based on Chinese Wikipedia

Yu-wen ZHANG, Bao-an LI, Xue-qiang LV, Ning SUN, Jing-Jing TIAN

Abstract


Domain terms are words or phrases that represent concepts or relationships in a specific domain, and they characterize that domain. The automatic construction of a domain-specific dictionary is an important task in natural language processing, with applications in domain-specific ontology construction, vertical search, text classification, information retrieval, question answering systems, etc. In this paper, we propose a novel method for constructing a domain term dictionary based on the Chinese Wikipedia web resource and deep learning technology. For the first time, we explore word representations learned by a Word2vec model that integrates the Wikipedia link structure. We then use a word clustering algorithm and a seed word extraction method to construct an initial domain dictionary. Finally, a neural network method is applied to extend the domain dictionary. In experiments, different methods were employed to extract domain-specific terms, and their performance was compared in the automobile field. The results demonstrate the effectiveness of our method for constructing a domain-specific dictionary.
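The seed-then-extend idea outlined above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for Word2vec embeddings, a cosine-similarity threshold against the seed centroid is swapped in for the neural network extension step, and the names `TOY_VECTORS`, `expand_dictionary`, and `threshold` are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for learned word embeddings.
# Assumption: in practice these would come from a trained Word2vec model.
TOY_VECTORS = {
    "engine": [0.9, 0.1, 0.0],
    "gearbox": [0.8, 0.2, 0.1],
    "sedan": [0.85, 0.15, 0.05],
    "brake": [0.7, 0.3, 0.0],
    "poem": [0.0, 0.1, 0.9],
    "novel": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand_dictionary(seeds, vectors, threshold=0.9):
    """Add every word whose vector is close to the seed centroid.

    Assumption: a simple similarity threshold replaces the neural
    network extension step described in the abstract.
    """
    center = centroid([vectors[word] for word in seeds])
    expanded = set(seeds)
    for word, vec in vectors.items():
        if cosine(vec, center) >= threshold:
            expanded.add(word)
    return sorted(expanded)


# Start from two automobile seed words and extend the dictionary.
domain_terms = expand_dictionary(["engine", "gearbox"], TOY_VECTORS)
print(domain_terms)
```

With these toy vectors, the automobile-related words fall close to the seed centroid while the literature-related words do not, so only the former are added to the dictionary.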

Keywords


Automatic domain-specific term extraction, Domain thesaurus construction, Wiki, New word discovery


DOI
10.12783/dtcse/ammms2018/27260
