Research on Domain Term Dictionary Construction Based on Chinese Wikipedia
Abstract
Domain terms are words or phrases that represent concepts or relationships in a specific domain and characterize that domain. The automatic construction of domain-specific dictionaries is an important task in natural language processing, with applications in domain-specific ontology construction, vertical search, text classification, information retrieval, and question answering systems. In this paper, we propose a novel method for constructing a domain term dictionary based on the Chinese Wikipedia web resource and deep learning technology. For the first time, we explore word representations learned by a Word2vec model that integrates the Wikipedia link structure. We then use a word clustering algorithm and a seed word extraction method to construct an initial domain dictionary, and apply a neural network method to extend it. In experiments in the automobile field, we compared the performance of different methods for extracting domain-specific terms; the results demonstrate the effectiveness of our method for constructing domain-specific dictionaries.
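The dictionary extension step can be illustrated with a minimal sketch. It is not the method of the paper: the abstract names a neural network for extension, whereas this sketch substitutes a plain cosine-similarity test against the centroid of the seed word vectors. The function names (`cosine`, `centroid`, `expand_dictionary`), the threshold value, and the three-dimensional toy vectors are all assumptions made for illustration; in practice the vectors would come from a trained Word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand_dictionary(embeddings, seeds, threshold=0.8):
    """Return the seeds plus every word whose vector is close to the seed centroid.

    Assumption: a simple similarity threshold stands in for the neural
    network classifier mentioned in the abstract.
    """
    seed_vectors = [embeddings[w] for w in seeds if w in embeddings]
    accepted = set(seeds)
    if not seed_vectors:
        return accepted
    center = centroid(seed_vectors)
    for word, vector in embeddings.items():
        if cosine(vector, center) >= threshold:
            accepted.add(word)
    return accepted


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "engine": [0.9, 0.1, 0.0],
    "gearbox": [0.8, 0.2, 0.1],
    "chassis": [0.85, 0.15, 0.05],
    "poem": [0.0, 0.1, 0.9],
    "novel": [0.1, 0.0, 0.95],
}

if __name__ == "__main__":
    print(sorted(expand_dictionary(TOY_EMBEDDINGS, ["engine", "gearbox"])))
```

With the two automobile seeds, the sketch admits "chassis" and rejects the literary terms. A threshold keeps the example transparent; a trained classifier, as the abstract describes, could learn a more flexible decision boundary.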
Keywords
Automatic domain-specific term extraction, Domain thesaurus construction, Wiki, New word discovery
DOI
10.12783/dtcse/ammms2018/27260