One Method of Keyword Extraction for Tibetan News Webpage
Abstract
Keywords of Tibetan text are important in text clustering and categorization, automatic abstracting, information retrieval, and related areas. However, Tibetan news webpages carry no keywords. Moreover, many keyword extraction algorithms require a manually annotated corpus, which limits their extensibility. Because keywords can be regarded as a set of words that are important and cohesively related to the subject of a document, this paper improves the chi-squared statistic and uses the idea of recommendation to extract keywords. Experiments on Tibetan news webpages demonstrate that this method performs better than TF-IDF combined with location information.
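For orientation, the textbook chi-squared statistic for a 2x2 contingency table can be sketched as follows. This is only the standard form, not the improved variant proposed in the paper, whose details the abstract does not give. The function name `chi_squared` and the reading of the four counts (for example, how often a candidate word and a reference word do or do not appear in the same sentence) are assumptions made for illustration.

```python
def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-squared statistic for a 2x2 contingency table.

    Hypothetical interpretation of the counts (an assumption, not from the paper):
      a: units containing both the candidate word and the reference word
      b: units containing the candidate word only
      c: units containing the reference word only
      d: units containing neither
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column is empty, so no association can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Perfect association yields a large score; independence yields zero.
strong = chi_squared(10, 0, 0, 10)
independent = chi_squared(5, 5, 5, 5)
```

A higher score indicates that the two events co-occur more (or less) often than independence would predict, which is the property a keyword scorer would build on.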
Keywords
Tibetan information processing, chi-squared statistic, keyword extraction
DOI
10.12783/dtcse/iceiti2017/18815