DHCC: An Efficient Algorithm for Supervised Discretization
Abstract
To improve the speed and effectiveness of data mining in equipment simulation training systems, a discretization algorithm based on hierarchy clustering and compatibility (DHCC) is proposed. Unlike traditional discretization algorithms, DHCC calculates the positive domain of the clusters to adjust the number of clusters, and it obtains the initial division of each attribute by taking the association between attributes into account. Starting from the initial discretization produced by hierarchy clustering, information entropy and a simplified compatibility degree are then calculated to merge adjacent intervals, which reduces the number of breakpoints and eliminates superfluous intervals. The result is a valid and concise discretization scheme. Tests on six typical datasets show that DHCC outperforms the Equal-W, Equal-F, ChiMerge, MDLP, and CAIM algorithms in both total number of intervals and accuracy.
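The interval-merging step can be illustrated with a minimal sketch. This is not the DHCC procedure itself: the abstract does not define the simplified compatibility degree or the clustering stage, so the code below assumes a single numeric attribute, takes the initial cut points as given, and uses class-label entropy alone as the merge criterion. All function names and the `tol` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (base 2) of the class labels in one interval."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def split_by_cuts(values, labels, cuts):
    """Group class labels into the intervals defined by sorted cut points."""
    bins = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        idx = sum(v > c for c in cuts)  # index of the interval containing v
        bins[idx].append(y)
    return bins


def weighted_entropy(bins):
    """Size-weighted average entropy over all non-empty intervals."""
    total = sum(len(b) for b in bins)
    return sum(len(b) / total * class_entropy(b) for b in bins if b)


def merge_adjacent(values, labels, cuts, tol=1e-9):
    """Drop cut points one at a time while the weighted class entropy
    does not increase (assumed stand-in for the paper's merge test)."""
    cuts = sorted(cuts)
    changed = True
    while changed and cuts:
        changed = False
        base = weighted_entropy(split_by_cuts(values, labels, cuts))
        for i in range(len(cuts)):
            trial = cuts[:i] + cuts[i + 1:]  # merge the intervals around cuts[i]
            merged = weighted_entropy(split_by_cuts(values, labels, trial))
            if merged <= base + tol:
                cuts = trial
                changed = True
                break
    return cuts


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
final_cuts = merge_adjacent(values, labels, [1.5, 3.5, 5.5])
print(final_cuts)
```

In this toy input, the cut points at 1.5 and 5.5 separate objects of the same class, so removing them costs no class information, while the cut at 3.5 is the only one that separates the two classes. A fuller version would add a compatibility check across all attributes before accepting a merge.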
Keywords
Discretization, Hierarchy clustering, Information entropy, Compatibility
DOI
10.12783/dtcse/msam2020/34259