Similarity Code File Detection Model Based on Frequent Itemsets
Abstract
To improve the efficiency and accuracy of program source code similarity detection, this paper addresses several deficiencies of current research and proposes a similar-code detection model based on frequent itemsets. The model constructs frequent itemset data to discover collections of repeated code and automatically attributes files to similarity groups. Because the algorithm does not need to consider the type of code during detection, it is widely applicable: it can detect code files written in different programming languages and grammars, mark the similar code, and compile statistics on the results. Experimental comparison shows that the model achieves high accuracy and processing efficiency.
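The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way frequent itemsets could expose repeated code across files, not the authors' model. It assumes that each distinct whitespace-normalized line is a transaction whose items are the files containing it, and that a pair of files counts as a frequent 2-itemset when it shares at least `min_support` lines. The names `normalize`, `frequent_file_pairs`, and `min_support` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def normalize(line):
    """Collapse whitespace so formatting differences do not matter."""
    return " ".join(line.split())


def frequent_file_pairs(files, min_support=2):
    """Return {(file_a, file_b): [shared lines]} for sufficiently similar pairs.

    Assumption (not from the paper): every distinct normalized line is a
    transaction, the files containing it are its items, and a file pair is
    "frequent" when it appears in at least `min_support` transactions.
    """
    # Map each normalized, non-blank line to the set of files containing it.
    line_to_files = defaultdict(set)
    for name, text in files.items():
        for raw in text.splitlines():
            line = normalize(raw)
            if line:
                line_to_files[line].add(name)

    # Count the support of every file pair and remember the shared lines.
    support = Counter()
    shared = defaultdict(list)
    for line, names in line_to_files.items():
        for pair in combinations(sorted(names), 2):
            support[pair] += 1
            shared[pair].append(line)

    # Keep only the pairs that meet the support threshold.
    return {pair: shared[pair] for pair in support if support[pair] >= min_support}
```

Because the sketch compares lines purely as text, it makes no use of language-specific parsing, which mirrors the language independence the abstract claims. The returned lists of shared lines play the role of marking the similar code, and their lengths give a simple per-pair statistic.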
Keywords
Source code similarity, Frequent itemset, Association rule
DOI
10.12783/dtcse/CCNT2018/24709