Similarity Code File Detection Model Based on Frequent Itemsets

Jian-hong JIANG, Ke WANG

Abstract


To improve the efficiency and accuracy of program source code similarity detection, this paper addresses several deficiencies of existing detection methods and proposes a similar-code detection model based on frequent itemsets. The model constructs frequent itemset data to discover collections of repeated code and automatically assigns files to similarity groups. Because the algorithm does not need to consider the type of code being examined, it is widely applicable: it can detect similarity among code files written in different programming languages and grammars, mark the similar code, and report statistics on the results. Experimental comparison shows that the model achieves high accuracy and processing efficiency.
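
The idea of treating code files as transactions and mining itemsets shared across them can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how items are formed, so the sketch assumes each whitespace-normalized source line is one item, uses a small Apriori-style search, and scores a file pair by the share of their combined lines covered by common itemsets. All names (`normalize`, `transactions`, `frequent_itemsets`, `similar_pairs`) and the sample files are hypothetical.

```python
from itertools import combinations

def normalize(line):
    """Collapse whitespace so formatting differences do not hide a match."""
    return " ".join(line.split())

def transactions(files):
    """Map each file name to the set of its non-empty normalized lines.
    Assumption: one line = one item; no language-specific parsing is done."""
    return {name: {normalize(l) for l in text.splitlines() if l.strip()}
            for name, text in files.items()}

def frequent_itemsets(tx, min_support=2, max_size=2):
    """Apriori-style search. Returns {itemset: set of files containing it}
    for every itemset present in at least `min_support` files."""
    level = {}
    for name, items in tx.items():
        for item in items:
            level.setdefault(frozenset([item]), set()).add(name)
    level = {s: f for s, f in level.items() if len(f) >= min_support}
    found = dict(level)
    for size in range(2, max_size + 1):
        nxt = {}
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size and union not in nxt:
                # A file contains a|b exactly when it contains both a and b.
                holders = level[a] & level[b]
                if len(holders) >= min_support:
                    nxt[union] = holders
        found.update(nxt)
        level = nxt
    return found

def similar_pairs(tx, itemsets):
    """For each file pair sharing a frequent itemset, return the shared lines
    and a score: shared lines divided by all distinct lines of the pair."""
    shared = {}
    for items, names in itemsets.items():
        for pair in combinations(sorted(names), 2):
            shared.setdefault(pair, set()).update(items)
    return {pair: (lines, len(lines) / len(tx[pair[0]] | tx[pair[1]]))
            for pair, lines in shared.items()}

# Hypothetical sample input: two near-duplicate files and one unrelated file.
files = {
    "a.py": "x = 1\ny = x + 1\nprint(y)",
    "b.py": "x  =  1\ny = x + 1\nprint(x)",
    "c.js": "let z = 3;\nconsole.log(z);",
}
tx = transactions(files)
pairs = similar_pairs(tx, frequent_itemsets(tx))
```

Because the items here are plain text lines, the sketch makes no assumption about programming language, which mirrors the language independence claimed in the abstract; a real system would likely need a more robust item definition than exact line matches.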

Keywords


Source code similarity, Frequent itemset, Association rule


DOI
10.12783/dtcse/CCNT2018/24709
