Let_Me_Be · 4 years ago · 98
C Question

Similar code detector

I'm searching for a tool that can compare source code for similarity.

We currently have a very simple system that produces a huge number of false positives, and the real positives easily get buried among them.

My requirements are:


  • a reasonably small number of false positives

  • a good detection rate (yes, these two goals work against each other)

  • ideally, richer output than just a single similarity value

  • usable for C (C99) and C++ (C++03, ideally C++11)

  • still maintained

  • usable for comparing two source files against each other

  • usable in non-interactive mode



EDIT:

To avoid confusion: the following two code snippets are equivalent and should be detected as such:

for (int i = 0; i < 10; i++) { bla; }


int i = 0; while (i < 10) { bla; i++; }


The same here:

int x = 10; y = x + 5;


int a = 10; y = a + 5;
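
The renaming case above is the kind of change an identifier-normalizing tokenizer absorbs. Below is a minimal sketch in C, assuming a hypothetical `normalize_c` helper and a deliberately tiny keyword list. It is not a real C/C++ lexer: it ignores string and character literals and the preprocessor. It drops whitespace and comments, rewrites every non-keyword identifier as `ID` and every numeric literal as `NUM`, and emits the remaining characters one token each.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny keyword list; a real lexer needs all of C/C++. */
static const char *const keywords[] = {
    "int", "char", "float", "double", "void", "for", "while",
    "do", "if", "else", "return", NULL
};

static int is_keyword(const char *s, size_t n) {
    for (size_t i = 0; keywords[i] != NULL; i++)
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    return 0;
}

/* Append one token to out, space-separated; returns 0 if out is too small. */
static int emit(char *out, size_t cap, size_t *len, const char *tok, size_t n) {
    size_t sep = (*len > 0) ? 1 : 0;
    if (*len + sep + n + 1 > cap) return 0;   /* +1 for the terminating NUL */
    if (sep) out[(*len)++] = ' ';
    memcpy(out + *len, tok, n);
    *len += n;
    out[*len] = '\0';
    return 1;
}

/* Hypothetical normalize_c: drops whitespace and comments, maps non-keyword
 * identifiers to "ID" and numeric literals to "NUM", and copies every other
 * character as its own token. Returns the output length or (size_t)-1. */
size_t normalize_c(const char *src, char *out, size_t cap) {
    assert(src != NULL && out != NULL);
    if (cap == 0) return (size_t)-1;
    size_t len = 0;
    out[0] = '\0';
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '/') {          /* line comment */
            while (*p != '\0' && *p != '\n') p++;
        } else if (p[0] == '/' && p[1] == '*') {          /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p != '\0') p += 2;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t n = (size_t)(p - start);
            int ok = is_keyword(start, n) ? emit(out, cap, &len, start, n)
                                          : emit(out, cap, &len, "ID", 2);
            if (!ok) return (size_t)-1;
        } else if (isdigit((unsigned char)*p)) {
            while (isalnum((unsigned char)*p) || *p == '.') p++;
            if (!emit(out, cap, &len, "NUM", 3)) return (size_t)-1;
        } else {
            if (!emit(out, cap, &len, p, 1)) return (size_t)-1;
            p++;
        }
    }
    return len;
}
```

With this, `int x = 10; y = x + 5;` and `int a = 10; y = a + 5;` both normalize to `int ID = NUM ; ID = ID + NUM ;`. The `for`/`while` pair still differs after this step. Matching it requires comparing program structure (for example an AST or control-flow graph) rather than tokens.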

Answer

I've used MOSS (http://theory.stanford.edu/~aiken/moss/) in the past to detect plagiarized code. Because it compares a normalized, token-level form of the code rather than its raw text, it handles situations like the ones you presented above far better than a plain text comparison. The tool is language-aware, so comments are ignored in the analysis, and it goes a long way toward detecting code that has been modified through simple search-and-replace of variable and/or function names.
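
MOSS's authors describe its core fingerprinting technique, winnowing, in the paper "Winnowing: Local Algorithms for Document Fingerprinting" (Schleimer, Wilkerson, Aiken). Below is a minimal sketch of that idea in C. It is not MOSS's actual code. The sketch hashes every k-gram of an already-normalized text, keeps the minimum hash of each window of `w` consecutive hashes as a fingerprint, and scores two files by the fraction of fingerprints they share. The names `winnow` and `overlap`, the FNV-1a hash, and the scoring formula are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 64-bit FNV-1a hash of the k characters starting at s.
 * Recomputed per k-gram for clarity; a real tool would use a rolling hash. */
static uint64_t kgram_hash(const char *s, size_t k) {
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV offset basis */
    for (size_t i = 0; i < k; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;                 /* FNV prime */
    }
    return h;
}

/* Winnowing: from every window of w consecutive k-gram hashes, select the
 * minimum (rightmost on ties) and record it unless the previous window
 * already selected the same position. Stores at most cap fingerprints in
 * fp and returns how many were stored. */
size_t winnow(const char *text, size_t k, size_t w, uint64_t *fp, size_t cap) {
    assert(k > 0 && w > 0);
    size_t n = strlen(text);
    if (n < k) return 0;
    size_t nh = n - k + 1;                     /* number of k-grams */
    uint64_t *h = malloc(nh * sizeof *h);
    if (h == NULL) return 0;
    for (size_t i = 0; i < nh; i++) h[i] = kgram_hash(text + i, k);

    size_t count = 0;
    size_t last = (size_t)-1;                  /* position picked by previous window */
    size_t nwin = nh >= w ? nh - w + 1 : 1;    /* short input: one partial window */
    for (size_t start = 0; start < nwin; start++) {
        size_t end = start + w < nh ? start + w : nh;
        size_t min_pos = start;
        for (size_t j = start; j < end; j++)
            if (h[j] <= h[min_pos]) min_pos = j;   /* <= keeps the rightmost minimum */
        if (min_pos != last && count < cap) {
            fp[count++] = h[min_pos];
            last = min_pos;
        }
    }
    free(h);
    return count;
}

/* Illustrative score: fraction of a's fingerprints that also occur in b. */
double overlap(const uint64_t *a, size_t na, const uint64_t *b, size_t nb) {
    if (na == 0) return 0.0;
    size_t shared = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j]) { shared++; break; }
    return (double)shared / (double)na;
}
```

The input is assumed to be source that has already had comments stripped, identifiers renamed, and whitespace removed. `k` and `w` (for example 5 and 4) are tuning knobs: larger values give fewer spurious matches but miss shorter copied fragments. Storing each fingerprint's position alongside its hash would let you report which regions matched, which is closer to the richer, multi-value output the question asks for.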

Note: I used the tool a few years ago when I taught computer science in grad school, and it worked wonderfully at detecting code that had been yanked from the internet. Here is a well-documented account of a similar application: http://fie2012.org/sites/fie2012.org/history/fie99/papers/1110.pdf

If you google "measure software similarity", you should find a few more useful hits: http://www.ics.heacademy.ac.uk/resources/assessment/plagiarism/detectiontools_sourcecode.html
