I recently found myself trying to figure out how implementation clusters had implicitly formed within a group of students. All of them were given a task (with 4 subtasks) for a whole semester. Everyone was meant to do the task alone, but collaboration was allowed. However, I needed to know who helped whom and, of course, who helped whom with source code.
A colleague had a similar problem, and he pointed me to PMD CPD (= PMD Copy & Paste Detector). This tool works lightning fast and has a GUI 🙂 Also, it's open source -> respect!