Ran into this question about diffing blocks of text on StackOverflow yesterday, after KDiff3 and WinMerge both went crazy trying to diff a file where I'd mostly just grouped, and therefore rearranged, lots of methods. Seems like an easy issue, but as that question points out...

Is there a diff-like algorithm that handles moving block of lines? - Stack Overflow:

But it falls down when blocks of text are moved within the file.

Suppose you have the following two files, a.txt and b.txt (imagine that they're both hundreds of lines long rather than just 6):

a.txt   b.txt
-----   -----
1       4
2       5
3       6
4       1
5       2
6       3

diff a.txt b.txt shows this:

$ diff a.txt b.txt 
1,3d0
< 1
< 2
< 3
6a4,6
> 1
> 2
> 3

That really is painful, when it should be a reasonably easy process.
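The same behavior is easy to reproduce outside of GNU diff. Here's a small sketch using Python's standard difflib (my choice for illustration; it isn't what KDiff3 or WinMerge use internally) on the six-line example from the question. Like diff, it can only keep one of the two blocks "in place" and has to report the other as an unrelated insert plus delete.

```python
from difflib import SequenceMatcher

# The two files from the StackOverflow question, one list entry per line.
a = ["1", "2", "3", "4", "5", "6"]
b = ["4", "5", "6", "1", "2", "3"]

# get_opcodes() describes how to turn a into b using only
# equal / insert / delete / replace steps. There is no "move" step.
opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    print(tag, a[i1:i2], b[j1:j2])
```

Note that difflib happens to anchor on lines 1-3 and flag 4-6 as inserted and deleted, the mirror image of what GNU diff printed above. Either way, nothing tells you the block simply moved.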

Now I've tried to write my own diff engine before in my usual bullheaded, straight-ahead style, not worrying about efficiency until after something's working. It's not easy. But if you make finding block movements your primary mission, it gets a lot easier. Enter wikEd diff Online Tool - Cacycle, "The Only JavaScript Diff Library for Visual Inline Text Comparisons With Block Move Highlighting and Character/Word-Based Resolution".

Results are pretty good, both for the simplest case from the SO question and for real-world code.

Screenshot: wikEd diff output for the example from the SO question

(The green highlight is for grouping a block, but by default it ignores/doesn't highlight any moved blocks, which is nice when you're diffing code like I mentioned before...)
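To make the "find block movements first" idea from earlier concrete, here's a rough Python sketch. To be clear, this is not how wikEd diff works; it's a naive greedy approach of my own, with hypothetical helper names (find_blocks, moved_blocks), that ignores efficiency entirely. It repeatedly grabs the longest run of identical lines that is still unmatched in both files, wherever it sits, and then calls a block "moved" if it shows up in b before a block that preceded it in a.

```python
from difflib import SequenceMatcher


def _split(free, rng, start, size):
    """Remove [start, start+size) from range rng in a list of free ranges."""
    out = [r for r in free if r != rng]
    lo, hi = rng
    if lo < start:
        out.append((lo, start))
    if start + size < hi:
        out.append((start + size, hi))
    return out


def find_blocks(a, b, min_size=1):
    """Greedily pair identical runs of lines, regardless of position.

    Returns (a_start, b_start, size) tuples sorted by a_start.
    """
    sm = SequenceMatcher(None, a, b, autojunk=False)
    a_free, b_free = [(0, len(a))], [(0, len(b))]
    blocks = []
    while True:
        best = None
        # Brute force: try every pair of still-unmatched ranges.
        for alo, ahi in a_free:
            for blo, bhi in b_free:
                m = sm.find_longest_match(alo, ahi, blo, bhi)
                if m.size >= min_size and (best is None or m.size > best[0].size):
                    best = (m, (alo, ahi), (blo, bhi))
        if best is None:
            break
        m, a_rng, b_rng = best
        blocks.append((m.a, m.b, m.size))
        a_free = _split(a_free, a_rng, m.a, m.size)
        b_free = _split(b_free, b_rng, m.b, m.size)
    return sorted(blocks)


def moved_blocks(blocks):
    """Crude heuristic: a block is 'moved' if it lands in b before
    the end of a block that came earlier in a."""
    moved, high = [], 0
    for a_start, b_start, size in blocks:
        if b_start < high:
            moved.append((a_start, b_start, size))
        else:
            high = b_start + size
    return moved


a = ["1", "2", "3", "4", "5", "6"]
b = ["4", "5", "6", "1", "2", "3"]
blocks = find_blocks(a, b)
moved = moved_blocks(blocks)
print("blocks:", blocks)
print("moved: ", moved)
```

On real code you'd probably want min_size of 2 or 3 so that lone braces and blank lines don't pair up all over the file, but that threshold is a guess on my part.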

Now I have to resist the desire to put this into a full-fledged UWP app whose goal is to be a diff tool. There are smarter things to write on my own time. Please realize this, self.
