technology from back to front

Diff for Javascript

Toward the end of last week, I found myself wondering about implementing a diff algorithm in Javascript. It turns out there’s already at least one available attempt at this, by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString("the quick", "the quick fox"), for example), so I went hunting for source material to implement my own.

I ended up finding J. W. Hunt and M. D. McIlroy, “An algorithm for differential file comparison”, Bell Telephone Laboratories CSTR #41 (1976) (postscript), which turned out to be quite implementable.

Here’s my implementation: hunt-mcilroy.js. With luck, I’ve even implemented something close to the described algorithm.

As an example, the output from

diff(“the quick brown fox jumped over”.split(/\s+/),
“the quick fox jumps over”.split(/\s+/))

is

[{common:["the","quick"]},
{file1:["brown"],
file2:[]},
{common:["fox"]},
{file1:["jumped"],
file2:["jumps"]},
{common:["over"]}]

The next step is to implement a diff3 equivalent, and then I’ve the tools to implement rudimentary version-control. It’d sure be a fine thing to see TiddlyWiki evolve darcs-like distributed change-propagation…

by
tonyg
on
15/08/06
  1. How is your diff3 going on ?
    If it finish , may I see it’s source code by JS .

    Thanks alot

  2. Hi Frank, unfortunately I’ve not yet returned to this little project. I’ll be sure to post when I do, though.

  3. OK , perhaps i’ll try to do a new one by myself . thanx

  4. Peter Pallos
    on 19/02/08 at 11:14 pm

    The code fails with the two sentences that I found on a different site:

    str1 = “The red brown fox jumped over the rolling log.”;
    str2 = “The brown spotted fox leaped over the rolling log”;
    alert(uneval(diff(str1, str2)));

    For example, a part of the result array is:
    {common:["o", "v"]}, {file1:["e"], file2:["e"]}, {common:["r"]}

  5. You’re right, Peter – computing a diff character-by-character seems to give spurious diffs! How odd. Have you dug into the code to find out why that might be?

    Splitting the string into words first gives a sensible answer:

    js> uneval(diff(str1.split(/\s+/), str2.split(/\s+/)));

    [{common:["The"]},
    {file1:["red"], file2:[]},
    {common:["brown"]},
    {file1:[], file2:["spotted"]},
    {common:["fox"]},
    {file1:["jumped"], file2:["leaped"]},
    {common:["over", "the", "rolling"]},
    {file1:["log."], file2:["log"]}]

  6. sandraros
    on 01/08/09 at 6:01 pm

    … by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString(“the quick”, “the quick fox”) …

    It has been corrected (tested today)

  7. [quote]
    … by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString(”the quick”, “the quick fox”) …

    It has been corrected (tested today)
    [/quote]

    Does anyone know where the latest copy of the John Resig code can be found? His Project page for diffing has been taken down.

  8. I’ve just found and fixed the bug Peter Pallos noticed way back in 2008! The fix is visible here.

 
 


eight × 2 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us