Tomáš Zato - 2 months ago
Python Question

How to understand/use the Python difflib output?

I am trying to make a comprehensible diff that compares the command-line output of two programs. I used difflib and came up with this code:

from difflib import Differ
from pprint import pprint
import sys

def readable_whitespace(line):
    return line.replace("\n", "\\n")

# Two strings are expected as input
def print_diff(text1, text2):
    d = Differ()
    text1 = text1.splitlines(True)
    text2 = text2.splitlines(True)

    text1 = [readable_whitespace(line) for line in text1]
    text1 = [readable_whitespace(line) for line in text2]

    result = list(d.compare(text1, text2))
    sys.stdout.writelines(result)
    sys.stdout.write("\n")


Some requirements I have:


  • (obvious) It should be clear what is from which output when there is a difference

  • New lines are replaced with \n because they matter in my case and must be clearly visible when causing conflict



I made a simple test for my diff function:

A = "AAABAAA\n"
A += "BBB\n"
B = "AAAAAAA\n"
B += "\n"
B += "BBB"
print_diff(A,B)


For your convenience, here is the test merged with the function so that you can execute it as a file: http://pastebin.com/BvQw9naa

I have no idea what this output is trying to tell me:

- AAAAAAA\n?        ^^
+ AAAAAAA
?        ^
- \n+ 
  BBB


Notice those two ^ symbols on the first line? What are they pointing to? Also, I intentionally put a trailing newline into one test string. I don't think the diff noticed that.

How can I make the output comprehensible, or learn to understand it?

Answer

The main problem with your example is how you are handling end-of-line characters. If you completely replace them in the input, the lines no longer end in a real newline, so the output no longer lines up correctly and won't make any sense. To fix that, the readable_whitespace function should look something like this:

def readable_whitespace(line):
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

This handles every type of end-of-line sequence and ensures that the lines are displayed correctly when printed.
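To see what the function above returns for different line endings, here is a small sketch. The sample strings are made up for illustration and are not part of the question.

```python
import sys

def readable_whitespace(line):
    # Length of the line without its trailing end-of-line characters
    end = len(line.rstrip('\r\n'))
    # Show the stripped characters as escapes, then append a real newline
    return line[:end] + repr(line[end:])[1:-1] + '\n'

# Hypothetical sample lines covering the usual end-of-line sequences
for sample in ["unix\n", "windows\r\n", "old mac\r", "no newline"]:
    sys.stdout.write(readable_whitespace(sample))

# Prints:
# unix\n
# windows\r\n
# old mac\r
# no newline
```

Every returned line ends in exactly one real newline, so Differ's output lines stay on separate rows, while the original line ending remains visible as an escape sequence.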

The other minor problem is due to a typo:

text1 = [readable_whitespace(line) for line in text1]
text1 = [readable_whitespace(line) for line in text2]
# --^ oops!    

Once these fixes are made, the output will look like this:

- AAABAAA\n
?    ^
+ AAAAAAA\n
?    ^
+ \n
- BBB\n
?    --
+ BBB

which should hopefully now make sense to you.
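For reference, here is a sketch of the whole script with both fixes applied, using the test strings from the question. The helper name diff_lines is an assumption introduced here so the result can be inspected as a list; it is not part of the original code.

```python
import sys
from difflib import Differ

def readable_whitespace(line):
    # Make the end-of-line sequence visible, but keep a real trailing newline
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

def diff_lines(text1, text2):
    # Hypothetical helper: returns the Differ output as a list of lines
    lines1 = [readable_whitespace(line) for line in text1.splitlines(True)]
    lines2 = [readable_whitespace(line) for line in text2.splitlines(True)]
    return list(Differ().compare(lines1, lines2))

# Two strings are expected as input
def print_diff(text1, text2):
    sys.stdout.writelines(diff_lines(text1, text2))

A = "AAABAAA\n"
A += "BBB\n"
B = "AAAAAAA\n"
B += "\n"
B += "BBB"
print_diff(A, B)
```

When reading the result, the two-character prefix of each line tells you where it comes from: "- " marks a line found only in the first text, "+ " a line found only in the second, and two spaces a line common to both. Lines starting with "? " are in neither input. They are guides for the line directly above them, where ^ marks a changed character, - a deleted one and + an added one. In the output above, the -- under BBB\n shows that the first text has a trailing newline that the second one lacks.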