Tomáš Zato - 3 months ago
Python Question

How to understand/use the Python difflib output?

I am trying to make a comprehensible diff that compares the command-line output of two programs. I used difflib and came up with this code:

from difflib import Differ
from pprint import pprint
import sys

def readable_whitespace(line):
    return line.replace("\n", "\\n")

# Two strings are expected as input
def print_diff(text1, text2):
    d = Differ()
    text1 = text1.splitlines(True)
    text2 = text2.splitlines(True)

    text1 = [readable_whitespace(line) for line in text1]
    text1 = [readable_whitespace(line) for line in text2]

    result = list(d.compare(text1, text2))
    sys.stdout.writelines(result)

Some requirements I have:

  • (obvious) It should be clear what is from which output when there is a difference

  • Newlines are replaced with \n
    because they matter in my case and must be clearly visible when they cause a conflict

I made a simple test for my diff function:

A += "BBB\n"
B += "\n"
B += "BBB"

For your convenience, here is the test merged with the function so that you can execute it as a file:

I have no idea what this output is trying to tell me:

- AAAAAAA\n? ^^
? ^
- \n+

Notice those two ^ symbols on the first line? What are they pointing to? Also, I intentionally put a trailing newline into one test string. I don't think the diff noticed that.

How can I make the output comprehensible, or learn to understand it?


The main problem with your example is how you are handling endline characters. If you completely replace them in the input, the output will no longer line up correctly, and so won't make any sense. To fix that, the readable_whitespace function should look something like this:

def readable_whitespace(line):
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

This handles all types of endline sequence and ensures that the lines are displayed correctly when printed.
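As a quick check, the sketch below runs that function over a few endline variants. The sample strings are made up for illustration and are not taken from the question.

```python
def readable_whitespace(line):
    # Length of the line without its trailing endline characters
    end = len(line.rstrip('\r\n'))
    # repr() escapes the endline characters; [1:-1] drops the surrounding quotes
    return line[:end] + repr(line[end:])[1:-1] + '\n'

# Hypothetical sample lines covering the common endline sequences
samples = ["unix\n", "windows\r\n", "old mac\r", "no endline"]

for sample in samples:
    # Each result shows the escaped endline, followed by one real newline
    print(repr(readable_whitespace(sample)))
```

Every returned line ends with exactly one real newline, so the diff output stays aligned when it is written out, while the original endline sequence remains visible as escaped text.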

The other minor problem is due to a typo:

text1 = [readable_whitespace(line) for line in text1]
text1 = [readable_whitespace(line) for line in text2]
# --^ oops!    
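With both fixes applied, the whole thing might look like the sketch below. The name diff_lines and the two input strings are hypothetical, and returning the lines instead of printing them is just a choice made here so the result is easy to inspect.

```python
from difflib import Differ

def readable_whitespace(line):
    end = len(line.rstrip('\r\n'))
    return line[:end] + repr(line[end:])[1:-1] + '\n'

def diff_lines(text1, text2):
    # splitlines(True) keeps the endline characters so they can be made visible
    lines1 = [readable_whitespace(line) for line in text1.splitlines(True)]
    lines2 = [readable_whitespace(line) for line in text2.splitlines(True)]
    return list(Differ().compare(lines1, lines2))

# Hypothetical inputs: only the first one ends with a trailing newline
result = diff_lines("same\nend\n", "same\nend")
print("".join(result), end="")
```

With these inputs, the missing trailing newline should show up as a deleted \n at the end of the last line rather than going unnoticed.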

Once these fixes are made, the output will look like this:

?    ^
?    ^
+ \n
- BBB\n
?    --

which should hopefully now make sense to you.
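In case the markers are still unclear: every line produced by Differ starts with a two-character code. "- " means the line is only in the first input, "+ " means it is only in the second, two spaces mean it is in both, and "? " marks a guide line that belongs to neither input. On a guide line, ^ sits under a replaced character, - under a deleted one, and + under an inserted one, relative to the line directly above. The sketch below labels each output line accordingly; the LEGEND dictionary and the sample lines are made up for illustration.

```python
from difflib import Differ

# Meaning of the two-character codes, as described in the difflib documentation
LEGEND = {
    "- ": "only in first input",
    "+ ": "only in second input",
    "  ": "in both inputs",
    "? ": "guide for the line above",
}

# Hypothetical lines that differ in exactly one character
old = ["abcXdef\n"]
new = ["abcYdef\n"]

# Pair each output line's meaning with its content (code prefix removed)
annotated = [
    (LEGEND[line[:2]], line[2:].rstrip("\n"))
    for line in Differ().compare(old, new)
]

for meaning, body in annotated:
    print(f"{meaning:<26}| {body}")
```

The caret on each guide line lands under the fourth character, which is where X was replaced by Y.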