I have a really weird problem. I have three files, each containing one column of numbers. I need to get only those values from the first file that are not present in the second or third file.
I tried Python like:

```python
for e in firstfile:
    if e not in secondfile:
        ...
```

and the same for the third file.
If you are really doing the same for the third file, i.e. comparing the original contents of the first file with the third, you can introduce duplicates of items that were not in the second file but are in the third. For example:
```
file 1: 1 2 3
file 2: 1
file 3: 2
```
After processing file 2, `resultfile` would contain 2 and 3. Then after processing file 3, `resultfile` would contain 2 and 3 (from the first run) plus 1 and 3, i.e. 2, 3, 1, 3. However, the correct result is just 3.
It's not clear from your code whether you are actually writing the output of each run to `resultfile`. If you are, then you should use it as the input for the second and subsequent runs rather than processing the first file again.
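To make the failure mode concrete, here is a minimal sketch of the naive two-pass approach using in-memory lists in place of your files (the variable names are mine, not from your code):

```python
# Naive approach: compare the original first file against each other
# file separately, appending to the same result list each time.
file1 = ["1", "2", "3"]
file2 = ["1"]
file3 = ["2"]

result = []
for e in file1:          # pass 1: filter against file 2
    if e not in file2:
        result.append(e)
for e in file1:          # pass 2: filter against file 3, restarting from file 1
    if e not in file3:
        result.append(e)

print(result)  # ['2', '3', '1', '3'] -- the 1 and the duplicate 3 are wrong
```

Because the second pass restarts from the original first file, anything removed in the first pass can reappear.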
A better way to fix it
If you do not need to preserve the order of lines from the first file you could use `set.difference()` like this:

```python
with open('file1') as f1, open('file2') as f2, open('file3') as f3:
    unique_f1 = set(f1).difference(f2, f3)
```
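As a quick sanity check on the example above, here is the same call with `io.StringIO` objects standing in for the real files:

```python
import io

# Stand-ins for file1, file2 and file3 from the earlier example.
f1 = io.StringIO("1\n2\n3\n")
f2 = io.StringIO("1\n")
f3 = io.StringIO("2\n")

# set(f1) collects the lines of the first file; difference() then
# discards any line that also appears in the second or third file.
unique_f1 = set(f1).difference(f2, f3)
print(unique_f1)  # {'3\n'}
```

Note that `set.difference()` accepts any number of iterables, so both remaining files can be passed in one call.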
Note that this will include any whitespace (including newline characters) present in the files. If you wanted to ignore leading and trailing whitespace from each line:
```python
from itertools import chain

with open('file1') as f1, open('file2') as f2, open('file3') as f3:
    unique_f1 = set(map(str.strip, f1)).difference(map(str.strip, chain(f2, f3)))
```
The above assumes Python 3. If you're using Python 2 then, optionally for better efficiency, import `itertools.imap` and use it instead of `map`.
Or you might like to treat the data as numeric (I'll assume `float` here, but you can use `int` if the values are all integers):

```python
from itertools import chain

with open('file1') as f1, open('file2') as f2, open('file3') as f3:
    unique_f1 = set(map(float, f1)).difference(map(float, chain(f2, f3)))
```