Deex Deex - 5 months ago 7
Linux Question

Filter new lines between two files to new file

I'm trying to compare two text files and save the result in a new file. The result should contain only lines that are new, and the comparison should ignore sort order. I just want to see what is new, not which lines changed. To achieve this I tried several approaches inside a batch file, which you can see below.
First I use uniq and sort to change the order of both files, like:

D:/filetype/sort.exe -b D:\filetype\listfile\listfile_clean_tmp3_1.txt -oD:\filetype\listfile\listfile_clean_tmp4.txt


After that I tried to compare both files to a new one.

1) via comm

D:/filetype/comm.exe --nocheck-order -2 -3 d:/filetype/listfile/listfile_clean_tmp4.txt d:/filetype/listfile/archive/tmp/all.txt > D:\filetype\listfile\listfile_clean_tmp5.txt


A terrible solution that doesn't work correctly. If the order differs at all, it gives a lot of false results. For example, if I save the result to the archive and compare it with itself again, it still displays new lines.

2) Via diff

D:/filetype/diff.exe --new-line-format="" --unchanged-line-format="" d:/filetype/listfile/listfile_clean_tmp4.txt d:/filetype/listfile/archive/tmp/all.txt > D:\filetype\listfile\listfile_clean_tmp5.txt


Same as comm: if I compare the archived file with itself, it displays new lines.

3) Via grep, sed and diff

D:/filetype/diff.exe -U $(wc -l < (D:/filetype/listfile/listfile_clean_tmp4.txt) (D:/filetype/listfile/listfile_clean_tmp4.txt D:/filetype/listfile/archive/tmp/all.txt | D:/filetype/grep.exe '^-' | D:/filetype/sed.exe 's/^-//g' > D:\filetype\listfile\listfile_clean_tmp5.txt


I wasn't able to get this running in a Windows batch file, and I can't figure out why the error message "Wrong syntax for file or folder name" appears. Any ideas would be great.

Some further information
- I'm using coreutils on Windows
- I could use a .py script, batch, PHP and coreutils
- I add the result to the archive
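
Since a .py script is an option, the order-independent comparison described above can be sketched with a set lookup. This is only a minimal sketch: the function names new_lines and write_new_lines are made up for illustration, UTF-8 input is an assumption, and each new line is reported once in the order it first appears.

```python
from pathlib import Path


def new_lines(candidate_lines, archive_lines):
    """Return lines of candidate_lines that do not occur in archive_lines.

    The order of either input is irrelevant. Line endings are stripped
    before comparing, and a line that repeats is reported only once.
    """
    seen = {line.rstrip("\r\n") for line in archive_lines}
    result = []
    for line in candidate_lines:
        line = line.rstrip("\r\n")
        if line not in seen:
            seen.add(line)  # remember it so repeats are not reported twice
            result.append(line)
    return result


def write_new_lines(candidate_path, archive_path, output_path):
    """File-based wrapper (hypothetical helper; UTF-8 is an assumption)."""
    candidate = Path(candidate_path).read_text(encoding="utf-8").splitlines()
    archive = Path(archive_path).read_text(encoding="utf-8").splitlines()
    fresh = new_lines(candidate, archive)
    Path(output_path).write_text(
        "".join(line + "\n" for line in fresh), encoding="utf-8"
    )
```

Because the lookup is a set, no sorting step is needed before the comparison, and comparing a file with itself yields an empty result.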

Here is an example file:
http://pastebin.com/raw/tNGSu2W6

First I compare it with an empty text file (all.txt). In the last step I merge it into all.txt (as the archive), and normally the next run shouldn't find any differences.

copy /b D:\filetype\listfile\archive\*.txt D:\filetype\listfile\listfile_tmp_all.txt
D:/filetype/uniq.exe D:\filetype\listfile\listfile_tmp_all.txt > D:/filetype/listfile/archive/tmp/tmp_all2.txt
D:/filetype/sort.exe -b D:/filetype/listfile/archive/tmp/tmp_all2.txt -oD:/filetype/listfile/archive/tmp/all.txt
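
One detail of the merge step above: plain uniq only collapses identical lines that are directly next to each other, so running it before sort can leave duplicates behind. The sketch below illustrates the difference in Python. Both function names are hypothetical, and Python's default string ordering is an assumption that will not match sort.exe -b in every locale.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Mimic plain `uniq`: collapse only runs of identical neighbouring lines."""
    return [key for key, _ in groupby(lines)]


def merge_archive(*line_lists):
    """Combine several line lists into one sorted, duplicate-free list."""
    merged = set()
    for lines in line_lists:
        merged.update(lines)
    # Sorting after de-duplication keeps the archive in one consistent order.
    return sorted(merged)
```

With the input a, b, a, a, uniq_adjacent still returns a, b, a, whereas merge_archive returns each line exactly once.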


To not confuse anyone this is the whole thing (it's a bit spartan).
http://pastebin.com/T68sqpET

Update 1
Changed the Pastebin code; there was a typo in the part with diff.exe. "< filename < filename > output" throws error messages on Windows. Also, the numbers were not the fault of diff: I used "uniq.exe -c" and needed to remove the -c.

Answer

Okay, I figured out why comm didn't work correctly. The command

D:/filetype/comm.exe --nocheck-order -2 -3 d:/filetype/listfile/listfile_clean_tmp4.txt d:/filetype/listfile/archive/tmp/all.txt > D:\filetype\listfile\listfile_clean_tmp5.txt

doesn't need --nocheck-order, but that is not all. The main reason the comparison failed was that I used uniq or merged files in my tool chain and did not always strictly re-sort afterwards, once parts of the text file had been edited. It is essential that files compared with comm are always sorted in the same order. Here is an example with sort from coreutils:

D:/filetype/core/sort.exe -b D:\filetype\listfile\archive\tmp\bruteforce.txt -o D:\filetype\listfile\archive\tmp\bruteforce2.txt
D:/filetype/core/uniq.exe  D:\filetype\listfile\archive\tmp\bruteforce2.txt > D:\filetype\listfile\archive\tmp\bruteforce3.txt
D:/filetype/core/sort.exe -b D:\filetype\listfile\archive\tmp\bruteforce3.txt -o D:\filetype\listfile\archive\tmp\bruteforce4.txt
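
To illustrate why the sort order matters so much, here is a rough Python sketch of the kind of single merge pass that comm -2 -3 performs. It is not comm's actual implementation, and the name comm_only_first is invented for this example.

```python
def comm_only_first(first, second):
    """Return lines found only in `first`, using one merge pass.

    Like `comm -2 -3`, this assumes both lists are sorted in the same
    order; it never looks backwards in either list.
    """
    out = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            # Line is in both inputs: skip it on both sides.
            i += 1
            j += 1
        elif first[i] < second[j]:
            # Line cannot appear later in `second` if `second` is sorted.
            out.append(first[i])
            i += 1
        else:
            j += 1
    out.extend(first[i:])  # whatever is left exists only in `first`
    return out
```

With sorted input, comparing a, b, c against a, c yields only b. If the first list is given as c, a, b and compared against a, b, c, the same pass reports a and b as new even though both lists hold identical lines, which matches the false results described in the question.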