K.K. K.K. - 2 months ago 14
Python Question

Best algorithm to compare list or dict

I'm working with a somewhat complicated data set in Python, and I'm a novice Python coder. The data set is a collection of Date, Title, Contents and URL.

Conceptually, it looks like this.

The 1st scraping run gives me:

[9/6 9:00, title1, content1]
[9/6 9:00, title2, content2]
[9/6 8:22, title3, content3]
[9/6 11:01, title4, content4]
...

The 2nd scraping run gives me:

[9/6 13:05, title5, content5]
[9/6 12:13, title6, content6]
[9/6 9:00, title1, content1]
[9/6 14:21, title4, content4'] ---> This is an updated version of content4
...


I can run the scraping code.
What I want to do is compare the output of the 1st scraping run with that of the 2nd.
I expect to show only the diff.

[9/6 13:05, title5, content5]
[9/6 12:13, title6, content6]
[9/6 10:21, title4', content4']


I don't believe I have to compare "content".
I can get the diff by "date" and "title" only.

I spent hours but cannot think of an elegant approach to make this work.
What would be the best approach here? Basically, I'm thinking of storing the output as a pickle and then comparing the 2nd scraping run's output against it on the fly. However, I'm not sure how to take two elements of a list at once and compare them with two elements from a second list. It does not seem to be a simple for loop...

Or can this be done with a dict? I don't think so... but I welcome any suggestions.

I would much appreciate comments from experienced folks.

Answer

Try this to compare the lists in Python 3:

# Rows from the 1st scraping run
a = [['9/6 9:00', 'title1', 'content1'],
     ['9/6 9:00', 'title2', 'content2'],
     ['9/6 8:22', 'title3', 'content3'],
     ['9/6 11:01', 'title4', 'content4']]
# Rows from the 2nd scraping run
b = [['9/6 13:05', 'title5', 'content5'],
     ['9/6 12:13', 'title6', 'content6'],
     ['9/6 9:00', 'title1', 'content1'],
     ['9/6 14:21', 'title4', 'content4']]
# Print every row of b that has no identical row in a
for i in b:
    if i not in a:
        print(i)

Output:

['9/6 13:05', 'title5', 'content5']
['9/6 12:13', 'title6', 'content6']
['9/6 14:21', 'title4', 'content4']

This compares each whole list with the lists from the other run, for example ['9/6 11:01','title4', 'content4'] with ['9/6 14:21', 'title4', 'content4']. If any single element differs, the list is printed. If you want to compare only particular elements of each list with the corresponding elements of a list from the other run, you have to use another method.
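As a minimal sketch of such a method, assuming the goal is to compare on "date" and "title" only (as the question suggests), you could collect the (date, title) pairs of the first run in a set and keep the rows of the second run whose pair is not in it. The function name diff_by_key is made up for this illustration. Note that a row whose content changed while its date and title stayed the same would not be reported.

```python
# Sample rows from the question: [date, title, content]
a = [['9/6 9:00', 'title1', 'content1'],
     ['9/6 9:00', 'title2', 'content2'],
     ['9/6 8:22', 'title3', 'content3'],
     ['9/6 11:01', 'title4', 'content4']]
b = [['9/6 13:05', 'title5', 'content5'],
     ['9/6 12:13', 'title6', 'content6'],
     ['9/6 9:00', 'title1', 'content1'],
     ['9/6 14:21', 'title4', 'content4']]


def diff_by_key(old_rows, new_rows):
    """Return the rows of new_rows whose (date, title) pair is absent from old_rows.

    Hypothetical helper for illustration; the content column is ignored.
    """
    # Tuples are hashable, so they can be stored in a set for fast lookups.
    seen = {(row[0], row[1]) for row in old_rows}
    return [row for row in new_rows if (row[0], row[1]) not in seen]


for row in diff_by_key(a, b):
    print(row)
```

With the sample data this prints the same three rows as the loop above, because the date of title4 changed between the runs.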

Another solution (which does the same thing using a list comprehension):

print(*[i for i in b if i not in a],sep='\n')

It also gives the same output:

['9/6 13:05', 'title5', 'content5']
['9/6 12:13', 'title6', 'content6']
['9/6 14:21', 'title4', 'content4']

To understand list comprehensions, see this document: Python List Comprehensions: Explained Visually

If you tell me which differences should be printed, I can help further. From the question, I don't understand how you get 9/6 10:21 in the line [9/6 10:21, title4', content4'].
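Until that is clear, here is one possible reading as a rough sketch: if the aim is to tell brand-new entries apart from updated ones, a dict keyed by title may help. This assumes titles are unique within a run and that an update keeps the title but changes the date; neither is confirmed by the question, and the function name classify is made up for this illustration.

```python
# Sample rows from the question: [date, title, content]
a = [['9/6 9:00', 'title1', 'content1'],
     ['9/6 9:00', 'title2', 'content2'],
     ['9/6 8:22', 'title3', 'content3'],
     ['9/6 11:01', 'title4', 'content4']]
b = [['9/6 13:05', 'title5', 'content5'],
     ['9/6 12:13', 'title6', 'content6'],
     ['9/6 9:00', 'title1', 'content1'],
     ['9/6 14:21', 'title4', 'content4']]


def classify(old_rows, new_rows):
    """Split new_rows into (new, updated) lists, matching rows by title.

    Hypothetical helper; assumes each title appears at most once per run.
    """
    # Map each title from the old run to the date it was seen with.
    old_dates = {title: date for date, title, _content in old_rows}
    new, updated = [], []
    for row in new_rows:
        date, title, _content = row
        if title not in old_dates:
            new.append(row)        # title never seen before
        elif old_dates[title] != date:
            updated.append(row)    # same title, different date
    return new, updated


new, updated = classify(a, b)
print('new:', new)
print('updated:', updated)
```

Here title5 and title6 end up in the new list, title4 in the updated list, and title1 is skipped because its date is unchanged.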
