ntrme ntrme - 18 days ago 10
Python Question

Comparing sublists and merging them

I have a list that contains a lot of sublists, which are initially pairs of numbers, so it looks like:

list = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]


and what I want is to compare last digit of sublist with first digit in next sublist and if they match - merge them in one sublist. So output for two matching sublists would be something like that:

output = [[7, 8, 9]]


And, of course, if there is a row of matching sublists then to merge them all in one big sublist.

output = [[14, 15, 16, 17, 18, 19]]


I was thinking about using itemgetter as a kind of a key to compare. So probably something like:

prev_digit = itemgetter(-1)
next_digit = itemgetter(0)


but then initially I realised that I don't really understand how can I use it in Python, due to lack of knowledge. I tried to think of a for loop, but it didn't work out as I didn't know how to implement those "keys".

For some kind of inspiration I used this Python, comparison sublists and making a list but even with that I still have no solution.

Also, as my list can get kinda big (from human perspective, so like thousand pairs or stuff) I'm very interested in the most efficient way to do this.

And yes, I'm new to Python, so I would be very grateful for good explanation. Of course I can google, so you can avoid explaining functions in depth, but like general logic would be nice.

Answer

I think I wrote this once. It can be done with a single pass over the list.

alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]

l = [alist[0][:]]
for e in alist[1:]:
   if l[-1][-1] == e[0]:
      l[-1].append(e[1])
   else:
      l.append(e[:])

The code reads as start with the first pair. Loop over the rest. Check if the last element of the last list is the same as the first element of the pair. If so append the second element else append the pair to the list.

This results in l being:

[[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]

If you only want the largest sublist I suggest:

>>> l = [[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
>>> max(l, key=len)
[14, 15, 16, 17, 18, 19]

And evaluated:

>>> alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
>>> 
>>> l = [alist[0][:]]
>>> for e in alist[1:]:
...    if l[-1][-1] == e[0]:
...       l[-1].append(e[1])
...    else:
...       l.append(e[:])
... 
>>> l
[[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
>>> alist
[[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]

And compared. The reduce solution takes 6.4 usecs:

$ python -mtimeit "list = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]" "reduce(lambda x,y: x[:-1] + [x[-1] + y[1:]] if x[-1][-1] == y[0] else x + [y], list[1:], [list[0]])"
100000 loops, best of 3: 6.4 usec per loop

The for loop takes 3.62 usecs:

$ python -mtimeit "alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]" "l = [alist[0][:]]" "for e in alist[1:]:" "   if l[-1][-1] == e[0]:" "      l[-1].append(e[1])" "   else:" "      l.append(e[:])"
100000 loops, best of 3: 3.62 usec per loop

On Python 2.7.3. The for loop is 56% faster. The difference would likely be more pronounced with larger inputs as the cost of a list concatenation depends on the sum of the length of the two lists. Whereas appending to a list is slightly cheaper.

Comments