Trees4theForest Trees4theForest - 6 months ago 13
Python Question

Python loop through list and return "out of sequence" values

Consider this list:

dates = [
('2015-02-03', 'name1'),
('2015-02-04', 'nameg'),
('2015-02-04', 'name5'),
('2015-02-05', 'nameh'),
('1929-03-12', 'name4'),
('2023-07-01', 'name7'),
('2015-02-07', 'name0'),
('2015-02-08', 'nameh'),
('2015-02-15', 'namex'),
('2015-02-09', 'namew'),
('1980-12-23', 'name2'),
('2015-02-12', 'namen'),
('2015-02-13', 'named'),

How can I identify those dates that are out of sequence. I don't care if they repeat, or skip, I just need the ones way out of line. Ie, I should get back:

('1929-03-12', 'name4'),
('2023-07-01', 'name7'),
('2015-02-15', 'namex'),
('1980-12-23', 'name2'),

Namex is less obvious, but it's not in the general order of the list.

My simplistic start (which I have deleted to simplify the question) is obviously woefully incomplete.

Update: Based on the comments, it seems an implementation of the Longest Increase Subsequence (LIS) will get me started, a python implementation found here:

Seems once I get the LIS, I can compare it to the original list and see where the gaps are... Fascinating. SO is the hive-mind of awesomeness.


Based on the question at Code Review and a question about non-decreasing sequences (since that's what you're after), here's a solution to your problem:

from bisect import bisect_right
from operator import itemgetter

def out_of_sequence(seq, key = None):
  if key is None: key = lambda x: x 

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # find length j of subsequence that seq[i] can extend
    j = bisect_right([key(seq[k]) for k in lastoflength], key(seq[i]))
    # update old subsequence or extend the longest
    try: lastoflength[j] = i
    except: lastoflength.append(i)
    # record element preceding seq[i] in the subsequence for backtracking
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  indices = set()
  i = lastoflength[-1]
  while i is not None:
    i = predecessor[i]

  return [e for i, e in enumerate(seq) if i not in indices]

print(*out_of_sequence(dates, itemgetter(0)), sep='\n')


('1929-03-12', 'name4')
('2023-07-01', 'name7')
('2015-02-15', 'namex')
('1980-12-23', 'name2')