Jungle Hunter - 6 months ago 84

Python Question

Given an input sequence, what is the best way to find the longest (not necessarily continuous) non-decreasing subsequence.

`0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 # sequence`

1, 9, 13, 15 # non-decreasing subsequence

0, 2, 6, 9, 13, 15 # longest non-deceasing subsequence (not unique)

I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.

Answer

I just stumbled in this problem, and came up with this Python 3 implementation:

```
def subsequence(seq):
if not seq:
return seq
M = [None] * len(seq) # offset by 1 (j -> j-1)
P = [None] * len(seq)
# Since we have at least one element in our list, we can start by
# knowing that the there's at least an increasing subsequence of length one:
# the first element.
L = 1
M[0] = 0
# Looping over the sequence starting from the second element
for i in range(1, len(seq)):
# Binary search: we want the largest j <= L
# such that seq[M[j]] < seq[i] (default j = 0),
# hence we want the lower bound at the end of the search process.
lower = 0
upper = L
# Since the binary search will not look at the upper bound value,
# we'll have to check that manually
if seq[M[upper-1]] < seq[i]:
j = upper
else:
# actual binary search loop
while upper - lower > 1:
mid = (upper + lower) // 2
if seq[M[mid-1]] < seq[i]:
lower = mid
else:
upper = mid
j = lower # this will also set the default value to 0
P[i] = M[j-1]
if j == L or seq[i] < seq[M[j]]:
M[j] = i
L = max(L, j+1)
# Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
result = []
pos = M[L-1]
for _ in range(L):
result.append(seq[pos])
pos = P[pos]
return result[::-1] # reversing
```

Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:

`seq`

is the input sequence.`L`

is a number: it gets updated while looping over the sequence and it marks the length of longest incresing subsequence found up to that moment.`M`

is a list.`M[j-1]`

will point to an index of`seq`

that holds the smallest value that could be used (at the end) to build an increasing subsequence of length`j`

.`P`

is a list.`P[i]`

will point to`M[j]`

, where`i`

is the index of`seq`

. In a few words, it tells which is the previous element of the subsequence.`P`

is used to build the result at the end.

How the algorithm works:

- Handle the special case of an empty sequence.
- Start with a subsequence of 1 element.
- Loop over the input sequence with index
`i`

. - With a binary search find the
`j`

that let`seq[M[j]`

be`<`

than`seq[i]`

. - Update
`P`

,`M`

and`L`

. - Traceback the result and return it reversed.

**Note:** The only differences with the wikipedia algorithm are the offset of 1 in the `M`

list, and that `X`

is here called `seq`

. I also test it with a slightly improved unit test version of the one showed in Eric Gustavson answer and it passed all tests.

Example:

```
seq = [30, 10, 20, 50, 40, 80, 60]
0 1 2 3 4 5 6 <-- indexes
```

At the end we'll have:

```
M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]
```

As you'll see `P`

is pretty straightforward. We have to look at it from the end, so it tells that before `60`

there's `40,`

before `80`

there's `40`

, before `40`

there's `20`

, before `50`

there's `20`

and before `20`

there's `10`

, stop.

The complicated part is on `M`

. At the beginning `M`

was `[0, None, None, ...]`

since the last element of the subsequence of length 1 (hence position 0 in `M`

) was at the index 0: `30`

.

At this point we'll start looping on `seq`

and look at `10`

, since `10`

is `<`

than `30`

, `M`

will be updated:

```
if j == L or seq[i] < seq[M[j]]:
M[j] = i
```

So now `M`

looks like: `[1, None, None, ...]`

. This is a good thing, because `10`

have more chanches to create a longer increasing subsequence. (The new 1 is the index of 10)

Now it's the turn of `20`

. With `10`

and `20`

we have subsequence of length 2 (index 1 in `M`

), so `M`

will be: `[1, 2, None, ...]`

. (The new 2 is the index of 20)

Now it's the turn of `50`

. `50`

will not be part of any subsequence so nothing changes.

Now it's the turn of `40`

. With `10`

, `20`

and `40`

we have a sub of length 3 (index 2 in `M`

, so `M`

will be: `[1, 2, 4, None, ...]`

. (The new 4 is the index of 40)

And so on...

For a complete walk through the code you can copy and paste it here :)