Nikson - 11 months ago 56

Perl Question

I have a text file in which every line has a structure like this:

`P[containerVrsn:U(0)recordVrsn:U(0)size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(81920)]logicalDbNo:U(1)classVrsn:U(1)timeStamp:U(0)dbRecord:T[classNo:U(1064620)size:U(184)updateVersion:U(3)checksum:U(748981000)`

I have to sort the file's lines by seqNo (min to max). The sequence number can be virtually any number starting from zero. Any idea how this can be done efficiently?

Answer Source

The *Schwartzian Transform*, as suggested in Toto's answer, is probably the fastest way to sort your lines here. But you said you're a Perl newbie, so I'd like to show how the lines can be sorted *traditionally*.

Perl has a `sort` function that, by default, sorts a list in string (alphabetical) order. But you can supply a custom comparison function and let `sort` use *your* function to compare the elements. During its operation, `sort` must repeatedly compare two elements (= lines) of your list and decide which one is greater or lesser, or whether they are equal.
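To see why the default order is not enough for numbers, here is a small sketch. The sample values are made up for illustration; the point is only that string comparison and numeric comparison can order the same list differently.

```perl
use strict;
use warnings;

# Made-up sample values, not taken from the question's file
my @numbers = (10, 9, 100, 1);

# Default sort compares the elements as strings
my @as_strings = sort @numbers;

# A comparison block using <=> compares them as numbers
my @as_numbers = sort { $a <=> $b } @numbers;

print "@as_strings\n";   # prints "1 10 100 9"
print "@as_numbers\n";   # prints "1 9 10 100"
```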

If you supply a comparison function, `sort` will call it with two such elements available as `$a` and `$b`. You must not declare `$a` and `$b`; they are magic and just there. Your comparison function could look like this:

```
sub by_seqNo
{
    # extract the sequence number from $a and $b
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);
    # numerically compare the sequence numbers (returns -1/0/+1)
    $seqA <=> $seqB;
}
```

The first two lines extract the numbers after `seqNo:U(` and store them as `$seqA` and `$seqB`. The third line compares these sequence numbers numerically and returns the result. Combined with the `sort` function, this gives:

```
my @sorted = sort by_seqNo @lines;
```
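Put together, a complete script might look roughly like the sketch below. The sample lines are shortened, made-up versions of the format in the question, and the file name `records.txt` mentioned in the comment is only a placeholder. It assumes every line contains a `seqNo:U(...)` field.

```perl
use strict;
use warnings;

# Shortened, made-up sample lines. With a real file you would read them
# instead, e.g. (hypothetical file name):
#   open(my $fh, '<', 'records.txt') or die "Cannot open: $!";
#   my @lines = <$fh>;
my @lines = (
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(81920)]logicalDbNo:U(1)\n",
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(7)]logicalDbNo:U(1)\n",
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(1024)]logicalDbNo:U(1)\n",
);

sub by_seqNo
{
    # extract the sequence number from $a and $b
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);
    # numerically compare the sequence numbers
    $seqA <=> $seqB;
}

my @sorted = sort by_seqNo @lines;

# The lines still carry their newlines, so they can be printed directly
print @sorted;
```

With these three sample lines, the line with `seqNo:U(7)` comes out first and the one with `seqNo:U(81920)` last.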

The *Schwartzian Transform (ST)* is faster than this solution because it performs the (expensive) operation of extracting the seqNo exactly once for each line. The "traditional" approach, on the other hand, extracts two seqNos for every comparison, so each line is parsed again every time `sort` compares it.
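For comparison, here is a minimal sketch of what a Schwartzian Transform could look like for this data. It is not necessarily the code from Toto's answer. The sample lines are again shortened and made up, and the sketch assumes every line contains a `seqNo:U(...)` field.

```perl
use strict;
use warnings;

# Shortened, made-up sample lines in the question's format
my @lines = (
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(81920)]logicalDbNo:U(1)\n",
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(7)]logicalDbNo:U(1)\n",
    "P[size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(1024)]logicalDbNo:U(1)\n",
);

# Read the chain bottom-up:
my @sorted =
    map  { $_->[1] }                 # 3. unwrap: keep only the original line
    sort { $a->[0] <=> $b->[0] }     # 2. compare the pre-extracted numbers
    map  {                           # 1. extract seqNo once per line and
        my ($seq) = /seqNo:U\((\d+)/;    # pair it with the line
        [ $seq, $_ ];
    }
    @lines;

print @sorted;
```

The regular expression runs once per line in step 1; the `sort` in step 2 only compares numbers that have already been extracted.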