There is a file with three columns. The numbers in columns 1 and 2 are not from the same set: some numbers that appear in column 2 may not appear in column 1.
Column 3 shows the amount of connectedness between numbers in columns 1 and 2.
I want to partition my numbers in column 1 into groups of consecutive values (i.e., ranges) for which connectedness is greater than or equal to 0.2. For example, in this small data set:
input:
1 2 0.000
1 3 0.213
1 4 0.014
1 5 0.001
1 6 0.555
1 7 0.509
1 8 0.509
3 4 0.995
3 5 0.323
3 6 0.555
3 7 0.225
3 8 0.000
4 5 0.095
4 6 0.058
4 7 0.335
4 8 0.000
5 6 0.995
5 7 0.658
5 8 0.000
6 7 0.431
6 8 0.000
7 8 0.000
output:
G1: 1 3 G2: 4 G3: 5 6 7
input:
49996 49997 0.000
49996 49998 0.082
49996 49999 0.953
49996 50000 0.060
49996 50001 0.000
49998 49999 0.288
49998 50000 0.288
49998 50001 0.000
49999 50000 0.265
49999 50001 0.000
50000 50001 0.000
output:
G1: 49996 G2: 49998 49999 50000
I had to start from scratch instead of editing my reply to your previous question, as I needed to preprocess the input file first to get the list of numbers to consider (i.e., the numbers from the first column).
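#!/usr/bin/perl
use warnings;
use strict;

my $THRESHOLD = 0.2;

# First pass: collect the distinct numbers from column 1, in file order.
my @considered;
open my $IN, '<', shift or die $!;
while (<$IN>) {
    my ($first) = split ' ', $_, 2;
    push @considered, $first unless @considered && $first == $considered[-1];
}

# Second pass: rewind the file and build the groups.
seek $IN, 0, 0;
my $considered_idx = 0;
my @groups = ([ $considered[$considered_idx] ]);
while (<$IN>) {
    my ($n1, $n2, $connectedness) = split;
    # Skip column 2 numbers that lie before the next considered number
    # (guarded so we never read past the end of @considered) ...
    next if $n1 == $considered[$considered_idx]
         && $considered_idx < $#considered
         && $n2 < $considered[ 1 + $considered_idx ];
    # ... or beyond the last considered number.
    next if $n2 > $considered[-1];
    if ($n1 == $considered[$considered_idx]) {
        if ($connectedness >= $THRESHOLD) {    # the question asks for >= 0.2
            push @{ $groups[-1] }, $n2;
        } else {
            # Connection too weak: start a new group at the first
            # considered number that is not smaller than $n2.
            ++$considered_idx until $considered_idx > $#considered
                                 || $considered[$considered_idx] >= $n2;
            push @groups, [ $considered[$considered_idx] ];
        }
    }
}

for my $i (0 .. $#groups) {
    print "$i\t@{ $groups[$i] }\n";
}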
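If you want to check the logic without creating a file, here is a minimal sketch of the same two-pass idea working on rows held in memory. The sub name group_ranges and the G1/G2 output labels are my own choices for illustration, not part of the script above. It assumes the rows are sorted the same way as in your samples and it compares with >= as your question words it.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: $rows is an array ref of [n1, n2, connectedness]
# triples, sorted like the sample input. Returns a list of array refs.
sub group_ranges {
    my ($rows, $threshold) = @_;

    # Distinct first-column values, in order of appearance.
    my @considered;
    for my $row (@$rows) {
        push @considered, $row->[0]
            unless @considered && $row->[0] == $considered[-1];
    }
    return () unless @considered;

    my $idx    = 0;
    my @groups = ([ $considered[0] ]);
    for my $row (@$rows) {
        my ($n1, $n2, $connectedness) = @$row;
        # Only rows that start at the current group's first number matter.
        next if $n1 != $considered[$idx];
        # Ignore numbers before the next considered one or after the last.
        next if $idx < $#considered && $n2 < $considered[ $idx + 1 ];
        next if $n2 > $considered[-1];
        if ($connectedness >= $threshold) {
            push @{ $groups[-1] }, $n2;
        } else {
            # $n2 <= $considered[-1] here, so this loop always stops.
            ++$idx until $considered[$idx] >= $n2;
            push @groups, [ $considered[$idx] ];
        }
    }
    return @groups;
}

# The second sample from the question.
my @rows = map { [ split ' ', $_ ] } grep { /\S/ } split /\n/, <<'END';
49996 49997 0.000
49996 49998 0.082
49996 49999 0.953
49996 50000 0.060
49996 50001 0.000
49998 49999 0.288
49998 50000 0.288
49998 50001 0.000
49999 50000 0.265
49999 50001 0.000
50000 50001 0.000
END

my @groups = group_ranges(\@rows, 0.2);
print 'G', $_ + 1, ": @{ $groups[$_] }\n" for 0 .. $#groups;
```

For this sample it should give one group holding only 49996 and a second one holding 49998 49999 50000, which is the grouping you listed.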