A. ALT - 1 year ago 55
R Question

# How can I extract numeric ranges from 2 columns and print the range from both columns as tuples?

I'm quite new on bash scripting and on python programing; at the moment have 2 columns which contains numeric sequences as follow:

``````Col 1:
1
2
3
5
7
8

Col 2:

101
102
103
105
107
108
``````

Need to extract the numeric ranges from both columns and print them according to the sequence break occrance on any of those 2 columns and the result should be as follow:

``````1,3,101,103

5,5,105,105

7,8,107,108
``````

Already received a useful information on how to extract numeric ranges from one column using awk: - \$ awk 'NR==1||sqrt((\$0-p)*(\$0-p))>1{print p; printf "%s", \$0 ", "} {p=\$0} END{print \$0}' file - ; but now the problem got a bit more complex as have to include a second column with another numeric sequence and requires as a result the ranges from the columns wherever the sequence breaks occurs on any of the 2 columns.

To add a bit more complexity the sequences can be ascending and/or descending.

Trying to find a solution using pandas (data frames) and numpy libraries for python.

Hello MaxU thanks for your reply, unfortunately I'm hitting an issue for the following case:

Col 1:

`````` 7
8
9
10
11

Col 2:

52
51
47
46
45
``````

Where numeric sequence in the second column is descending from the begining; it generates as a result:

7,11,45,52

7,8,51,52

8,11,45,47

Cheers.

UPDATE:

``````In [103]: df
Out[103]:
Col1  Col2
0     7    52
1     8    51
2     9    47
3    10    46
4    11    45

In [104]: (df.groupby((df.diff().abs() != 1).any(1).cumsum()).agg(['min','max']))
Out[104]:
Col1     Col2
min max  min max
1    7   8   51  52
2    9  11   45  47
``````

Here is one way (among many) to do it in Pandas:

Data:

``````In [314]: df
Out[314]:
Col1  Col2
0     1   101
1     2   102
2     3   103
3     5   105
4     8   108
5     7   107
6     6   106
7     9   109
``````

NOTE: pay attention - rows with indexes (4,5,6) is a descending sequence

Solution:

``````In [350]: rslt = (df.groupby((df.diff().abs() != 1).all(1).cumsum())
...:           .agg(['min','max']))
...:

In [351]: rslt
Out[351]:
Col1     Col2
min max  min  max
1    1   3  101  103
2    5   5  105  105
3    6   8  106  108
4    9   9  109  109
``````

now you can easily save it to CSV file:

``````rslt.to_csv(r'/path/to/file_name.csv', index=False, header=None)
``````

or just print it:

``````In [333]: print(rslt.to_csv(index=False, header=None))
1,3,101,103
5,5,105,105
6,8,106,108
9,9,109,109
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download