Enrique Enrique - 1 year ago 70
Bash Question

How can I split a column in a specific range?

I'm working with protein trajectories and I've got a long data frame (a file with one column and 600.000 lines).

This is an example:

100
100
0
100
100
...
n=600.000


I would like to split this data every 3000 lines, placing each block of 3000 lines in a new column beside the previous one, as in this example:

Col1     Col2     Col3     Col4      ...   Col200
n=1      n=3001   n=6001   n=9001    ...
0        0        0        0         ...
0        0        0        0         ...
100      100      100      100       ...
...      ...      ...      ...       ...
n=3000   n=6000   n=9000   n=12000   ...   n=600.000


n= line number.

Is there any way to do this in R or bash?

Thank you very much in advance.

EDIT: I'm using this Python script to generate that column:

from decimal import Decimal

i = 1
while i <= 15:  # with i += 2 below, this covers files 1, 3, 5, ..., 15
    output = open('cache/distances_' + str(i) + '.dat.results', 'w')
    with open('cache/distances_medias_' + str(i) + '.dat', 'r') as f:
        for line in f:
            columns = line.split(' ')
            # Write 100 when column 1 is <= 2.5 and column 2 lies strictly between 120 and 180
            if Decimal(columns[0]) <= 2.5 and 120 < Decimal(columns[1]) < 180:
                output.write("100\n")
            else:
                output.write("0\n")
    output.close()
    i += 2


Is there any way to modify the script so that, when it reaches line 3000, it starts writing in a new column?

Answer Source

I am not sure I understand your example, but you should be able to use a combination of split and paste:

$ cat filetosplit
1
2
3
4
5
6
7
8
9
10

$ split filetosplit  "split." -l 3 -d ; paste split*
1       4       7       10
2       5       8
3       6       9

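The split command generates files of 3 lines each (change -l 3 to -l 3000 for your data), and paste joins those files side by side as columns. With 200 output files it may be safer to also pass -a 3, since two-digit numeric suffixes only cover 100 files on some versions of split. You can use sed to add a header with the column names and the starting line number of each column.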
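
If you would rather stay in Python, as in the script from the question, the same split-and-paste idea can be sketched roughly like this. This is only an illustration, not a tested drop-in for your pipeline: the function names column_to_table and format_table are made up for this example, and it assumes the whole column fits in memory as a list of strings (600.000 short lines should be fine).

```python
from itertools import zip_longest


def column_to_table(values, rows_per_column, fill=""):
    """Cut a flat list into chunks of rows_per_column items and
    lay the chunks side by side, one chunk per output column."""
    chunks = [values[i:i + rows_per_column]
              for i in range(0, len(values), rows_per_column)]
    # zip_longest transposes the chunks into rows and pads a short
    # last chunk with `fill`, like paste does with a shorter file.
    return [list(row) for row in zip_longest(*chunks, fillvalue=fill)]


def format_table(rows, sep="\t"):
    """Join each row with `sep` and the rows with newlines."""
    return "\n".join(sep.join(str(cell) for cell in row) for row in rows)


if __name__ == "__main__":
    # Same toy input as the filetosplit example: the numbers 1 to 10.
    values = [str(n) for n in range(1, 11)]
    print(format_table(column_to_table(values, 3)))
```

For the real data you would read the results file into a list (for example with line.strip() on each line) and call column_to_table(values, 3000), which should give 200 columns of 3000 rows each.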
