Enrique Enrique - 5 months ago 17
Bash Question

How can I split a column in a specific range?

I'm working with protein trajectories and I've got a long data file (one column and 600,000 lines).

This is an example:

100
100
0
100
100
...
n=600,000


What I want is to split this data every 3000 lines, placing each chunk in a new column beside the previous one, like this example:

Col1    Col2    Col3    Col4     ...  Col200
n=1     n=3001  n=6001  n=9001   ...
0       0       0       0        ...
0       0       0       0        ...
100     100     100     100      ...
...     ...     ...     ...      ...
n=3000  n=6000  n=9000  n=12000  ...  n=600,000


n= line number.
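
The layout above amounts to a simple index mapping from a line number to a row and a column. A minimal sketch of that arithmetic, assuming 1-based numbering and a chunk size of 3000 (the function name `position` is only illustrative):

```python
def position(n, chunk_size=3000):
    """Return the 1-based (row, column) for the 1-based line number n."""
    row = (n - 1) % chunk_size + 1
    column = (n - 1) // chunk_size + 1
    return row, column


print(position(1))       # (1, 1)
print(position(3000))    # (3000, 1)
print(position(3001))    # (1, 2)
print(position(600000))  # (3000, 200)
```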

Is there any way to do this in R or bash?

Thank you very much in advance.

EDIT: I'm using this script in python to generate that column:

from decimal import Decimal

i = 1
while i <= 15:
    output = open('cache/distances_' + str(i) + '.dat.results', 'w')
    with open('cache/distances_medias_' + str(i) + '.dat', 'r') as f:
        for line in f:
            columns = line.split(' ')
            # Write 100 when both criteria are met, 0 otherwise.
            if Decimal(columns[0]) <= Decimal('2.5') and 120 < Decimal(columns[1]) < 180:
                output.write("100\n")
            else:
                output.write("0\n")
    output.close()
    i += 2


Is there any way to modify the script so that, when it reaches line 3000, it starts writing a new column?

Answer

I am not sure I understand your example, but you should be able to use a combination of split and paste:

$ cat filetosplit
1
2
3
4
5
6
7
8
9
10

$ split filetosplit  "split." -l 3 -d ; paste split*
1       4       7       10
2       5       8
3       6       9

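The split command will generate files of 3 lines each (you can change this to 3000). paste will then put them all side by side, one file per column. Splitting 600,000 lines into 3000-line chunks yields 200 files, more than two-digit suffixes can name, so depending on your version of split you may need -a 3 to get three-digit suffixes. You can use sed to add a header with the column names and starting line numbers.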
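
For readers who would rather stay in Python, as the script in the question does, the same split-and-paste idea can be sketched roughly as follows. This is an illustration only, not part of the original answer: the function name `to_columns`, the tab separator and the `Col1`, `Col2`, ... header labels are all assumptions.

```python
from itertools import zip_longest


def to_columns(values, chunk_size=3000, sep="\t"):
    """Lay out a flat list of strings as columns of chunk_size rows each.

    Hypothetical helper: the name, the tab separator and the ColN header
    labels are illustrative choices, not something the question prescribes.
    """
    # Cut the list into consecutive chunks; each chunk becomes one column.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    header = sep.join(f"Col{n}" for n in range(1, len(chunks) + 1))
    # zip_longest pads a shorter last column with empty strings, as paste does.
    rows = [sep.join(row) for row in zip_longest(*chunks, fillvalue="")]
    return [header] + rows


if __name__ == "__main__":
    # Same toy input as the split/paste example: the numbers 1 to 10.
    for line in to_columns([str(n) for n in range(1, 11)], chunk_size=3):
        print(line)
```

In a script like the one in the question, the "100"/"0" values could be appended to a list instead of being written out one by one, and the list passed through a helper like this before writing the file.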
