Labrat Labrat - 5 months ago 10
Python Question

Splitting csv file of multiple objects over time by time-point

Here I have an example file of multiple objects each measured at the same time-points (also ND.T represents each unique time point). I would like to split this file into separate files (using a python script) containing all objects unique to each time-point still containing the header.

Original file:

ID ND.T Time [s] Position X [%s] Position Y [%s] Speed [%s] Area [%s] Width [%s] MeanIntensity
1 1 3.87 417.57 11.46 0.06 339.48 14.1 245.65
1 2 8.72 417.37 11.68 0.04 342.61 14.15 239.34
1 3 13.39 417.57 11.66 0.04 344.17 14.3 239.48
2 1 3.87 439.01 6.59 0.02 342.61 11.66 204.47
2 2 8.72 438.97 6.65 0.007 342.61 10.7 197.96
2 3 13.39 438.94 6.66 0.03 345.74 11.03 214.74


Time_3.87.csv

ID ND.T Time [s] Position X [%s] Position Y [%s] Speed [%s] Area [%s] Width [%s] MeanIntensity
1 1 3.87 417.57 11.46 0.06 339.48 14.1 245.65
2 1 3.87 439.01 6.59 0.02 342.61 11.66 204.47


Time_8.72.csv

ID ND.T Time [s] Position X [%s] Position Y [%s] Speed [%s] Area [%s] Width [%s] MeanIntensity
1 2 8.72 417.37 11.68 0.04 342.61 14.15 239.34
2 2 8.72 438.97 6.65 0.007 342.61 10.7 197.96


Time_13.39.csv

ID ND.T Time [s] Position X [%s] Position Y [%s] Speed [%s] Area [%s] Width [%s] MeanIntensity
1 3 13.39 417.57 11.66 0.04 344.17 14.3 239.48
2 3 13.39 438.94 6.66 0.03 345.74 11.03 214.74


Example 2:

ID ND.T Time [s] Position X [%s] Position Y [%s] Speed [%s] Area [%s] Width [%s] MeanIntensity
1 1 3.87 417.57 11.46 0.06 339.48 14.1 245.65
1 2 8.72 417.37 11.68 0.04 342.61 14.15 239.34
1 3 13.39 417.57 11.66 0.04 344.17 14.3 239.48
1 4 18.1 417.73 11.71 0.04 337.92 14.14 225.17
1 5 22.81 417.83 11.89 0.03 344.17 14.64 233.3
1 6 27.48 417.69 11.83 0.02 345.74 14.23 238
1 7 32.16 417.65 11.94 0.03 345.74 14.71 230.75
2 1 3.87 439.01 6.59 0.02 342.61 11.66 204.47
2 2 8.72 438.97 6.65 0.007 342.61 10.7 197.96
2 3 13.39 438.94 6.66 0.03 345.74 11.03 214.74
2 4 18.1 438.9 6.53 0.04 342.61 10.46 202.9
2 5 22.81 438.97 6.7 0.02 342.61 10.3 194.32
2 6 27.48 438.89 6.71 0.006 350.43 11 219.41
2 7 32.16 438.87 6.74 0.05 348.87 10.36 219.58

Answer

You can use pandas to achieve this:

import pandas as pd
df = pd.read_csv(your_file)
df.groupby('Time [s]').apply(lambda x: x.to_csv(str(x.name) + '.csv'))

The above will load your csv using read_csv and then group on the Time [s] column and use this to name the file

You can see that the df is grouped on the Time [s]:

In [108]:
df.groupby('Time [s]').apply(lambda x: print(x))
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
0   1     1      3.87           417.57            11.46        0.06   
3   2     1      3.87           439.01             6.59        0.02   

   Area [%s]  Width [%s]  MeanIntensity  
0     339.48       14.10         245.65  
3     342.61       11.66         204.47  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
0   1     1      3.87           417.57            11.46        0.06   
3   2     1      3.87           439.01             6.59        0.02   

   Area [%s]  Width [%s]  MeanIntensity  
0     339.48       14.10         245.65  
3     342.61       11.66         204.47  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
1   1     2      8.72           417.37            11.68       0.040   
4   2     2      8.72           438.97             6.65       0.007   

   Area [%s]  Width [%s]  MeanIntensity  
1     342.61       14.15         239.34  
4     342.61       10.70         197.96  
   ID  ND.T  Time [s]  Position X [%s]  Position Y [%s]  Speed [%s]  \
2   1     3     13.39           417.57            11.66        0.04   
5   2     3     13.39           438.94             6.66        0.03   

   Area [%s]  Width [%s]  MeanIntensity  
2     344.17       14.30         239.48  
5     345.74       11.03         214.74  

Out[108]:
Empty DataFrame
Columns: []
Index: []

Here groupby will group on 'Time [s]' column, we then call apply to apply a lambda where we call the method to_csv on each grouping, we can access the group name using name attribute which is of dtype int so we cast to str and construct our csv name:

In [109]:
df.groupby('Time [s]').apply(lambda x: print(str(x.name) + '.csv'))

3.87.csv
8.72.csv
13.39.csv
Out[109]:
Empty DataFrame
Columns: []
Index: []
Comments