mirix mirix - 15 days ago 9
Python Question

Python: Split CSV file according to first character of the first column

I have a series of large CSV files "basename.csv" like:

B1,3,5,6

B2,2,1,5

B3,1,9,0

C1,4,7,9

C2,1,9,3

C3,8,5,2

I would like to split them into different files like:

basename_B.csv

B1,3,5,6

B2,2,1,5

B3,1,9,0

basename_C.csv

C1,4,7,9

C2,1,9,3

C3,8,5,2

I have already done similar things in the past with for loops and ifs, but I was wondering if there is a more efficient way of doing this with Pandas or whatever.

SOLUTION

Adapting the solution from @chthonicdaemon and @jezrael, I came up with this:

def split_csv():
for dfile in glob.glob('*.csv'):
df = pd.read_csv(dfile, header=None)
for letter, group in df.groupby(df[0].str[0]):
group.to_csv((os.path.splitext(dfile)[0]) + '_{}.csv'.format(letter), index=False, header=False)

split_csv()

Answer

Here's a simple application of groupby:

df = pandas.read_csv('basename.csv', header=None)

def firstletter(index):
    firstentry = df.ix[index, 0]
    return firstentry[0]

for letter, group in df.groupby(firstletter):
    group.to_csv('basename_{}.csv'.format(letter))

Or, incorporating @jezrael's use of grouping by the explicit contents of the columns:

for letter, group in df.groupby(df[0].str[0]):
    group.to_csv('basename_{}.csv'.format(letter))