ömer sarı - 2 months ago
Python Question

Pandas groupby method

I have a 41-year dataset and would like to do some statistical calculations using the pandas module. However, I do not know pandas well.
Here is an example of the CSV dataset:

date day month year pcp1 pcp2 pcp3 pcp4 pcp5 pcp6
1.01.1979 1 1 1979 0.431 2.167 9.375 0.431 2.167 9.375
2.01.1979 2 1 1979 1.216 2.583 9.162 1.216 2.583 9.162
3.01.1979 3 1 1979 4.041 9.373 23.169 4.041 9.373 23.169
4.01.1979 4 1 1979 1.799 3.866 8.286 1.799 3.866 8.286
5.01.1979 5 1 1979 0.003 0.051 0.342 0.003 0.051 0.342
6.01.1979 6 1 1979 2.345 3.777 7.483 2.345 3.777 7.483
7.01.1979 7 1 1979 0.017 0.031 0.173 0.017 0.031 0.173
8.01.1979 8 1 1979 5.061 5.189 43.313 5.061 5.189 43.313


Here is my code:

import numpy as np
import pandas as pd
import csv

filename="output813b.csv"
cols = ["date","year","month","day" ,"pcp1","pcp2","pcp3","pcp4","pcp5","pcp6"]
data1=pd.read_csv(filename,sep=',', header=None,names=cols,usecols=range(1,9))
colmns_needed=["month" ,"pcp1","pcp2","pcp3","pcp4","pcp5","pcp6"]
data2=pd.read_csv(filename,sep=',', header=None,names=colmns_needed)
mm=data2.groupby("month")
print(mm.sum())
print('\n')


But the values in the pcp columns seem to be stored as strings.
Here is example output for pcp1:

Month pcp1

1 0.4310.4720000.91800000.01011.63904.65900.5780...
10 00.1500000000.027000.02400.1630.9610000000.017...
11 00.4940000000000.0480.003012.26200000003.612.9...
12 0.1890.0760.47000000000.08800.1080.26107.15000...
13 00.06500.1060.00700000050.6207.1510.0860.1487....
14 0000.64200000000.017025.5910.93400.04500000000...
15 0.742000.0720000000000.32500000000002.9877.512...
16 6.43900000000000.38103.986000000000033.5534.76...
17 0.0890000.2750000.555001.9230.562.9130.1360000...
18 3.28200000000.024000.656002.1750000000008.2434...
19 1.28200000000000000.0070000000007.0383.0450.17...
2 1.2160.1050000000010.4690.2092.9700.0415.6062....
20 00.4960.05100000000000.3550.1582.8530.04600000...
21 00000000000002.69903.5190.13000002.830.5151.09...
22 0000000007.19600000000000001.4421.76500.04500....
23 0000000008.168000.02100000000000.1083.8760.968...


How can I solve this problem?

Answer

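Do not specify header=None in your read_csv calls. That tells the function the file has no header row, but according to the sample data you posted, the first row of the file is a header. read_csv therefore treats the header row as data, mixing values like pcp1 and 0.431 in the same column, so every column is read as strings. Summing strings concatenates them, which is exactly the output you are seeing. Note that passing names= without a header argument behaves the same as header=None, so either drop names= and select the columns you want with usecols=, or keep names= and pass header=0 so the existing header row is replaced rather than read as data.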
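A minimal sketch of the difference, assuming the real file is comma-separated with a header row (as the sep=',' in the question suggests). The in-memory csv_text below is only a stand-in for output813b.csv built from the first three sample rows, and the variable names are illustrative:

```python
import io

import pandas as pd

# Stand-in for output813b.csv (assumption: comma-separated, header in row 1).
csv_text = """date,day,month,year,pcp1,pcp2,pcp3,pcp4,pcp5,pcp6
1.01.1979,1,1,1979,0.431,2.167,9.375,0.431,2.167,9.375
2.01.1979,2,1,1979,1.216,2.583,9.162,1.216,2.583,9.162
3.01.1979,3,1,1979,4.041,9.373,23.169,4.041,9.373,23.169
"""

columns_needed = ["month", "pcp1", "pcp2", "pcp3", "pcp4", "pcp5", "pcp6"]

# With header=None the header row is read as data, so each column
# holds a mix of text and numbers and ends up non-numeric.
bad = pd.read_csv(io.StringIO(csv_text), header=None)
print(bad.dtypes)

# Letting read_csv use the first row as the header (the default when
# names= is not passed) gives numeric columns; usecols picks by name.
good = pd.read_csv(io.StringIO(csv_text), usecols=columns_needed)
print(good.dtypes)

# Grouping by month now sums numbers instead of concatenating strings.
monthly_sum = good.groupby("month").sum()
print(monthly_sum)
```

With your own file, replace io.StringIO(csv_text) with the filename. If the columns still come out as strings after this change, the separator is probably not a comma and the sep argument needs adjusting.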