squar_o squar_o - 3 months ago 9
Python Question

Pandas select columns and data dependant on header

I have a large .csv file. I want to select only the column with he time/date and 20 other columns which I know by header.

As a test I try to take only the column with the header 'TIMESTAMP' I know this is
4207823 rows long in the .csv and it only contains dates and times. The code below selects the TIMESTAMP column but also carries on to take values from other columns as shown below:

import csv
import numpy as np
import pandas

low_memory=False
f = pandas.read_csv('C:\Users\mmso2\Google Drive\MABL Wind\_Semester 2 2016\Wind Farm Info\DataB\DataB - NaN2.csv', dtype = object)#convert file to variable so it can be edited

time = f[['TIMESTAMP']]
time = time[0:4207823]#test to see if this stops time taking other data
print time


output

TIMESTAMP
0 2007-08-15 21:10:00
1 2007-08-15 21:20:00
2 2007-08-15 21:30:00
3 2007-08-15 21:40:00
4 2007-08-15 21:50:00
5 2007-08-15 22:00:00
6 2007-08-15 22:10:00
7 2007-08-15 22:20:00
8 2007-08-15 22:30:00
9 2007-08-15 22:40:00
10 2007-08-15 22:50:00
11 2007-08-15 23:00:00
12 2007-08-15 23:10:00
13 2007-08-15 23:20:00
14 2007-08-15 23:30:00
15 2007-08-15 23:40:00
16 2007-08-15 23:50:00
17 2007-08-16 00:00:00
18 2007-08-16 00:10:00
19 2007-08-16 00:20:00
20 2007-08-16 00:30:00
21 2007-08-16 00:40:00
22 2007-08-16 00:50:00
23 2007-08-16 01:00:00
24 2007-08-16 01:10:00
25 2007-08-16 01:20:00
26 2007-08-16 01:30:00
27 2007-08-16 01:40:00
28 2007-08-16 01:50:00
29 2007-08-16 02:00:00 #these are from the TIMESTAMP column
... ...
679302 221.484 #This is from another column
679303 NaN
679304 2015-09-23 06:40:00
679305 NaN
679306 NaN
679307 2015-09-23 06:50:00
679308 NaN
679309 NaN
679310 2015-09-23 07:00:00

Answer

The problem was due to an error in the input file so simple use of usecols in pandas.read_csv worked.

code below demonstrates the selection of a few columns of data

import csv
import pandas

low_memory=False



    #read only the selected columns
    df = pandas.read_csv('DataB - Copy - Copy.csv',delimiter=',', dtype = object,
    usecols=['TIMESTAMP', 'igmmx_U_77m', 'igmmx_U_58m', ])
    print df # see what the data looks like
    outfile = open('DataB_GreaterGabbardOnly.csv','wb')#somewhere to write the data to
    df.to_csv(outfile)#save selection to the blank .csv created above