chintan s chintan s - 4 months ago 26
Python Question

Pandas: read file with special characters in a column

In the data that I have, some feature values are

?
. How do I replace them with
NA
?

EDIT

The code and output is as below:

df = pd.read_csv("cca-census-income.csv", header = None)

df.replace('?', np.nan, inplace=True)

df.ix[0,]

23 Other relative of householder
24 1700.09
25 ?
26 ?
27 ?
28 Not in universe under 1 year old
29 ?
30 0

Answer

Add parameter na_values='?' to read_csv.

Sample:

import pandas as pd
import io


temp=u"""Date Time,a
2010-01-27 16:00:00,?
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""

df = pd.read_csv(io.StringIO(temp),na_values='?')
print (df)
             Date Time    a
0  2010-01-27 16:00:00  NaN
1  2010-01-27 16:10:00  2.2
2  2010-01-27 16:30:00  1.7

EDIT:

Thank you 'shivsn' for suggestion add skipinitialspace=True:

temp=u"""Date Time,a
 ? , ?
? ,?
2010-01-27 16:30:00,1.7"""

df = pd.read_csv(io.StringIO(temp),na_values=['?', '? '], skipinitialspace =True)
print (df)
             Date Time    a
0                  NaN  NaN
1                  NaN  NaN
2  2010-01-27 16:30:00  1.7

EDIT1 by file:

It looks like there is only space before ?:

df = pd.read_csv('census-income.data', 
                 header = None, 
                 na_values=['?'], 
                 skipinitialspace =True)
print (df)
Comments