inja inja - 3 months ago 7
Python Question

Pandas DataFrame: Finding Index Value for a Column

I have a DataFrame that has columns such as ID, Name, Specification, Time.

I read the file like this:

mc = pd.read_csv("C:\\data.csv", sep = ",", header = 0, dtype = str)


When I checked my column names using

mc.columns.values


I found that the ID column name had a weird character in front of it, like this:

['\ufeffID', 'Name', 'Specification', 'Time']


After this, I assigned "ID" to that column name like this:

mc.columns.values[0] = "ID"


When I checked this using

mc.columns.values


I got my result as,

array(['ID', 'Name', 'Specification', 'Time'], dtype=object)


Then, I checked with,

"ID" in mc.columns.values


It gave me

True


Then I tried,

mc["ID"]


I got an error like this:

KeyError: 'ID'


I want to get the values of the ID column and get rid of the weird character in front of the ID column name. Is there any way to solve this? Any help would be appreciated. Thank you in advance.

Answer

That's a UTF-16 BOM (byte order mark). Pass encoding='utf-16' to read_csv; see: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

mc = pd.read_csv("C:\\data.csv", sep=",", header=0, dtype=str, encoding='utf-16')

The above should work. To be specific, FE FF is the BOM for UTF-16 big-endian.
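If it is unclear which encoding the file really uses, one option is to look at its first bytes before choosing the encoding argument. The following is only a sketch using the standard library: the helper name guess_bom_encoding and the sample header are made up for illustration, and UTF-32 BOMs are deliberately ignored. Note that a '\ufeff' prefix that survives a default UTF-8 read can also mean the file is UTF-8 with a BOM (bytes EF BB BF), in which case 'utf-8-sig' is the codec that strips it.

```python
import codecs


def guess_bom_encoding(raw):
    """Return a codec name based on a leading BOM, or None if no BOM is found.

    Hypothetical helper; UTF-32 BOMs are not handled in this sketch.
    """
    if raw.startswith(codecs.BOM_UTF8):          # EF BB BF
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_BE,      # FE FF
                       codecs.BOM_UTF16_LE)):    # FF FE
        return "utf-16"
    return None


# Made-up header line standing in for the first bytes of data.csv
sample = "ID,Name,Specification,Time\n"
utf8_with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")
utf16_be_with_bom = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")

# Decoding as plain UTF-8 keeps the BOM as a '\ufeff' character
kept = utf8_with_bom.decode("utf-8")

# Decoding with the guessed codec consumes the BOM
clean_utf8 = utf8_with_bom.decode(guess_bom_encoding(utf8_with_bom))
clean_utf16 = utf16_be_with_bom.decode(guess_bom_encoding(utf16_be_with_bom))
```

With a real file, the first few bytes could be read with open(path, "rb").read(4) and the returned codec name passed as the encoding argument of read_csv.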

Also, you should use rename rather than trying to overwrite the NumPy array value:

mc.rename(columns={mc.columns[0]: "ID"}, inplace=True)

This should work correctly.
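As a rough illustration of the rename approach, here is a small self-contained sketch. The DataFrame contents are invented for the example, and the '\ufeff' prefix is placed on the first label by hand to mimic the situation in the question. Editing columns.values in place may leave the index's internal lookup out of date, which would explain the KeyError; going through rename, or assigning a new set of labels to columns, avoids touching that array directly.

```python
import pandas as pd

# Invented data; the first label carries a stray BOM character
mc = pd.DataFrame({
    "\ufeffID": ["1", "2"],
    "Name": ["a", "b"],
    "Specification": ["x", "y"],
    "Time": ["t1", "t2"],
})

# Option 1: rename the first column, as suggested in the answer
renamed = mc.rename(columns={mc.columns[0]: "ID"})

# Option 2 (an alternative, not from the answer): strip the BOM
# character from every label and assign the new labels back
cleaned = mc.copy()
cleaned.columns = cleaned.columns.str.replace("\ufeff", "", regex=False)

# Label-based lookup now works on both results
ids_renamed = renamed["ID"].tolist()
ids_cleaned = cleaned["ID"].tolist()
```

Both options build a new set of column labels, so mc["ID"]-style lookups on the result see the cleaned name.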
