Michael Perdue Michael Perdue - 2 months ago 11
Python Question

pandas: problems with merging dataframes

I'm trying to merge the two following dataframes

on=SICcode
:

df.head(5)

SICcode Catcode Category SICname MultSIC
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat X
1 112 A1600 Other commodities (incl rice, peanuts) Rice X
2 115 A1500 Wheat, corn, soybeans and cash grain Corn X
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains X

df.columns.tolist()

['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']

merged.head()


2012 NAICS Code 2002to2007 NAICS SICcode
0 111110 111110 116
1 111120 111120 119
2 111130 111130 119
3 111140 111140 111
4 111150 111150 115

merged.columns.tolist()
['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']


When I try to merge them with the following code:

merged=pd.merge(merged,df, how='left', on='SICcode')


I get a
Keyerror: 'SICcode'
I tried to set the
dtype
of One of the dfs but When I do, I receive a
Keycode error
.

If anyone has an idea on this or would request more information please let me know.

Answer

pay attention at the first column:

In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)

In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']

In [29]: df['SICcode']

...

KeyError: 'SICcode'

In [30]: df['\ufeffSICcode'].head()
Out[30]:
0    111
1    112
2    115
3    116
4    119
Name: SICcode, dtype: int64

as @unutbu has said in his comment, adding encoding='utf-8_sig' to the pd.read_csv() call might help you to fix this problem:

In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')

In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']