mattrweaver mattrweaver - 7 months ago 67
Python Question

pass openpyxl data to pandas

I am splitting "full name" fields into "first name", middle name" and "last name" fields from data from an excel file. I couldn't figure out how to do that in pandas, so I turned to openpyxl. I got the variables split as I desired. But, since adding columns to openpyxl for the new fields is not easy, I thought I would pass the values to pandas.

I'm generating the dataframe that I need when I run the code, but once I send the df to ExcelWriter, only the last row is added to the Excel file. The data is in the right places, though.

Here's the code:

for cellObj in range(2, sheet.max_row+1):
#print cellObj
id = sheet['A' + str(cellObj)].value
fullname = sheet['B' + str(cellObj)].value.strip()
namelist = fullname.split(' ')
for i in namelist:
firstname = namelist[0]
if len(namelist) == 2:
lastname = namelist[1]
middlename = ''
elif len(namelist) == 3:
middlename = namelist[1]
lastname = namelist[2]
elif len(namelist) == 4:
middlename = namelist[1]
lastname = namelist[2] + " " + namelist[3]
if (namelist[1] == 'Del') | (namelist[1] == 'El') | (namelist[1] == 'Van'):
middlename = ''
lastname = namelist[1] + " " + namelist[2]
df = pd.DataFrame({'personID':id,'lastName':lastname,'firstName':firstname,'middleName':middlename}, index=[id])

writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer,'Sheet1', columns=['ID','lastName','firstName','middleName'])
writer.save()


Any ideas?

Thanks

Sam Sam
Answer

A couple of things. First, your code is only ever going to get you one line, because you overwrite the values every time it passes an if test. for example,

  if len(namelist) == 2:
        lastname = namelist[1]

This assigns a string to the variable lastname. You are not appending to a list, you are just assigning a string. Then when you make your dataframe, df = pd.DataFrame({'personID':id,'lastName':lastname,... your using this value, so the dataframe will only ever hold that string. Make sense? If you must do this using openpyexcel, try something like:

lastname = [] #create an empty list
if len(namelist) == 2:
    lastname.append(namelist[1]) #add the name to the list

However, I think your life will ultimately be much easier if you just figure out how to do this with pandas. It is in fact quite easy. Try something like this:

import pandas as pd
#read excel
df = pd.read_excel('myInputFilename.xlsx', encoding = 'utf8')
#write to excel
df.to_excel('MyOutputFile.xlsx')
Comments