jsignell jsignell - 1 year ago 244
Python Question

Faster way to read Excel files to pandas dataframe

I have a 14MB Excel file with five worksheets that I'm reading into a Pandas dataframe, and although the code below works, it takes 9 minutes!

Does anyone have suggestions for speeding it up?

import pandas as pd

def OTT_read(xl,site_name):
df = pd.read_excel(xl.io,site_name,skiprows=2,parse_dates=0,index_col=0,
return df

def make_OTT_df(FILEDIR,OTT_FILE):
xl = pd.ExcelFile(FILEDIR + OTT_FILE)
site_names = xl.sheet_names
df_list = [OTT_read(xl,site_name) for site_name in site_names]
return site_names,df_list

site_names_OTT,df_list_OTT = make_OTT_df(FILEDIR,OTT_FILE)

Answer Source

As others have suggested, csv reading is faster. So if you are on windows and have Excel, you could call a vbscript to convert the Excel to csv and then read the csv. I tried the script below and it took about 30 seconds.

# create a list with sheet numbers you want to process
sheets = map(str,range(1,6))

# convert each sheet to csv and then read it using read_csv
from subprocess import call
for sheet in sheets:
    csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv' 
    call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet])

Here's a little snippet of python to create the ExcelToCsv.vbs script:

#write vbscript to file
vbscript="""if WScript.Arguments.Count < 3 Then
    WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv <xls/xlsx source file> <csv destination file> <worksheet number (starts at 1)>"
End If

csv_format = 6

Set objFSO = CreateObject("Scripting.FileSystemObject")

src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))
worksheet_number = CInt(WScript.Arguments.Item(2))

Dim oExcel
Set oExcel = CreateObject("Excel.Application")

Dim oBook
Set oBook = oExcel.Workbooks.Open(src_file)

oBook.SaveAs dest_file, csv_format

oBook.Close False

f = open('ExcelToCsv.vbs','w')

This answer benefited from Convert XLS to CSV on command line and csv & xlsx files import to pandas data frame: speed issue

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download