Vitu Tomaz Vitu Tomaz - 1 month ago 25
Python Question

Efficiency: openpyxl or VBA?

I'm trying to figure out which one is generally faster for a similar task: using VBA or openpyxl.

I know it probably depends on the task you want to achieve, but let's say I have a table that is 50 cells wide and 150,000 cells tall and I want to copy it from woorkbook A to workbook B.

Any thoughts on whether python will do better or if Excel is better in dealing with itself?

My guts tell me that python should be fairly faster for some reasons:


  • In order for a sub to copy from a workbook to another, both should be open and running, whereas with python I can simply load both;

  • VBA has to deal with a lot of clutter with most tasks and it takes A LOT of system resources



Besides that, I'd like to know if I can make some further improvements to a openpyxl script, like multithreading or perhaps using NumPy along with it.

Thanks for the help!

Answer

TBH the fastest approach would probably be remote controlling Excel using xlwings, because this can take advantage of Excel's optimisation. VBA might be able to hook into that as well but I've never found VBA to be fast.

Python will have to convert from XML to Python and back to XML. You've got around 5,000,000 million cells so I'd expect this to take about a minute on my machine. I'd suggest combining read-only and write-only modes to do this to keep memory use low.

If you only have numerical data (no dates) then you might be able to find a shortcut and "transplant" the relevant worksheet XML file from one Excel file to another and just alter the relevant metadata.