Alejandro Alejandro - 2 months ago 21
Python Question

Export 2Gb+ SELECT to CSV with Python (out of Memory)

I'm trying to export a large file from Netezza (using Netezza ODBC + pyodbc), this solution throws memoryError, If I loop without "list" it's VERY slow. do you have any idea of a intermediate solution that doesn't kill my server/python process but can run faster?

cursorNZ.execute(sql)
archi = open("c:\test.csv", "w")
lista = list(cursorNZ.fetchall())
for fila in lista:
registro = ''
for campo in fila:
campo = str(campo)
registro = registro+str(campo)+";"
registro = registro[:-1]
registro = registro.replace('None','NULL')
registro = registro.replace("'NULL'","NULL")
archi.write(registro+"\n")


---- Edit ----

Thank you, I'm trying this:
Where "sql" is the query,
cursorNZ is

connMy = pyodbc.connect(DRIVER=.....)
cursorNZ = connNZ.cursor()

chunk = 10 ** 5 # tweak this
chunks = pandas.read_sql(sql, cursorNZ, chunksize=chunk)
with open('C:/test.csv', 'a') as output:
for n, df in enumerate(chunks):
write_header = n == 0
df.to_csv(output, sep=';', header=write_header, na_rep='NULL')


Have this:
AttributeError: 'pyodbc.Cursor' object has no attribute 'cursor'
Any idea?

Answer

Don't use cursorNZ.fetchall().

Instead, loop through the cursor directly:

with open("c:/test.csv", "w") as archi:  # note the fixed '/'
    cursorNZ.execute(sql)
    for fila in cursorNZ:
        registro = ''
        for campo in fila:
            campo = str(campo)
            registro = registro+str(campo)+";"
        registro = registro[:-1]
        registro = registro.replace('None','NULL')
        registro = registro.replace("'NULL'","NULL")
        archi.write(registro+"\n")

Personally, I would just use pandas:

import pyodbc
import pandas

cnn = pyodbc.connect(DRIVER=.....)
chunksize = 10 ** 5  # tweak this
chunks = pandas.read_sql(sql, cnn, chunksize=chunksize)

with open('C:/test.csv', 'a') as output:
    for n, df in enumerate(chunks):
        write_header = n == 0
        df.to_csv(output, sep=';', header=write_header, na_rep='NULL')