Windalfin Windalfin - 4 months ago 27
SQL Question

pyodbc SQL Query to Numpy array typeerror: bytes-like object required

I've been trying to pull data from SQL database using pyodbc and want to place it into numpy.array. However I found difficulty in inputting multiple data-type for np.fromiter() argument.

import pyodbc as od
import numpy as np

con = od.connect('DSN=dBASE; UID=user; PWD=pass')
cursor = con.cursor()
SQLCommand = (
"""
SELECT

[Item No_]
,sum ([Quantity]) as TotQty
,sum ([Discount Amount]) as DiscAmount
,sum ([Cost Amount]) as CostAmount
,[Date]
,sum ([Net Amount]) as NetAmount
,sum ([VAT Amount]) as VATAmount
,sum ([Refund Qty_]) as RefundQty


FROM database
where [DATE] between ('2015-12-01 00:00:00.000') and ('2015-12-31 00:00:00.000') and [area No_] = '123'
group by ROLLUP([DATE],[Item No_]);

""")

cursor.execute(SQLCommand)
results = cursor.fetchall()
results_as_list = [i[0] for i in results]
array = np.fromiter(results_as_list, dtype="str, float, float, datetime64,float,float,float")
print(array[:5,:])


And I get this error

TypeError: a bytes-like object is required, not 'str'

Answer

You are attempting to pass one iterable, specifically the first column of query, with multiple dtypes. Numpy.iter() takes only one object and type. Consider passing each column of query resultset as a one-dimensional iterable. Also for string (variable-length) you need to specify length and for date include the suffix if [D] date only or with time [s].

cursor.execute(SQLCommand)
results = cursor.fetchall()

array = [np.fromiter([i[0] for i in results], dtype="|S50"),
         np.fromiter([i[1] for i in results], dtype="float"),
         np.fromiter([i[2] for i in results], dtype="float"),
         np.fromiter([i[3] for i in results], dtype="float"),
         np.fromiter([i[4] for i in results], dtype="datetime64[s]"),
         np.fromiter([i[5] for i in results], dtype="float"),
         np.fromiter([i[6] for i in results], dtype="float"),
         np.fromiter([i[7] for i in results], dtype="float")]

Alternatively, created an array of matrices:

array = np.array([np.matrix([i[0] for i in results], dtype="str"),
                  np.matrix([i[1] for i in results], dtype="float"),
                  np.matrix([i[2] for i in results], dtype="float"),
                  np.matrix([i[3] for i in results], dtype="float"),
                  np.matrix([i[4] for i in results], dtype="str"),
                  np.matrix([i[5] for i in results], dtype="float"),
                  np.matrix([i[6] for i in results], dtype="float"),
                  np.matrix([i[7] for i in results], dtype="float")])