user2628641 user2628641 - 11 months ago 85
Python Question

psycopg2: copy_expert csv file containing e+ integer number

First, let me show how to reproduce the problem. It's simple setup.

here is my simple test csv file


the test csv file is generated by pandas dataframe's to_csv method

import csv, pandas as pd

d = {'a' : pd.Series([1, 0.0, 1.5637e+07]),'b' : pd.Series([1, 2, 3])}

df = pd.DataFrame(d)

df_csv = df.to_csv('test.csv', encoding="utf-8", index=False,
sep=';', quotechar='"', float_format="%g", quoting=csv.QUOTE_MINIMAL)

then, let's create a test table in db

create table testtb (
a integer,
b integer

Finally, I tried to export the csv file into that table, some set up code here

def parse_column(header):
column_str = ",".join(header[0].split(";"))
return column_str

conn = psycopg2.connect("dbname=test1 " +
" user=postgres" +
" host=localhost" +
" password=password")

cur = conn.cursor()

with open('test.csv', "r") as f:
reader = csv.reader(f)
header = next(reader)

copy_sql = """
""".format("public", "simplew", parse_column(header))
cur.copy_expert(sql=copy_sql, file=f)


The problem is, now I get

invalid input syntax for integer: "1.5637e+07"

There are few constraints for solution here

  1. I can't use to_sql method of dataframe, I am doing bulk copying here, to_sql is too slow..

  2. because of No.1, I have to follow the procedure I laid out above, to_csv, then copy the csv to DB, which means I can't use any code to format "1.5637e+07", except setting float_format in the to_csv method..

  3. about the float_format, I need to get rid of trailing zero of float number, so I set it to %g currently

Answer Source

1.5637e+07 isn't an integer. It's a float

>>> 1.5637e+07
>>> type(1.5637e+07)
<class 'float'>

If you want an integer, just do 15637*10**3 or simply 15637000. That will be an integer.

pd.Series([1, 0.0, 15637000])

Note: doing int(1.5637e+07) would also work here, but would fail with loss of precision for highest 10th powers: (int(1.5637e+22) = 15637000000000000786432) so better stick to integer exponentiation, unlike C integers, python integers can handle unlimited values, well memory-limited at most)