I am trying to insert data from xlsx file into mysqdl table. I want to insert data in table and if there is a duplicate on primary keys, I want to update the existing data otherwise insert. I have written the script already but I realized it is too much work and using pandas it is quick. How can I achieve it in pandas?
import pandas as pd
engine_str = 'mysql+pymysql://admin:mypass@localhost/mydb'
engine = sqlalchemy.create_engine(engine_str, echo=False, encoding='utf-8')\
file_name = "tmp/results.xlsx"
df = pd.read_excel(file_name)
I can think of two options, but number 1 might be cleaner/faster:
1) Make SQL decide on the update/insert. Check this other question. You can iterate by rows of your 'df', from
n. Inside the loop for the insertion you can write something like:
query = """INSERT INTO table (id, name, age) VALUES(%s, %s, %s) ON DUPLICATE KEY UPDATE name=%s, age=%s""" engine.execute(query, (df.id[i], df.name[i], df.age[i], df.name[i], df.age[i]))
2) Define a
python function that returns
False when the record exists and then use it in your loop:
def check_existence(user_id): query = "SELECT EXISTS (SELECT 1 FROM your_table where user_id_str = %s);" return list(engine.execute(query, (user_id, ) ) ) == 1
You could iterate over rows and do this check before inserting