Boa Boa - 2 months ago 7
Python Question

Faster way to perform bulk insert, while avoiding duplicates, with SQLAlchemy

I'm using the following method to perform a bulk insert, and to optionally avoid inserting duplicates, with SQLAlchemy:

def bulk_insert_users(self, users, allow_duplicates = False):
if not allow_duplicates:
users_new = []

for user in users:
if not self.SQL_IO.db.query(User_DB.id).filter_by(user_id = user.user_id).scalar():
users_new.append(user)

users = users_new

self.SQL_IO.db.bulk_save_objects(users)
self.SQL_IO.db.commit()


Can the above functionality be implemented such that the function is faster?

Answer

You can load all user ids first, put them into a set and then use user.user_id in existing_user_ids to determine whether to add a new user or not instead of sending a SELECT query every time. Even with ten thousands of users this will be quite fast, especially compared to contacting the database for each user.