Boa Boa - 10 months ago 51
Python Question

Faster way to perform bulk insert, while avoiding duplicates, with SQLAlchemy

I'm using the following method to perform a bulk insert, and to optionally avoid inserting duplicates, with SQLAlchemy:

def bulk_insert_users(self, users, allow_duplicates = False):
if not allow_duplicates:
users_new = []

for user in users:
if not self.SQL_IO.db.query( = user.user_id).scalar():

users = users_new


Can the above functionality be implemented such that the function is faster?

Answer Source

You can load all user ids first, put them into a set and then use user.user_id in existing_user_ids to determine whether to add a new user or not instead of sending a SELECT query every time. Even with ten thousands of users this will be quite fast, especially compared to contacting the database for each user.