AP257 AP257 - 4 days ago 5x
Python Question

SQLAlchemy: What's the difference between flush() and commit()?

What the difference is between

in SQLAlchemy?

I've read the docs, but am none the wiser - they seem to assume a pre-understanding that I don't have.

I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.

I'm wondering if I'm using too many
and not enough
calls - but without really understanding what the difference is, it's hard to to tell!


Edited to reflect comments by zzzeek

A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

The session object registers transaction operations with session.add(), but doesn't yet communicate them to the database until session.flush() is called.

session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).

session.commit() commits (persists) those changes to the database.

flush() is always called as part of a call to commit() (1).

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled.

Hopefully this example will make this clearer:

s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()

s2 = Session()
s2.autoflush = False

print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]