mindas - 29 days ago
Java Question

Safely clearing Hibernate session in the middle of large transaction

I am using Spring+Hibernate for an operation which requires creating and updating literally hundreds of thousands of items. Something like this:

{
    ...
    Foo foo = fooDAO.get(...);
    for (int i = 0; i < 500000; i++) {
        Bar bar = barDAO.load(i);
        if (bar.needsModification() && foo.foo()) {
            bar.setWhatever("new whatever");
            barDAO.update(bar);
            // commit here
            Baz baz = new Baz();
            bazDAO.create(baz);
            // if (i % 100 == 0), clear
        }
    }
}


To protect myself against losing changes in the middle, I commit the changes immediately after barDAO.update(bar):

HibernateTransactionManager transactionManager = ...; // injected by Spring
DefaultTransactionDefinition def = new DefaultTransactionDefinition();
def.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRED);
TransactionStatus transactionStatus = transactionManager.getTransaction(def);
transactionManager.commit(transactionStatus);


At this point I should mention that the entire process runs in a transaction wrapped in org.springframework.orm.hibernate3.support.ExtendedOpenSessionInViewFilter (yes, this is a webapp).

This all works fine with one exception: after a few thousand updates/commits, the entire process gets really slow, most likely because memory is bloated by the ever-increasing number of objects kept by Spring/Hibernate.

In a Hibernate-only environment this would be easily solved by calling org.hibernate.Session#clear().

Now, the questions:


  • When is it a good time to clear()? Does it have a big performance cost?

  • Why aren't objects like bar or baz released/GC'd automatically? What's the point of keeping them in the session after the commit (in the next iteration of the loop they're not reachable anyway)? I haven't taken a memory dump to prove this, but my gut feeling is that they're still there until the whole operation has exited. If the answer to this is "Hibernate cache", then why isn't the cache flushed when available memory runs low?

  • Is it safe/recommended to call org.hibernate.Session#clear() directly (bearing in mind the entire Spring context, things like lazy loading, etc.)? Are there any usable Spring wrappers/counterparts for achieving the same?

  • If the answer to the above question is true, what will happen to the object foo, assuming clear() is called inside the loop? What if foo.foo() is a lazy-load method?



Thank you for the answers.

Answer

When is it a good time to clear()? Does it have big performance cost?

At regular intervals, ideally the same as the JDBC batch size, after having flushed the changes. The documentation describes common idioms in the chapter about Batch processing:

13.1. Batch inserts

When making new objects persistent flush() and then clear() the session regularly in order to control the size of the first-level cache.

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

And this shouldn't have a performance cost, au contraire:

  • it keeps the number of objects to check for dirtiness low (so flushing should be fast),
  • it should allow memory to be reclaimed.
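
To make the cadence concrete for a loop like the one in the question, here is a minimal sketch in plain Java. FlushableSession and RecordingSession are hypothetical stand-ins for the flush() and clear() methods of org.hibernate.Session, so the example runs without a database; the batch size of 20 is just the value from the documentation snippet above. Unlike that snippet, it tests (i + 1) % batchSize so that no flush happens on the very first item, and it flushes once more after the loop so a final partial batch is not left pending.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two org.hibernate.Session methods used here. */
interface FlushableSession {
    void flush();
    void clear();
}

/** Records calls so the cadence can be inspected without a database. */
class RecordingSession implements FlushableSession {
    final List<String> calls = new ArrayList<>();

    @Override
    public void flush() { calls.add("flush"); }

    @Override
    public void clear() { calls.add("clear"); }
}

class BatchLoopSketch {

    /** Processes total items, flushing and then clearing after every batchSize items. */
    static int process(FlushableSession session, int total, int batchSize) {
        int fullBatches = 0;
        for (int i = 0; i < total; i++) {
            // ... load, modify and save the entities for item i here ...
            if ((i + 1) % batchSize == 0) {
                session.flush(); // push pending changes first, so nothing is lost
                session.clear(); // then drop the tracked instances
                fullBatches++;
            }
        }
        session.flush(); // write out whatever remains of a partial last batch
        return fullBatches;
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        int fullBatches = process(session, 100, 20);
        // 5 full batches, each a flush plus a clear, and one final flush
        System.out.println(fullBatches + " batches, " + session.calls.size() + " calls");
    }
}
```

The order matters: clearing before flushing would discard changes that have not been written yet.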

Why aren't objects like bar or baz released/GC'd automatically? What's the point of keeping them in the session after the commit (in the next iteration of the loop they're not reachable anyway)?

You need to clear() the session explicitly if you don't want to keep entities tracked; that's simply how it works (one might want to commit a transaction without "losing" the entities).

But from what I can see, bar and baz instances should become candidates for GC after the clear. It would be interesting to analyze a memory dump to see exactly what is happening.
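
The reason the instances stay reachable can be illustrated with a toy model. The ToySession class below is a deliberately simplified, hypothetical stand-in and not Hibernate's actual implementation; it only assumes what is described above, namely that the session keeps strong references to every entity it tracks, that a commit leaves those references alone, and that clear() drops them.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a first-level cache; NOT Hibernate's implementation. */
class ToySession {
    // Strong references: as long as the session lives, so do these objects.
    private final Map<Integer, Object> managed = new HashMap<>();

    /** Returns the tracked instance for id, creating it on first access. */
    Object load(int id) {
        return managed.computeIfAbsent(id, key -> new Object());
    }

    /** Ends the transaction only; the tracked instances are left untouched. */
    void commit() {
        // intentionally empty in this model
    }

    /** Drops every tracked instance, making it eligible for GC. */
    void clear() {
        managed.clear();
    }

    int managedCount() {
        return managed.size();
    }
}

class RetentionSketch {
    public static void main(String[] args) {
        ToySession session = new ToySession();
        for (int i = 0; i < 1000; i++) {
            session.load(i); // the local reference is gone after each iteration...
            session.commit();
        }
        // ...but the session still references every instance.
        System.out.println(session.managedCount()); // 1000

        session.clear();
        System.out.println(session.managedCount()); // 0
    }
}
```

In this model the loop variable going out of scope changes nothing, because the session itself is the object keeping everything alive.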

Is it safe/recommended to call org.hibernate.Session#clear() directly?

As long as you flush() the pending changes so as not to lose them (unless this is what you want), I don't see any problem with that (your current code will lose a create every 100 iterations, but maybe it's just pseudo code).

If the answer to the above question is true, what will happen to the object foo, assuming clear() is called inside the loop? What if foo.foo() is a lazy-load method?

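Calling clear() evicts all loaded instances from the Session, making them detached entities. If a subsequent invocation requires an entity to be "attached", it will fail. For example, if foo.foo() has to lazily load something after the clear, Hibernate will typically throw a LazyInitializationException.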
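
The effect on foo can be sketched with another toy model. TrackingSession and LazyFoo are hypothetical classes written for this illustration, and the IllegalStateException merely stands in for the failure Hibernate would report. The sketch shows the two usual ways around the problem under these assumptions: touch the lazy state before the clear, or fetch foo again afterwards.

```java
import java.util.HashSet;
import java.util.Set;

class DetachSketch {

    /** Hypothetical stand-in for a session; tracks managed instances by identity. */
    static class TrackingSession {
        private final Set<Object> managed = new HashSet<>();

        LazyFoo get() {
            LazyFoo foo = new LazyFoo(this);
            managed.add(foo);
            return foo;
        }

        boolean contains(Object entity) {
            return managed.contains(entity);
        }

        void clear() {
            managed.clear();
        }
    }

    /** Hypothetical entity whose foo() needs the session on first access. */
    static class LazyFoo {
        private final TrackingSession session;
        private Boolean value; // null until lazily loaded

        LazyFoo(TrackingSession session) {
            this.session = session;
        }

        boolean foo() {
            if (value == null) {
                if (!session.contains(this)) {
                    // Stands in for the lazy-loading failure on a detached entity.
                    throw new IllegalStateException("detached: cannot lazy-load");
                }
                value = Boolean.TRUE; // pretend this came from the database
            }
            return value;
        }
    }

    /** Returns true if calling foo() on the given instance fails. */
    static boolean failsToLoad(LazyFoo foo) {
        try {
            foo.foo();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TrackingSession session = new TrackingSession();

        LazyFoo untouched = session.get();
        LazyFoo initialized = session.get();
        initialized.foo(); // loaded while still attached

        session.clear(); // both instances are now detached

        System.out.println(failsToLoad(untouched));     // true
        System.out.println(failsToLoad(initialized));   // false
        System.out.println(failsToLoad(session.get())); // false: fetched again after the clear
    }
}
```

For the loop in the question this suggests either making sure foo is fully initialized before the first clear(), or reloading it after each clear().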