askullhead askullhead - 2 years ago 113
Java Question

Can the JVM recover from an OutOfMemoryError without a restart

  1. Can the JVM recover from an OutOfMemoryError without a restart if it gets a chance to run the GC before more object allocation requests come in?

  2. Do the various JVM implementations differ in this aspect?

My question is about the JVM recovering and not the user program trying to recover by catching the error. In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it? Or can I let it run if further requests seem to work without a problem.

Answer Source

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:

  • There really may be not enough memory to do the requested tasks, even after taking recovery steps like releasing block of reserved memory. In this situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.

  • The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.

  • If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.

  • Suppose that a thread synchronizes with other threads using notify/wait or some higher level mechanism. If that thread dies from an OOME, other threads may be left waiting for notifies (etc) that never come ... for example. Designing for this could make the application significantly more complicated.

In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat OOME as a fatal error.

See also my answer to a related question:

EDIT - in response to this followup question:

In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it?

No you don't have to restart. But it is probably wise to, especially if you don't have a good / automated way of checking that the service is running correctly.

The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, and that designing and implementing a complicated application to recover from OOMEs is hard, and testing it properly is even harder.)


In response to this comment:

"other threads may be left waiting for notifies (etc) that never come" Really? Wouldn't the killed thread unwind its stacks, releasing resources as it goes, including held locks?

Yes really! Consider this:

Thread #1 runs this:

    synchronized(lock) {
         while (!someCondition) {
    // ...

Thread #2 runs this:

    synchronized(lock) {
         // do stuff

If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do something section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that won't ever occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object ... but that is not sufficient!

If not the code ran by the thread is not exception safe, which is a more general problem.

"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the above, it is likely to be hard to impossible to make the application exception safe.

You'd need some mechanism whereby the failure of Thread #1 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #2. Erlang does this ... but not Java. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download