IBM Support

Troubleshooting native OutOfMemory (OOM) error caused by a thread leak (unbounded number of threads)

Troubleshooting


Problem

When a javacore is generated by an OutOfMemoryError (OOM), it most often shows that the error is due to "Java heap space" (that is, the Java heap was exhausted). Occasionally, however, the cause is "Failed to create a thread", and the javacore contains something like this:
...
Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
...

 

Symptom


When the JVM throws an OutOfMemoryError (OOM) with the detail "Failed to create a thread", that is a typical sign of running out of native memory (as opposed to running out of Java heap space).

Cause

There can be many causes of native OOM. This technote addresses only one of them: a thread leak, that is, an application that creates new thread pools but does not ensure that they are shut down once their work is finished.
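The pattern can be illustrated with a minimal Java sketch. It assumes a hypothetical handler, leakySearch, that fans its work out to a fresh Executors.newFixedThreadPool(...) on every call and never shuts the pool down. It is not code from any particular application. A fixed pool starts a new worker for each submitted task until it reaches its core size, and idle core workers do not time out by default, so every call leaves its worker threads behind. The daemon thread factory exists only so that this demo JVM can exit despite the leak.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

final class ThreadLeakDemo {

    // DEMO ONLY: daemon threads let this sample JVM exit despite the leak.
    // A leaking application usually uses the default (non-daemon) factory.
    private static final ThreadFactory DAEMON_FACTORY = r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    };

    // Hypothetical request handler: creates a new pool on every call and never
    // shuts it down, so its idle worker threads outlive the call.
    static int leakySearch(int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts, DAEMON_FACTORY);
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            final int part = i;
            tasks.add(() -> part * part);
        }
        try {
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum; // BUG (intentional): pool.shutdown() is never called
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Live threads in this JVM (daemon and non-daemon).
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        int before = liveThreads();
        for (int call = 1; call <= 10; call++) {
            leakySearch(4); // each call leaves its 4 worker threads behind
        }
        System.out.println("Live threads: before=" + before + ", after=" + liveThreads());
    }
}
```

Running main prints the live thread count before and after ten calls. The difference should be about 40 threads (10 calls x 4 workers), and it keeps growing for as long as the handler keeps being called.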

Diagnosing The Problem

A thread leak can be diagnosed by analyzing one of the javacores generated by the OOM.
1) The first obvious sign is the size of the javacore. A javacore is usually around 4-5 MB, but if there is a thread leak,
   it can grow to many tens of megabytes; 30 MB, 40 MB, or even 50 MB is not unusual.
   (Essentially, the number of threads, and with it the javacore size, keeps growing until the operating system runs out of memory and a native OOM occurs.)
2) The size of the javacore alone is not enough to diagnose the cause of the native OOM, but reviewing its content and counting how many threads it contains is more conclusive. For example, a 50 MB javacore I recently reviewed contained over 30000 threads.
   (For comparison, the default size of the WebSphere WebContainer thread pool is 50, which is usually sufficient for many applications.)
   Note: If you cannot wait until a native OOM actually occurs, it is usually enough to capture a few javacores over a long period of time and check whether their size and the number of threads in each keep growing without bound. If they do, that indicates a thread leak.
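Counting threads by hand in a 50 MB javacore is tedious, so here is a minimal Java sketch that does it. It assumes that the javacore marks each thread with a line starting with the 3XMTHREADINFO tag, followed by whitespace and the quoted thread name; requiring whitespace after the tag skips detail lines such as 3XMTHREADINFO1. That layout is an assumption to verify against your own javacores. The sketch groups thread names by replacing digits with #, and counts distinct pool numbers in names of the form pool-N-thread-M, which is the naming used by Executors.defaultThreadFactory(). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavacoreThreadCounter {

    // ASSUMPTION: each thread has one header line: 3XMTHREADINFO      "name" ...
    // Requiring whitespace after the tag skips 3XMTHREADINFO1/2/3 detail lines.
    private static final Pattern THREAD_LINE =
            Pattern.compile("3XMTHREADINFO\\s+\"([^\"]*)\"");
    // Name pattern used by Executors.defaultThreadFactory(): pool-N-thread-M
    private static final Pattern POOL_NAME = Pattern.compile("pool-(\\d+)-thread-\\d+");

    // Thread name -> group key, e.g. "pool-12-thread-3" -> "pool-#-thread-#"
    static String family(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    static Map<String, Integer> countByFamily(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = THREAD_LINE.matcher(line);
            if (m.lookingAt()) {
                counts.merge(family(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    static int totalThreads(Map<String, Integer> counts) {
        int total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        return total;
    }

    // Number of distinct N in "pool-N-thread-M" thread names found in the javacore.
    static int distinctPools(List<String> lines) {
        Set<String> pools = new HashSet<>();
        for (String line : lines) {
            Matcher m = THREAD_LINE.matcher(line);
            if (m.lookingAt()) {
                Matcher p = POOL_NAME.matcher(m.group(1));
                if (p.matches()) {
                    pools.add(p.group(1));
                }
            }
        }
        return pools.size();
    }

    public static void main(String[] args) throws IOException {
        for (String file : args) { // e.g. several javacores captured hours apart
            // ISO-8859-1 decodes any byte sequence, so odd bytes cannot abort the read.
            List<String> lines = Files.readAllLines(Paths.get(file), StandardCharsets.ISO_8859_1);
            Map<String, Integer> counts = countByFamily(lines);
            System.out.println(file + ": " + totalThreads(counts) + " threads, "
                    + distinctPools(lines) + " default-named pools");
            counts.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                    .limit(5)
                    .forEach(e -> System.out.println("  " + e.getValue() + "  " + e.getKey()));
        }
    }
}
```

Passing several javacores captured over time shows whether the total, and one group in particular, keeps growing. Treat the totals as approximate if a thread is listed in more than one section of the javacore.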
Example:
--------
This is an application thread starting a new worker thread for a ThreadPoolExecutor thread pool:
"SearchThread-1" ...
         Java callstack:
              at java/lang/Thread.startImpl(Native Method)
              at java/lang/Thread.start(Thread.java:948(Compiled Code))
              at java/util/concurrent/ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:961(Compiled Code))
              at java/util/concurrent/ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368(Compiled Code))
              ...
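The creating code can often be read from the same javacore: a thread that is starting a pool worker has ThreadPoolExecutor.addWorker in its callstack, as in the example above, and the first frame below it whose name does not start with java/ is the likely caller. A minimal Java sketch that reports such threads is shown here. It assumes that Java frames are on lines tagged 4XESTACKTRACE that follow the thread's 3XMTHREADINFO line; both tags are assumptions about the javacore layout, and the class name is hypothetical. A javacore is a snapshot, so it only shows the threads that happened to be creating workers at that moment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavacorePoolCreators {

    // ASSUMPTION: thread header lines look like: 3XMTHREADINFO      "name" ...
    private static final Pattern THREAD_LINE =
            Pattern.compile("3XMTHREADINFO\\s+\"([^\"]*)\"");
    // ASSUMPTION: Java frames look like: 4XESTACKTRACE      at pkg/Class.method(...)
    private static final Pattern FRAME_LINE =
            Pattern.compile("4XESTACKTRACE\\s+at\\s+(.+)");

    // Maps each thread caught in ThreadPoolExecutor.addWorker to the first frame
    // below it that is not in a java/ package (the likely application caller).
    static Map<String, String> findPoolThreadCreators(List<String> lines) {
        Map<String, String> creators = new LinkedHashMap<>();
        String thread = null;
        boolean belowAddWorker = false;
        for (String line : lines) {
            Matcher t = THREAD_LINE.matcher(line);
            if (t.lookingAt()) { // a new thread block starts
                thread = t.group(1);
                belowAddWorker = false;
                continue;
            }
            Matcher f = FRAME_LINE.matcher(line);
            if (thread == null || !f.lookingAt()) {
                continue;
            }
            String frame = f.group(1).trim();
            if (frame.startsWith("java/util/concurrent/ThreadPoolExecutor.addWorker")) {
                belowAddWorker = true;
                creators.putIfAbsent(thread, "(no non-java/ frame found)");
            } else if (belowAddWorker && !frame.startsWith("java/")) {
                creators.put(thread, frame);
                belowAddWorker = false; // keep only the first such frame
            }
        }
        return creators;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 decodes any byte sequence, so odd bytes cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        findPoolThreadCreators(lines).forEach(
                (thread, caller) -> System.out.println(thread + " <- " + caller));
    }
}
```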

Resolving The Problem

Solution:
--------
The application must ensure that it calls shutdown() on the executor thread pool once all of the pool's tasks have finished.
Here is the relevant excerpt from the Javadoc for "public static ExecutorService newFixedThreadPool(...)":
"The threads in the pool will exist until it is explicitly shutdown."
Source: Javadoc for "Class Executors", https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html
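A minimal Java sketch of the fix, assuming a hypothetical handler that creates its own pool for each request: shutdown() is called in a finally block, so the pool's threads end on every path, including when a task fails. The awaitTermination()/shutdownNow() step, the 10-second timeout, the class name ThreadLeakFix and the sum-of-squares tasks are illustrative assumptions; calling shutdown() is the part this technote requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ThreadLeakFix {

    // Hypothetical handler that still creates a pool per call, but always shuts it down.
    static int search(int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < parts; i++) {
                final int part = i;
                tasks.add(() -> part * part);
            }
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Runs on every path: no new tasks are accepted and idle workers exit.
            pool.shutdown();
            try {
                // Assumed 10 s grace period before cancelling anything still running.
                if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                    pool.shutdownNow();
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        for (int call = 1; call <= 10; call++) {
            search(4);
        }
        System.out.println("done; the JVM can exit because every pool was shut down");
    }
}
```

Where the same kind of work is submitted over and over, another common design is to create one long-lived pool at application startup, reuse it for every request, and shut it down once when the application stops; that bounds the number of threads by construction.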
Note: If you need to identify which code is creating these thread pools, you can set the following generic JVM argument to print a Java stack trace every time a ThreadPoolExecutor is created (the stack trace is written to the native_stderr.log file):
-Xtrace:iprint=mt,trigger=method{java/util/concurrent/ThreadPoolExecutor.<init>,jstacktrace}
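As a complementary approach (not a replacement for the -Xtrace option above), each pool can be given a thread factory with a descriptive name prefix, so that the threads in the next javacore identify the component that created them instead of showing generic pool-N-thread-M names. A minimal Java sketch; the class NamedThreadFactory and the prefix "search-worker" are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: wraps the default factory and only changes thread names.
final class NamedThreadFactory implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger next = new AtomicInteger(1);
    private final String prefix;

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = delegate.newThread(task); // keeps non-daemon, normal priority
        t.setName(prefix + "-" + next.getAndIncrement());
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2, new NamedThreadFactory("search-worker"));
        try {
            pool.execute(() -> System.out.println("running in " + Thread.currentThread().getName()));
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```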

Document Location

Worldwide

Product: WebSphere Application Server; Line of Business: Automation; Business Unit: Cloud & Data Platform; ARM Category: OutOfMemory->Native; ARM Case Number: TS005358711; Platform: Platform Independent; Version: All Version(s)

Historical Number

TS005328366;TS005358711

Document Information

Modified date:
08 April 2021

UID

ibm16441095