little.Pig little.Pig - 3 months ago 27
Java Question

What is the limit on the number of threads that a JVM can create?

Edit:

As @Petesh said, I was hitting the kern.num_taskthreads limit, which caps the number of threads for an individual process, rather than the overall thread limit.

The output of sysctl kern.num_taskthreads is:

kern.num_taskthreads: 2048


When I used the VM argument -XX:ThreadStackSize=1g, I could only create 122 threads; with -XX:ThreadStackSize=2g, only 58 threads were created. That seems reasonable.

But it's still strange that no matter how I change the -Xss argument, the result is always 2031. The -Xss argument seems to affect only the main thread, though I'm not sure about that yet.

Original question:

I ran a test to find out how many threads one JVM can create. When I adjusted the JVM arguments -Xmx and -Xss, the result didn't change.

Here is the code:

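import java.util.concurrent.TimeUnit;

public class ThreadTest {
    public static void main(String[] args) {
        int count = 0;
        try {
            while (true) {
                Thread thread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            // Keep the thread alive so it keeps counting against the limit.
                            TimeUnit.SECONDS.sleep(360);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                });
                thread.start();
                count++; // number of threads started so far
                System.out.println(count);
            }
        } catch (Error e) {
            // Thrown once the JVM can no longer create a native thread.
            e.printStackTrace();
        }
    }
}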


And the OS info:



  • Model Name: MacBook Pro

  • Model Identifier: MacBookPro11,4

  • Processor Name: Intel Core i7

  • Processor Speed: 2.2 GHz

  • Number of Processors: 1

  • Total Number of Cores: 4

  • L2 Cache (per Core): 256 KB

  • L3 Cache: 6 MB

  • Memory: 16 GB




The java version:

➤ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Dynamic Code Evolution 64-Bit Server VM (build 25.71-b01-dcevmlight-1, mixed mode)


The result:
enter image description here

The output of ulimit -a:
enter image description here

The output of sysctl kern.num_threads:

kern.num_threads: 10240

Answer

All of this is OS-specific. On OSX there is a per-process thread limit, reported by sysctl kern.num_taskthreads, that can't be exceeded. The number of threads you created (2031), plus the threads the VM creates for itself, comes close to that limit of 2048, which indicates that this is the limit you're hitting.
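As a rough check of that reasoning, here is a minimal sketch of the arithmetic. The two constants are simply the figures quoted in the question, and the class and method names are made up for illustration. It also prints the number of live Java threads in the current JVM; as far as I know that count leaves out native-only VM threads such as GC workers, so it will usually be smaller than the gap.

```java
import java.lang.management.ManagementFactory;

public class ThreadHeadroom {
    // Figures quoted in the question; they are specific to that machine.
    static final int PER_PROCESS_LIMIT = 2048; // kern.num_taskthreads
    static final int THREADS_STARTED = 2031;   // last count printed by ThreadTest

    // Threads the process held that the test loop did not create itself.
    static int unaccountedThreads() {
        return PER_PROCESS_LIMIT - THREADS_STARTED;
    }

    public static void main(String[] args) {
        System.out.println("Threads not created by the loop: " + unaccountedThreads());
        // Live Java threads in this JVM (main, Reference Handler, Finalizer, ...).
        int liveJavaThreads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.println("Live Java threads in this JVM: " + liveJavaThreads);
    }
}
```

The gap of 17 is small enough to be plausibly covered by the VM's own threads, which fits the per-process limit explanation.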

The difference between -XX:ThreadStackSize and -Xss<size> is a bit odd. My analysis here is based on the Oracle Java VM for OSX (your output indicates that you're running a different VM).

-Xss sets the stack size to the given number of bytes, and the variable that stores it holds that value divided by 1024. However, because of the way the division is calculated, large sizes end up as meaningless values (64-bit JVM, checked on Linux and OSX). This is some vonderfully bad overflow math:

for i in {1..8}; do echo "${i}G:"; java -Xss${i}g -XX:+PrintFlagsFinal -version 2>&1 | grep ' ThreadStack'; done
1G:
     intx ThreadStackSize                          := 1048576                             {pd product}
2G:
     intx ThreadStackSize                          := 18014398507384832                    {pd product}
3G:
     intx ThreadStackSize                          := 18014398508433408                    {pd product}
4G:
     intx ThreadStackSize                          := 0                                   {pd product}
5G:
     intx ThreadStackSize                          := 1048576                             {pd product}
6G:
     intx ThreadStackSize                          := 18014398507384832                    {pd product}
7G:
     intx ThreadStackSize                          := 18014398508433408                    {pd product}
8G:
     intx ThreadStackSize                          := 0                                   {pd product}
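The same flag can also be read from inside a running program, which is handy when you can't easily re-run with -XX:+PrintFlagsFinal. This is a minimal sketch that assumes a HotSpot-based JVM, since it relies on the com.sun.management API; the class and method names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ShowThreadStackSize {
    // Returns the effective ThreadStackSize flag (in KB) as the VM reports it.
    static String threadStackSizeKb() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("ThreadStackSize").getValue();
    }

    public static void main(String[] args) {
        System.out.println("ThreadStackSize (KB) = " + threadStackSizeKb());
    }
}
```

Running it with different -Xss values should show the same numbers as the listing above, assuming the VM behaves like the one tested here.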

When we compare this with -XX:ThreadStackSize we have a different picture:

Firstly, these values are scaled by a factor of 1024, i.e. every value passed to -XX:ThreadStackSize is read as a number of KB of stack.

This means that -XX:ThreadStackSize needs a value 1024 times smaller than the one you would give to -Xss for the same stack size. The fact that you were only able to create a fraction of the number of threads, together with the virtual memory size of the process, makes this obvious (taken from the vmmap output of the process):

Stack                  0000000800004000-0000040800000000 [  4.0T] rw-/rwx SM=NUL  thread 23
Stack                  0000040800000000-0000040800003000 [   12K] rw-/rwx SM=PRV  thread 23

4TB per stack? That's going to hurt (this is what you'd previously asked for).
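To put numbers on the thread counts from the question, here is a small sketch of the arithmetic, assuming the flag value is read as KB as described above. The class and method names are made up for illustration. Both totals land just under 128 TB, which I believe is the size of the user address space for a 64-bit process on x86-64; treat that last part as an assumption rather than something verified here.

```java
public class StackReservation {
    static final long KB = 1024L;
    static final long TB = 1L << 40;

    // -XX:ThreadStackSize takes its value in KB, so a "1g" argument means 2^30 KB.
    static long stackBytes(long flagValue) {
        return flagValue * KB;
    }

    // Total virtual address space reserved for thread stacks, in whole TB.
    static long reservedTb(long flagValue, int threads) {
        return stackBytes(flagValue) * threads / TB;
    }

    public static void main(String[] args) {
        long oneG = 1L << 30;
        System.out.println(reservedTb(oneG, 122) + " TB");    // 122 threads at 1 TB each
        System.out.println(reservedTb(2 * oneG, 58) + " TB"); // 58 threads at 2 TB each
    }
}
```

That would explain why doubling the stack size roughly halved the number of threads: the process runs out of address space rather than hitting the thread-count limit.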

Once we adjust it down by a factor of 1024, we get the same number of threads in the second run. You can see these numbers far more clearly in the output, and they scale linearly with the requested size:

for i in {1..8}; do echo "${i}G:"; java -XX:ThreadStackSize=${i}m -XX:+PrintFlagsFinal -version 2>&1 | grep ' ThreadStack'; done
1G:
     intx ThreadStackSize                          := 1048576                             {pd product}
2G:
     intx ThreadStackSize                          := 2097152                             {pd product}
3G:
     intx ThreadStackSize                          := 3145728                             {pd product}
4G:
     intx ThreadStackSize                          := 4194304                             {pd product}
5G:
     intx ThreadStackSize                          := 5242880                             {pd product}
6G:
     intx ThreadStackSize                          := 6291456                             {pd product}
7G:
     intx ThreadStackSize                          := 7340032                             {pd product}
8G:
     intx ThreadStackSize                          := 8388608                             {pd product}

So it looks like -Xss<size> is really only useful for stack sizes below 2GB (1g still comes out right in the first listing, 2g does not); for anything larger, specify the size explicitly with -XX:ThreadStackSize, in KB.

Figuring out the overflow. The code that parses the Xss option:

julong long_ThreadStackSize = 0;
ArgsRange errcode = parse_memory_size(tail, &long_ThreadStackSize, 1000);

Then in an act of stellar muppetry it does:

FLAG_SET_CMDLINE(intx, ThreadStackSize,
                          round_to((int)long_ThreadStackSize, K) / K);

i.e. it downcasts the long to an int, which it then passes to round_to. This takes a Register value, which is 64 bits wide on the 64-bit VM. So, from what I can tell, for -Xss2g the value you start with is:

0x80000000

Gets sign extended to:

0xFFFFFFFF80000000

This gets divided by 1024 (0x400), evidently as an unsigned value:

0x3FFFFFFFE00000 == 18,014,398,507,384,832

So you can see where the value for 2G in the first script's output comes from.
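The steps above can be replayed in plain Java. This is only a sketch that mimics the arithmetic, not the HotSpot code itself: roundTo is my own stand-in for round_to with a power-of-two size, and the unsigned division is an assumption chosen because it is the only reading that reproduces the numbers in the listing.

```java
public class XssOverflow {
    static final long K = 1024L;

    // Stand-in for round_to with a power-of-two size: round x up to a multiple of s.
    static long roundTo(long x, long s) {
        return (x + s - 1) & ~(s - 1);
    }

    // Mimics round_to((int) long_ThreadStackSize, K) / K for an -Xss value in bytes.
    static long threadStackSizeKb(long xssBytes) {
        int truncated = (int) xssBytes; // the narrowing cast drops the upper 32 bits
        long widened = truncated;       // sign-extended back to 64 bits
        return Long.divideUnsigned(roundTo(widened, K), K); // assumed unsigned division
    }

    public static void main(String[] args) {
        for (int g = 1; g <= 8; g++) {
            System.out.println(g + "G: " + threadStackSizeKb(g * (1L << 30)));
        }
    }
}
```

The printed values repeat with a period of 4G, matching the pattern in the -Xss listing: only the low 32 bits of the requested size survive the cast.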

I've logged a bug. The change needed in the source is to cast with (Register)long_ThreadStackSize rather than (int)long_ThreadStackSize, which keeps the calculation correct.
