I am searching for a way to list all synchronization calls of a running parallel Java application, in order to detect scalability problems (in terms of threads/cores). To my understanding, each time a synchronized block is entered, the machine needs to synchronize caches. This affects all running CPUs (in several ways, such as memory bandwidth), even if the running tasks are not blocked by entering a synchronized region.
Entering and leaving a synchronized block is a rather cheap operation unless there is contention on that block. In the uncontended case, synchronized is just an atomic CAS, or almost a no-op if the UseBiasedLocking optimization succeeds. Though it looks possible to write a synchronization profiler using the Instrumentation API, this won't make much sense.
The problem for a multithreaded application is contended synchronization. The JVM has some internal counters to monitor lock contention (see this question). Or you can even write a simple ad-hoc tool to track all contended locks using JVMTI events.
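As a rough illustration of watching contention from inside the process, here is a minimal sketch that uses the standard java.lang.management API rather than the internal counters or JVMTI mentioned above. ThreadInfo.getBlockedCount() reports how many times a thread has blocked while entering a monitor. The class name ContentionProbe, the helper blockedCountOf and the thread name "contended-worker" are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ContentionProbe {
    private static final Object LOCK = new Object();

    // Hypothetical helper: looks up a live thread by name and returns how many
    // times it has blocked on monitor entry, or -1 if no such thread is alive.
    static long blockedCountOf(String threadName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that died between the two calls.
            if (info != null && info.getThreadName().equals(threadName)) {
                return info.getBlockedCount();
            }
        }
        return -1;
    }

    // Forces one contended monitor enter and returns the worker's blocked count.
    static long demoBlockedCount() {
        long[] result = new long[1];
        Thread worker = new Thread(() -> {
            synchronized (LOCK) { // contended: the caller still holds LOCK
                result[0] = blockedCountOf("contended-worker");
            }
        }, "contended-worker");
        try {
            synchronized (LOCK) {
                worker.start();
                // Hold the lock until the worker is really blocked on a monitor.
                while (worker.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
            }
            worker.join(); // join() makes the worker's write to result visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("blocked count of worker: " + demoBlockedCount());
    }
}
```

In a real application you would sample these counts periodically for all threads and look at the ones that keep growing. If the JVM supports it, ThreadMXBean.setThreadContentionMonitoringEnabled(true) additionally makes getBlockedTime() return the accumulated blocked time instead of -1.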
However, locks are not the only thing that can cause contention. Even non-blocking algorithms can suffer from competition for a shared resource. Here is a good example of such a scalability problem. So, I would agree with @PeterLawrey that it's better to start with a CPU profiler, as it is usually handier for finding performance problems.
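To make the point about non-blocking algorithms concrete, here is a small sketch (not the example linked above, just an illustration of my own): several threads increment one AtomicLong with a CAS retry loop, and the program counts how many CAS attempts fail. No thread ever blocks, so lock contention counters would show nothing, yet every failed CAS is wasted work caused by competition for the same cache line. The class name CasContentionDemo and the run method are made up for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class CasContentionDemo {
    // Returns {final counter value, number of failed CAS attempts}.
    static long[] run(int threads, int iterations) {
        AtomicLong counter = new AtomicLong();
        LongAdder failedCas = new LongAdder(); // low-contention statistics counter
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    long seen = counter.get();
                    // Lock-free increment: retry until our CAS wins.
                    while (!counter.compareAndSet(seen, seen + 1)) {
                        failedCas.increment(); // another thread got there first
                        seen = counter.get();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { counter.get(), failedCas.sum() };
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long[] r = run(threads, 1_000_000);
        System.out.println("increments: " + r[0] + ", failed CAS attempts: " + r[1]);
    }
}
```

The number of failed attempts depends on the machine and on timing, but it typically grows with the thread count. A CPU profiler would show the time spent in this retry loop, whereas a lock profiler would not see it at all.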