JAR.JAR.beans - 10 months ago
Linux Question

Track down high CPU load average

I'm trying to understand what's going on with my server.
It's a 2-CPU server:

$> grep 'model name' /proc/cpuinfo | wc -l

Yet the load average shows a queue of about 8:

$> uptime
16:31:30 up 123 days, 9:04, 1 user, load average: 8.37, 8.48, 8.55

So you can assume the load is really high and things are piling up; the load is sustained, not just a spike.
However, looking at the top CPU consumers:

> ps -eo pcpu,pid,user,args | sort -k 1 -r | head -6
8.3 27187 **** server_process_c
1.0 22248 **** server_process_b
0.5 22282 **** server_process_a
0.0 31167 root head -6
0.0 31166 root sort -k 1 -r
0.0 31165 root ps -eo pcpu,pid,user,args

Results of free command:

             total       used       free     shared    buffers     cached
Mem:          7986       7934         52          0          9       2446
-/+ buffers/cache:       5478       2508
Swap:        17407         60      17347

This is the picture on an ongoing basis, i.e. not even a single CPU is fully used; the top consumer is always at ~8.5%.

My question: how can I track down the root cause of the high load?

Answer

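Based on your free output, swap has been used (see the used column of the Swap row: 60), which suggests there were times when physical memory ran short and pages were moved out to swap. The free column (52) makes things look tighter than they currently are, though: memory actually held by applications is used - (buffers + cached) = 7934 - (9 + 2446) = 5479, which matches the -/+ buffers/cache row (5478, up to rounding). The remaining 2508 (apparently MB) is free or held in buffers and cache that the kernel can reclaim.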
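The subtraction used - (buffers + cached) can be checked against the numbers in the question with a short awk sketch. It assumes the older procps column layout shown there (total, used, free, shared, buffers, cached); the function name app_used is made up for illustration.

```shell
# Hypothetical helper: reads `free` output on stdin (old procps layout:
# total used free shared buffers cached) and prints the memory held by
# applications, i.e. used - (buffers + cached).
app_used() {
    awk '$1 == "Mem:" { print $3 - ($6 + $7) }'
}

# Feed it the Mem: row from the question.
printf 'Mem: 7986 7934 52 0 9 2446\n' | app_used
```

With the question's figures this prints 5479, within rounding of the 5478 on the -/+ buffers/cache row. On a live system with the same layout, the input could come from `free -m | app_used`.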

On a server, try to avoid swapping as much as possible. Page faults that move data between RAM and swap space (in either direction) are expensive, because disk access is far slower than RAM.
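Whether the machine is swapping right now, rather than at some point in the past, can be checked by sampling the kernel's swap counters. The following is a minimal sketch, assuming a Linux system where /proc/vmstat exposes the pswpin and pswpout counters; the helper name swap_counters is hypothetical.

```shell
# Hypothetical helper: print the cumulative number of pages swapped in
# and out since boot, read from a vmstat-style file (default: /proc/vmstat).
swap_counters() {
    awk '$1 == "pswpin"  { in_pages = $2 }
         $1 == "pswpout" { out_pages = $2 }
         END { print in_pages + 0, out_pages + 0 }' "${1:-/proc/vmstat}"
}

# Sample twice, one second apart; if the numbers grow between samples,
# the system is actively swapping.
if [ -r /proc/vmstat ]; then
    swap_counters
    sleep 1
    swap_counters
fi
```

If both samples are identical, the 60 shown in the Swap row is left over from earlier pressure and swapping is unlikely to be the cause of the current load. Running `vmstat 1` shows the same activity as per-second si and so columns.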

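In your top output, check the wa (I/O wait) column. A higher percentage means the CPUs spend more time waiting for disk I/O to complete rather than doing useful work. On Linux, processes blocked on such I/O still count toward the load average, which is how the load can be high while CPU usage stays low. The column appears in the Cpu(s) summary line: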

Cpu(s): 87.3%us,  1.2%sy,  0.0%ni, 27.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
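To see which processes are behind the load figure, it can help to list the tasks the Linux load average actually counts: those that are runnable (state R) and those in uninterruptible sleep (state D, typically blocked on disk I/O). A minimal sketch follows; the helper name load_contributors is hypothetical, and the filter assumes the state is the first column of the ps output.

```shell
# Hypothetical helper: from `ps -eo state,pid,user,args`-style input on
# stdin, skip the header line and keep only tasks whose state starts
# with R (runnable) or D (uninterruptible sleep).
load_contributors() {
    awk 'NR > 1 && $1 ~ /^[RD]/'
}

# The ps process itself is running while it takes the snapshot, so it
# will normally show up in the list with state R.
ps -eo state,pid,user,args | load_contributors
```

Running this a few times in a row gives a rough picture: if the same processes keep appearing in state D, they are likely waiting on I/O and inflating the load without using much CPU.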

Stop any daemons or services you do not need, to reduce the memory footprint, and consider adding more RAM to the system.

For a 2-CPU server, the load should ideally stay below 2.0 (below 1.0 per CPU). A load of 8.0 means roughly 4.0 per CPU, i.e. on average about four tasks are running or waiting per CPU, which is not good.
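The per-CPU figure is simply the load average divided by the number of CPUs. As a small sketch (the helper name per_cpu_load is hypothetical), using the 5-minute load from the question:

```shell
# Hypothetical helper: divide a load average by the number of CPUs and
# print the result with two decimals.
per_cpu_load() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Numbers from the question: 5-minute load 8.48 on 2 CPUs.
per_cpu_load 8.48 2
```

This prints 4.24. On a live Linux system the two inputs could be taken from /proc/loadavg and nproc instead of being typed in.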