Saturday 5 May 2018

OS tools for java performance monitoring

In this blog i will go through some of the Unix OS tools that can be used to monitor health of application.

Lets first define what is performance monitoring before we get to tools.

Performance monitoring is steps of collecting performance data from running application or operating system. Upfront monitoring is reactive step where development team try to identify bottle neck before it reported by customers.

There are couple of things that can be become bottle neck
  • CPU
  • Memory
  • Disk/IO
  • Network
  • System Calls

Unix has lots of tools that will help in looking at above parameters. Lets start

Monitor CPU usage

Some basic info is required before you start looking at CPU profiling
  • Number of virtual cores
  • CPU frequency
  • Current speed at which it is running etc
  • or want to set to specific frequency before testing.
cpufreq-info is the good tool to start with, it has every thing that is required.

top is number 1 tool that is used to monitor CPU usage of system, it produces ordered list of running process and updates it periodically. It shows which process is consuming most resource. Some time top can be listed as top process!

Top Command output

It has 2 sections
  • Total CPU usage, memory usage
  • Individual process usage.
top works on snapshot, so there is a chance that problematic application might not get listed if it generates load between snapshots and top it self is very resource consuming. This is classic problem with any profiling tools.

Other better and light weight options are Htop, which is graphical and shows much more information.
It has info about which cores are busy, in this sample core number 5 is very busy.

Htop Output

top/Htop will highlight CPU intensive application, now next interesting thing to know will what is load on CPU scheduler.

This takes us to another tool Vmstat

vmstat tool display summary information about operating system memory, processes, interrupts, paging and block I/O.
VM Stat output

vmstat has few sections like "procs", this section will help in figuring out if system is saturated with work.
"r" - Shows run queue, how many threads are ready to run but waiting for CPU to schedule it.
"b" - Shows block/sleep queue, threads that are in uninterruptible sleep.

High value for run queue is indication that system is saturated, value should not be more than number of virtual cores available. Execute TopApplication to see CPU saturation.

There are couple of ways to fix this issues
  • Get more CPU, if possible.
  • Reduce number of active threads
  • Identify the hotspot in code and fix it.

Monitor CPU - Context Switch

Next comes pidstat, it shows individual process level details like threads, I/O, context switch, memory usage , page fault etc

pidstat -t -d -r -w -h -p <pid> 5

t - Thread associated with task
d - Disk activity
r - Page fault & memory usage
w - Context switch

Context switches are very expensive and it is one of the top reason why all Cores are not fully used. vmstat shows context switch but it is across all cores.

Output from vmstat, 5th column from right shows context switch.

ashkrit@ashkrit-Inspiron-7520:~$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 2840132  90168 1793480    0    0    22    17  269  670  9  2 89  0  0
 1  0      0 2828648  90168 1803420    0    0     0    94 1711 4833  2  1 97  0  0
 0  0      0 2839472  90192 1793148    0    0     0    36 1227 3403  1  1 98  0  0
 1  0      0 2842596  90200 1788956    0    0     0     8 1158 3072  1  1 98  0  0

pidstat shows context switch cost at process and core level, lets run some experiment to measure cost of context switch.

ContextSwitchApp program has context switch issues,execute this program and see some context switch numbers. This program runs 4 thread to do some processing.

Output of - pidstat -w -I -t -p 7729 5

pidstat output

Highlighted part contains context switch details over 5 second for 4 threads.

Highlighted part contains context switch(i.e output of vmstat) across cores.

Modern processor can execute billions of instruction but context switch can take away all these power.
Per context switch application looses around 80K instruction.

Lets do simple math to figure out how much instruction is lost due to Context Switch.

So CPU is executing 23% less instruction due to context switch. This is very good indication that context switches are bad for application performance.This context switch is due to contention between threads, reducing number of thread will improve the performance.

More detail profiling is required to identify the contention point, Solaris Studio is best opensource tool to do such profiling.

Monitor Memory - Swap in/out

Memory is next thing to monitor, application will run slow if swapping activity is high, each OS has area called Swap Space, this space is allocated on disk.
Swap Space is used when physical memory is exhausted by application, the OS will swap part of the application, this also puts additional load in GC algorithm.

  vmstat is the tool to get all the details of memory.

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 830976 3207508  31392 993460    0    0     0    50 1111 3705  3  1 97  0  0
 1  0 830976 3234744  31408 993272    5    0     5   180 1127 3713  3  1 96  0  0
 5  0 830976 3230156  31416 996480    3    0    19    62 1209 4281  4  1 95  0  0
 2  0 830720 3229304  31424 996484    2    0     2    27 1196 4509  3  1 96  0  0
 1  0 830720 3229304  31432 996452    2    0     2     2 1237 4693  4  1 95  0  0

 0  0 830720 3228296  31440 996504    0    0     0    14 1538 4807  4  1 95  0  0

swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.

si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).

Lets look at sample vmstat output when this is happening.

Swapping start when free memory comes down to 141 MB, look at si and so column.
OS is put under load by application, it is under memory pressure, as physical memory is exhausted OS starts Swap Out.

Let look at graph to get better understanding.

As system is under memory constraint swapping in/out starts and then it returns back to normal when application end. This becomes worse when GC comes in action because it has to do swapping for GC cycles.

Run multiple instance of SwapApplication to see some numbers.

IO Activity

IO is one of the major bottleneck of many application, understanding what type of load application puts helps in building application that does not work against Disk.

You get complain that system is running slow and you suspect that disk could be the reason, now how to confirm that. How to check disk utilization ?

atop is wonderful tool for this, run atop -d to check disk utilization. You need admin privilege to run this.

On healthy system out of atop will be


Atop output - when disk utilization is high.

Run Diskusage program to generate disk load.

I went through just few tools but unix has tons of tools to figure out what application is doing and lot can be figured without looking at code.

All the code used in this post is available at osmonitor github repo. 

1 comment:

  1. Looks Pretty Good and at same time this post is very informative.Universal Garbage Collection log analyzer that parses any format of Garbage collection logs and generates WOW graphs & AHA metrics. Inbuilt intelligence has ability to discover any sort of memory problems.Excellence & Simplicity Devops tools for cloud.Analyse Sudden CPU Spikes