Monday 18 February 2013

Experiment with ConcurrentHashmap

I am investigating a memory issue in one of my recent projects where data is kept in memory for fast access, but the memory footprint of the application is very high.
This application was heavily using CHM (i.e. ConcurrentHashMap), so it was a no-brainer to guess that CHM was the issue. I ran a memory profiling session to find out how much memory CHM was really taking.
I was surprised by the result: CHM was taking around 50% of the memory. So it was confirmed that CHM was the issue; it is fast but not memory efficient.

Why is CHM so fat?

  • Key/value pairs are wrapped by a Map.Entry object; this creates one extra object per entry. 
  • Each segment is a ReentrantLock, so if you have a lot of small CHMs with the default concurrency level, there will be a lot of lock objects, and they will show up near the top of the profile. 
  • And there are many more fields for housekeeping activities.

All of the above objects make a significant contribution to the memory footprint.
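The segment-lock point is easy to see in code. A minimal sketch (assuming the pre-Java-8 segmented CHM, where the default concurrency level is 16): passing an explicit concurrency level of 1 to the three-argument constructor cuts the number of segment locks for small, lightly contended maps.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ChmFootprint {
    public static void main(String[] args) {
        // Default constructor (Java 6/7) allocates 16 segments, each
        // backed by a ReentrantLock plus its own table array.
        ConcurrentHashMap<String, String> fat = new ConcurrentHashMap<>();

        // If the map is small and lightly contended, shrinking the
        // concurrency level reduces the segment locks from 16 to 1.
        ConcurrentHashMap<String, String> lean =
                new ConcurrentHashMap<>(16, 0.75f, 1);

        fat.put("k", "v");
        lean.put("k", "v");
        System.out.println(fat.get("k") + " " + lean.get("k"));
    }
}
```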

How can we reduce the memory footprint?
It is difficult to reduce the memory footprint of CHM itself, and the possible reasons that I can think of are:

  • It has to support the old Map interface. 
  • The collision-resolution technique used by Java maps is separate chaining (closed addressing): on collision, a linked list is built per bucket. Chaining resolves collisions quickly, but it is not CPU-cache friendly, especially if a bucket grows into a big linked list. There is an interesting article about the linked-list problem. 
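To make the per-entry cost concrete, here is a toy separate-chaining map (a hypothetical sketch, not the JDK code): every put allocates a Node object, which is exactly the extra object per entry mentioned above.

```java
// Minimal separate-chaining map: each entry costs one Node object.
public class ChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    public void put(K key, V value) {
        int i = (key.hashCode() & 0x7fffffff) % table.length;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; }
        }
        table[i] = new Node<>(key, value, table[i]); // extra object per entry
    }

    public V get(K key) {
        int i = (key.hashCode() & 0x7fffffff) % table.length;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

Walking the `next` pointers on a long chain is the cache-unfriendly part: each node can live anywhere on the heap.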

So we need an alternate CHM implementation that is memory efficient.

Version 2 of CHM

I started to create a version of CHM that has a low memory footprint; the target is to come as close as possible to a raw array. I also tried alternate hash-collision techniques to check the performance; there are many options for open addressing.
I tried the options below.

  • Linear probing - performance was not that great, although this is the most CPU-cache-friendly option. I need to spend some more time to get it right.
  • Double hashing - performance was in an acceptable range.
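The double-hashing option can be sketched as follows (a hypothetical illustration, not the actual LCHM code): a second hash derived from the key decides the step size, and forcing the step to be odd keeps it coprime with a power-of-two table size, so the probe sequence visits every slot.

```java
// Sketch of a double-hashing probe sequence over a power-of-two table.
public class DoubleHashProbe {
    static int[] probeSequence(int hash, int tableSize, int maxProbes) {
        int index = hash & (tableSize - 1);   // first hash picks the start slot
        int step = (hash >>> 16) | 1;         // second hash; forced odd so it
                                              // is coprime with the table size
        int[] seq = new int[maxProbes];
        for (int i = 0; i < maxProbes; i++) {
            seq[i] = index;
            index = (index + step) & (tableSize - 1);
        }
        return seq;
    }

    public static void main(String[] args) {
        for (int slot : probeSequence(12345, 8, 8)) {
            System.out.print(slot + " ");
        }
    }
}
```

Because each key gets its own step size, keys that collide on the first slot diverge on the next probe, which avoids the clustering that hurts linear probing.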

Let's measure CHM V2

  • Memory footprint

There is a big gain in terms of memory: CHM takes around 45%+ more than the raw data, while the new implementation, LCHM, is very close to the array type.
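For a rough sanity check of such numbers at home, you can compare used heap before and after building the structure. This is only a ballpark technique (GC timing makes it noisy), not the profiler-grade measurement behind the figures above.

```java
import java.util.concurrent.ConcurrentHashMap;

public class FootprintProbe {
    // Approximate used heap; System.gc() is only a hint, so treat
    // the result as a rough estimate.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, i);
        }
        long after = usedHeap();
        System.out.println("approx bytes: " + (after - before));
    }
}
```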

  • Single thread PUT performance

CHM outperforms in the PUT test; the new implementation is around 50 to 80 milliseconds slower for 1 million items. 50 to 80 ms is not a noticeable delay, and I think it is fine for applications where the latency requirement is in seconds; if the latency requirement is in milliseconds/nanoseconds, then CHM would not be a good choice anyway.
The reason for the slower performance of LCHM is the hash-collision technique: double hashing is used for resolving hash-code collisions. 

  • Concurrent Add performance 

The new implementation performs slightly better when multiple threads are used to write to the map.
  • Get Performance

Get performance is slightly slower compared to CHM.
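A minimal harness for the concurrent-add test looks like this (thread count and item count are arbitrary choices for illustration; the real benchmark setup is not shown in this post). Each thread puts a disjoint key range into the same map, and the whole run is timed.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutBench {
    static ConcurrentHashMap<Integer, Integer> fill(int threads, int perThread)
            throws InterruptedException {
        final ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;       // disjoint key range per thread
            final int count = perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < count; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        ConcurrentHashMap<Integer, Integer> map = fill(4, 250_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(map.size() + " entries in " + ms + " ms");
    }
}
```

The same harness can be pointed at any Map implementation with a thread-safe put, so CHM and an alternative can be compared under identical load.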


The new implementation outperforms in the memory test and is a bit slower in the get/put tests; all of the performance difference we see is due to the probing technique used, and there are a couple of things that can be done to improve get/put performance:
  • The probing technique can be improved; linear probing can be used to get cache-friendly access. 
  • It is very easy to make the probing technique parallel.
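The cache-friendly linear probing mentioned in the first point can be sketched like this (a hypothetical illustration, including the mixing constant, not the LCHM code): on collision, the lookup scans adjacent slots, so successive probes stay on the same cache lines.

```java
// Sketch of a linear-probing lookup over an int key table where 0 marks
// an empty slot (so this toy version cannot store the key 0).
public class LinearProbe {
    static int findSlot(int[] keys, int key) {
        int mask = keys.length - 1;          // table size is a power of two
        int i = (key * 0x9E3779B9) & mask;   // hypothetical hash mixing step
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask;              // next adjacent slot: cache friendly
        }
        return i;                            // empty slot or the matching key
    }
}
```

The trade-off noted earlier still applies: adjacent scanning is cache friendly but suffers from clustering, which is why getting linear probing right needs more tuning than double hashing.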