I am investigating a memory issue in one of my recent projects where data is kept in memory for fast access, but the application's memory footprint is very high.
This application made heavy use of CHM (i.e. ConcurrentHashMap), so it was a no-brainer to suspect that CHM was the issue. I ran a memory profiling session to find out how much memory CHM was really taking.
I was surprised by the result: CHM was taking around 50% of the memory. So it was confirmed that CHM was the issue; it is fast, but not memory efficient.
Why is CHM so fat?
- Key/value pairs are wrapped in a Map.Entry object, which creates one extra object per mapping.
- Each segment is guarded by a ReentrantLock, so if you have lots of small CHMs with the default concurrency level, there will be a lot of lock objects, and they will show up near the top of the profiler's list.
- And there are many more fields for housekeeping activities.
All of the above objects make a significant contribution to the memory footprint.
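To make the wrapper cost concrete, here is some back-of-the-envelope arithmetic for a pre-Java-8 CHM entry. The field sizes below are assumptions for a 64-bit JVM with compressed oops, not measured values:

```java
// Back-of-the-envelope sketch of per-entry overhead in a pre-Java-8
// ConcurrentHashMap. Sizes assume a 64-bit JVM with compressed oops;
// they are estimates, not measurements.
public class EntryOverhead {

    static int perEntryBytes() {
        int header = 12;   // object header (mark word + compressed class pointer)
        int hash   = 4;    // int hash field
        int key    = 4;    // compressed reference to the key
        int value  = 4;    // compressed reference to the value
        int next   = 4;    // compressed reference to the next entry in the chain
        int pad    = 4;    // padding to 8-byte alignment
        return header + hash + key + value + next + pad;
    }

    public static void main(String[] args) {
        // One extra wrapper object per mapping, before counting key/value themselves.
        System.out.println("~" + perEntryBytes() + " bytes of Entry wrapper per mapping");
        System.out.println("~" + (perEntryBytes() * 1_000_000L / (1024 * 1024))
                + " MB of wrappers for 1M entries");
    }
}
```

Even at these optimistic estimates, a million mappings carry tens of megabytes of pure wrapper objects on top of the keys and values.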
How can we reduce the memory footprint?
It is difficult to reduce the memory footprint of CHM, and the possible reasons that I can think of are:
- It has to support the existing Map interface.
- The collision resolution technique used by Java maps is separate chaining: on collision, entries are linked into a list per bucket. Chaining is very fast for resolution, but it is not CPU cache friendly, especially if a bucket grows into a long linked list. There is an interesting article about the linked list problem.
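To show why chaining hurts the cache, here is a minimal sketch of a chained bucket lookup. The class names are illustrative, not from the JDK; each `next` hop is a dependent pointer load into a potentially distant heap location:

```java
// Minimal sketch of separate chaining: collisions are resolved by linking
// entries into a per-bucket list. Each 'next' hop is a dependent pointer
// load that can land anywhere on the heap - this is what defeats the CPU
// cache. Class names here are illustrative, not from the JDK.
public class ChainedBucket {

    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;          // the chain built on collision
        Node(int hash, String key, String value, Node next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    final Node[] table = new Node[16];

    void put(String key, String value) {
        int i = key.hashCode() & (table.length - 1);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.hash == key.hashCode() && n.key.equals(key)) { n.value = value; return; }
        }
        table[i] = new Node(key.hashCode(), key, value, table[i]); // prepend on collision
    }

    String get(String key) {
        int i = key.hashCode() & (table.length - 1);
        for (Node n = table[i]; n != null; n = n.next) { // pointer chasing on every hop
            if (n.hash == key.hashCode() && n.key.equals(key)) return n.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedBucket m = new ChainedBucket();
        m.put("a", "1");
        m.put("b", "2");
        System.out.println(m.get("a") + " " + m.get("b")); // prints 1 2
    }
}
```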
So we need an alternate CHM implementation that is memory efficient.
Version 2 of CHM
I started to create a version of CHM that has a low memory footprint; the target is to come as close as possible to a plain array. I also tried alternate hash collision techniques to check the performance; there are many options for open addressing.
I tried the options below.
- Linear probing - performance was not that great, although this is the most CPU cache friendly technique. I need to spend some more time to get it right.
- Double hashing - performance was in an acceptable range.
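For reference, here is a sketch of how a double-hashing probe sequence walks the table. This is the generic textbook formulation, not the author's LCHM code: the second hash determines the step size, so keys that collide on the first slot diverge on later probes:

```java
// Sketch of a double-hashing probe sequence: on collision the probe
// advances by a key-dependent step, so colliding keys follow different
// probe sequences. Generic textbook formulation, not the LCHM code.
public class DoubleHashing {

    static int h1(int hash, int capacity) {
        return (hash & 0x7fffffff) % capacity;
    }

    static int h2(int hash, int capacity) {
        // The step must be non-zero and co-prime with the capacity; a prime
        // capacity and a step in [1, capacity - 1] guarantees a full cycle.
        return 1 + (hash & 0x7fffffff) % (capacity - 1);
    }

    // Index to inspect on the k-th probe for this key.
    static int probe(int hash, int k, int capacity) {
        return (int) ((h1(hash, capacity) + (long) k * h2(hash, capacity)) % capacity);
    }

    public static void main(String[] args) {
        int capacity = 13; // prime capacity
        int hash = "someKey".hashCode();
        for (int k = 0; k < 4; k++) {
            System.out.println("probe " + k + " -> slot " + probe(hash, k, capacity));
        }
    }
}
```

The downside relative to linear probing is that consecutive probes jump around the table, so each probe is a likely cache miss.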
Let's measure CHM V2
- Memory footprint
There is a big gain in terms of memory: CHM takes around 45%+ more than the raw data, while the new implementation, LCHM, is very close to the plain array.
- Single thread PUT performance
CHM outperforms in the PUT test; the new implementation is around 50 to 80 milliseconds slower for 1 million items. 50 to 80 ms is not a noticeable delay, and I think it is fine for applications where the latency requirement is in seconds; if the latency requirement is in milliseconds/nanoseconds, then CHM will not be a good choice anyway.
The reason for the slower performance of LCHM is the collision technique: double hashing is used for resolving hash code collisions.
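The shape of such a PUT test can be as simple as the harness below, timing ConcurrentHashMap only, since LCHM is not public. A single-shot nanoTime measurement is rough; a rigorous benchmark would use JMH with warm-up:

```java
import java.util.concurrent.ConcurrentHashMap;

// Rough single-threaded PUT timing for a given number of entries.
// Simplistic on purpose (no warm-up, no JMH): it only illustrates the
// shape of the test, not a rigorous benchmark.
public class PutTiming {

    static long timePutsMillis(int items) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < items; i++) {
            map.put(i, i);
        }
        long elapsed = System.nanoTime() - start;
        if (map.size() != items) throw new AssertionError("lost entries");
        return elapsed / 1_000_000; // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        System.out.println("1M puts took ~" + timePutsMillis(1_000_000) + " ms");
    }
}
```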
- Multi thread PUT performance
The new implementation performs slightly better when multiple threads are used to write to the map.
- GET performance
The performance of GET is slightly slower compared to CHM.
Conclusion
The new implementation wins the memory test and is a bit slower in the get/put tests. There are a couple of things that can be done to improve get/put performance, and all of the performance difference that we see is due to the probing technique used.
- The probing technique can be improved; linear probing can be used to get cache-friendly access.
- It is very easy to make the probing technique parallel.
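A minimal sketch of the cache-friendly direction the conclusion points at: an open-addressed map backed by parallel primitive arrays with linear probing. This is an illustrative long-to-long map, not the author's LCHM; it has a fixed capacity, no resizing, no concurrency, and reserves key 0 as the empty-slot marker:

```java
// Minimal open-addressed long->long map using linear probing over parallel
// arrays. The footprint is close to the raw arrays themselves: no per-entry
// wrapper objects. Illustrative sketch only: fixed power-of-two capacity,
// no resizing, no concurrency, and key 0 is reserved as the empty marker.
public class LinearProbeMap {

    private final long[] keys;
    private final long[] values;
    private final int mask;

    LinearProbeMap(int capacityPow2) {
        keys = new long[capacityPow2];
        values = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // cheap multiplicative mixing
        return (int) (h ^ (h >>> 32)) & mask;
    }

    void put(long key, long value) {
        if (key == 0) throw new IllegalArgumentException("0 is the empty marker");
        int i = slot(key);
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask;              // linear probe: scan adjacent slots
        }
        keys[i] = key;
        values[i] = value;
    }

    long get(long key) {
        int i = slot(key);
        while (keys[i] != 0) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;              // adjacent slots share cache lines
        }
        return 0;                            // absent
    }

    public static void main(String[] args) {
        LinearProbeMap m = new LinearProbeMap(1 << 20);
        for (long k = 1; k <= 100_000; k++) m.put(k, k * 2);
        System.out.println(m.get(42)); // prints 84
    }
}
```

Because probes walk adjacent array slots, a miss usually stays within cache lines that are already loaded, which is exactly the cache-friendliness the linear probing bullet is after.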
nice
Linear probing is much better than double or triple hashing. I got ~20-30% better put performance using this.
Some remarks:
1) In real applications, CHM/HM entries are more likely to get scattered across the heap, so it might perform worse compared to open addressing (an issue with the non-compacting CMS collector).
2) It would be nice to also see the object count, as it has more impact than memory consumption in MB.
Linear probing was a bit slow in the add test; maybe when the hashtable becomes really big it starts to pay off.
There were a lot of Map.Entry objects created; I have lost those stats now. Most of the memory gain came from those Entry objects, which are not required in the new hashtable.
Hi Ashkrit,
Could you share your custom implementation of CHM?
Hi Vamshi,
That code was for one of my previous employers, so right now I don't have it, but it is pretty trivial to write one.
I will soon write a CHM based on linear probing and post it on GitHub.
Meanwhile, you can have a look at Trove to get an idea of how linear probing works; adding concurrency to that should be straightforward.