Thursday, 25 December 2014

what is new in java8 ConcurrentHashMap

Java8 has lots of new features that will help in writing code that express its intent more clearly and in concise way.
ConcurrentHashMap has lots of nice features that reduces boiler plate code, most of the new features are parallel, so you don't have to worry about locks/threads etc when using CHM.

Smart Cache
One of the most common usage of CHM is to build cache and it has little concurrency issue that value for key should be computed once and it should provide all the "happens after" guarantee.

Pre JDK8 you have to do some thing like 

JDK8 has special function computeIfAbsent for such type of thing

Difference is obvious for prejdk8 future task is required and some other special handling to make sure it works fine, but JDK8 just call to computeIfAbsent with lamdba expression and your are done.
There is computeIfPresent function also to remap/refresh value in cache.

Merge values 
One of the most common problem with maps is when values of same keys needs to merged, one of the example is you have map of word & count.
Every time word is added to map and if it exists then count needs to be incremented

Example code pre JDK8

Example code in JDK8

All the noise that was there in JDK7 code is removed and code express its intent so nicely

 Aggregate/Reduce values

This is also very common use case where single value is derived from values in map for eg total number of words

sample code for pre JDK8

sample code for JDK8

Code is so so much clean, extra thing to note in jdk8 reduce function is first parameter which is the (estimated) number of elements needed for this operation to be executed in parallel.
So this function supports parallel reducing

All the collection of JDK8 has done improvement on iteration method, so application code will never have any loop, it is all controlled by underlying collection and due to which iteration can be done in parallel .

CHM has 2 types of foreach

others interesting functions are search

One thing to note is that all the parallel function are executed on common forkjoin pool, there is way by which you can provide your own pool to execute task.

Saturday, 15 November 2014

Recommend books @ agile singapore

I recently attended agilesingapore conference. This was my first agile conference, lot to take away.
Amazing speakers were invited, keynote talks were very amazing and inspirational.

One of the things that you get from such type of conference is books recommendation. Lots of books were recommended. I only remember few of them.

I want to put it somewhere so that i can start making my read list for next couple of months.

Mindset: The New Psychology of Success

Fearless Change: Patterns for Introducing New Ideas  

101 Things I Learned in Architecture School

The Timeless Way of Building

Software Craftsmanship: The New Imperative

The Laws of Simplicity (Simplicity: Design, Technology, Business, Life)

97 Things Every Programmer Should Know: Collective Wisdom from the Experts

Growing Object-Oriented Software, Guided by Tests

Extreme Programming Explained: Embrace Change

The Click Moment: Seizing Opportunity in an Unpredictable World

The Fifth Discipline: The Art & Practice of The Learning Organization

Joy, Inc.: How We Built a Workplace People Love

Lot of books to read i have to start taking timeout out from my so called busy schedule!

Friday, 10 October 2014

Factory Without IF-ELSE

Object Oriented Language has very powerful feature of Polymorphism, it is used to remove if/else or switch case in code.

Code without condition is easy to read. There are some places where you have to put them and one of such example is Factory/ServiceProvider class.
I am sure you have seen factory class with IF-ELSEIF which keeps on getting big.

In this blog i will share some techniques that you can be used to remove condition in factory class.

I will use below code snippet as example

This is first thing that comes to mind when you want to remove conditions. You get the feeling of framework developer!

This looks very simple but only problem is caller has to remember fully qualified class name and some time it could be issue.

Map can be used to to map actual class instance to some user friendly name

This also looks neat without overhead of reflection.

This is interesting one

This method is using enum method to remove condition, one of the issue is that you need Enum for each type. You don't want to create tons of them!
I personally like this method.

If-else or switch case makes code difficult to understand, we should try to avoid them as much as possible. Language construct should be used to avoid some of switch case.

We should try to code without IF-ELSE and that will force us to come up with better solution.

Wednesday, 24 September 2014

Working Effectively with Legacy Code

Recently i attended Working Effectively with Legacy Code course by Michael Feathers

It was very good learning experience , he talks about how to work with code that does not have unit test or less unit test. He shared techniques can be used to improve legacy code and get better understanding of application.

This post is to share some of them before i forget:-)

Sprout Method/Class
This is pretty common technique but did't know that it has name. Adding new method/class sounds much easier than changing some existing code for new feature, so we should use this approach for any new feature that we want to introduce.

This approach can be used on existing method also to make it testable.
Work of caution that you don't want over do it !

Poor Man dependency injection
Everybody knows what dependency injection is, apart from some of the benefit it can also be used to make code testable, so for unit test you can have sample/dummy implementation that can be injected to your application code to make is testable. 

One of the problem with this technique is that it will result in method signature changes and that might mean that all the code in that call stack might need to change.
Just remember that you don't have to use spring to do this!

Extract and override
This is interesting one looks like silver bullet or Brahmastra for many problem.This pattern is used to have control on dependency that are hard to fake.
Assume there is function that makes some database/socket call and then does some calculation and you want to write unit test for calculation logic.

To make function testable you can do below changes
  • Extract database part of logic and put that in new function
  • Make that function protected. Thanks to OOPs , finally some good usecase for protected.
  • Write new class that extends original class and only override database interaction function(i.e was protected)
  • Use new class for your testing and you are done!

This is very powerful technique because you don't have to go through pain of changing  constructor/method parameter, so call stack remains intact.
Since it is based on method overriding, so you need to have discipline in team to make sure that fake class is only used for testing. 

Instance delegate
Used to test static function class. Create normal function that will delegate call to static function and during test create another class that will override new function that was created to add fake call .

Singleton make life very difficult from testing perspective, way to make it testable is allow injection of new singleton implementation and use that implementation for testing. 

Create interface to break big class
Although adding new method/class is much simpler but still lot of code is added to existing class/method and over period of time it becomes big i mean really big.

Approach to simplify class
  • Creation of interface without any method 
  • Class that you want to simplify should implements new interface
  • All the function that was using class should now use new interface and compiler will gives you errors about missing method and you can start moving methods to interface to fix the issue.

Main advantage for this approach is that you don't have prerequisite of unit test , you can take benefit of compiler/IDE feature. 

Method & variable dependency graph
 When class grows overtime and it does not adhere to Single responsibility principle, working out what functions goes where could be tedious.
You can draw dependency graph of variable & method to workout how things are related. This can help to come up with new classes will will adhere to single responsibility principle.

Identifying seam plays important role in working out hidden dependency. 
Definition of seam from dictionary is 
A line of junction formed by sewing together two pieces of material along their margins.

Definition in context of re-factoring is - part of code will enable testing.

for e.g functions performs some I/O operation and then some calculation, so if you want to test calculation without doing any major changes to core logic then you can "Extract & Override" approach to solve this.

Scratch Refactoring
I am sure you might have seen one big class/methods with 100s or 1000s of lines and you have to scratch your head to understand what this does.
To make things more interesting that part of code is very critical to company and you have to do some changes.
Scratch Refactoring is very useful for such type of situation

  • Take monster code that you want to simplify/understand
  • This type refactoring is only focused on understanding code, so it should be done in palin text editor so that you are not worried about compilation error that IDE generates.
  • Break big method in small and single concern method, simplify condition , delete some unused code

Main benefit of this approach is that you don't have to commit this in Svn/Git only purpose of this work is understand as much as possible by creating small code blocks.

While you are doing this you will get better understanding and code is simplified to extent that doing real re-factoring will be not be that difficult.

I must say it was very useful session, lot of learning and fresh perspective.
I have got copy of Working Effectively Legacy book and will read & practice it to get more better understanding.

Wednesday, 17 September 2014

TDD - Roman Numerals kata -

Interesting experiment on TDD, i knew it is better but did't expect that proving it could be this simple .

Reference Blog :

Wednesday, 6 August 2014

Compact String List

Whenever application memory profile is analyzed string is one of the most common object that comes right on top .

Java has String pool that solves problem to some extent and lot of interesting optimization has been done in string pool for JDK 6/7/8

Whatever is given by string pool can be easily implemented by WeakHashmap or Concurrent hash map but JVM implementation is very good,so no point in reinventing it. One of the overhead associated with string object is header of char array, each array has basic header cost and extra 4 byte for size of array

On 32 bit for char array - 8(header) + 4(length) = 12
On 64 bit for char array - 12(header) + 4(length) = 16

For each string value 12 to 16 bits is wasted.
Quick memory optimization that can be done is to allocate one big array and store values of multiple string in that array.

Just to visualized how it will look

With above approach we save array header cost but another overhead is introduced that we need another sets of variable to know which part of array belong to value1 or value 2.
Int array can be used to maintain index of different value in big char array,so we save 12 bits per string and that is significant saving when you have lots of string.

In this blog i will share experiment with such approach.
Lets get into code

First thing is storage of multiple string values in single char array.

Pretty straight forward code two array is required one char[] and one int[]
Add function will expand char & int array if required and add values to it.

Iteration over element is another tricky thing that needs to be handled for such compact structure, trove style foreach looks good fit for this.

Iteration code looks like


Memory Usage
Compact list trades off add speed for memory/search, lets have look at memory gain.
In this test text from ALICE'S ADVENTURES IN WONDERLAND url is split by space and it is added to ArrayList<String> and CompactStringList.

"Alice Adventure" book has 32.5K words.

For memory test i used Heinz Kabutz Determining Memory Usage in Java approach and it gave me consistent output so i stick with it for this test.

ArrayList takes around 1755 KB, CompactList takes around 355 KB.
So for this particular example CompactList takes around 80% less memory, gain is very significant.

Detail memory usage
Lets have look at detail memory usage. I used jmap to get top contributor for this test.

This gives better understanding of gain.
Char[] in compactlist takes around 60% less memory and String object is like almost nothing with minor overhead for int[].

So it seems good trade off for memory!

What next's
- One usage is building string pool using compactlist
- CompactList is append only structure any changes done for existing element like delete/update will require rebuilding CompactList
 - Iteration using traditional style will result in garbage creation because it has to build CharSequence, but that can be overcome by using foreach approach that gives access to chars of element.

Code is available @ github

Saturday, 19 July 2014

Saturday, 28 June 2014

MethodHandle returns back in java 8

Longtime back i wrote blog to compare performance of reflection/methodhandle etc.
Methodhandle was introduced in JDK7 and its performance was not that good as compared to reflection.

Recap of test result from JDK 7

MethodHandle is right in bottom as a result it was was not of much use in JDK7.
MethodHandle is used in java 8 lambda and i think due to that lot of improvement was done in it.

Lets have look at JDK8 numbers

MethodHandle implementation of JDK8 is back.
Open JDK mailing conversation has some details about type of optimization that is required in method handle. 

Code used for this blog is available @ github

Update -
One of the reader asked for raw numbers, numbers are in Mili Second

Friday, 14 March 2014

Off Heap concurrent counter

Concurrent counter are part of almost every system, it is used to collect data, thread synchronization etc. 
Java has good support of heap based counter.

There could be use case when you need counter that can be shared between processor.

How to build inter process counters 
 - Database 
This is first option that comes to mind, database sequence is counter that can be used by multiple process. All concurrency is handled by database. It is good option for starter but we know types of overhead(Network,Locks etc) you get with database. Only Larry Elision will be happy about it, not you!
 - Some Server
You could develop some server/middleware that provides such type of service. This option will still have network latency,marshal/unmarshal overhead.

 - Memory Mapped file
You could use memory mapped file to do this. I got idea from looking at thread-safe-interprocess-shared-memory-in-java presentation from PeterLawrey.

Challenges involved in multi process counter.
- Data visibility 
    Changes done by one process should be visible to all the process. This problem can be solved by using memory mapped file. Operating System gives guarantee about it and java memory model supports it to make is possible. 

- Thread safety 
Counters is all about multiple writers , so thread safety becomes big problem. Compare-and-swap is one option to handles multiple writers.
Is it possible to use CAS for off heap operation ? yes it is possible to do that , welcome to Unsafe.
By using using Memorymapped & Unsafe it is possible to use CAS for Off heap operation.

In this blog i will share my experiment of Memory mapped using CAS.

How ?
- How to get memory address
MappedByteBuffer is using DirectByteBuffer, which is off heap memory. So it is possible to get virtual address of memory and use unsafe to perform CAS operation. Lets look at the code.

Above code create memory mapped file of 8 bytes and get the virtual address. This address can be be used to read/write content of memory mapped file.

- How to Write/read in thread safe manner

Important function to look are readVolatile and increment
readVolatile reads directly from memory and increment is using unsafe to perform CAS on the address obtained from MemoryByteBuffer.

Some performance numbers from my system. Each thread increments counter 1 Million times.

Performance of counter is decent , as number of threads are increased CAS failures starts to happen and performance starts to degrade.
Performance of these counter can be improved by having multiple segment to reduce write contention.
I will write about it in next blog.

 - Memory mapped file is very powerful, it can be used to developed lot of things like off heap collections, IPC, Off heap thread coordination etc.
  - Memory mapped file opens gates for GC less programming.

All the code used in this blog is available on github.

Monday, 17 February 2014

AtomicInteger Java 7 vs Java 8

Atomic Integer is interesting class, it is used for building many lock free algorithm. Infact JDK locks are also build using ideas from Atomic datatypes.

As name suggest it is used for doing atomic increment/decremented, so you don't have to use locks, it will use processor level instruction to do so.
It is based on Compare-and-swap instruction.

Issue with CAS
CAS works on optimistic approach, it expects some failure, so it will retry operation, so in theory if there is no contention then it should work pretty fast.

There is another alternate way of doing same thing using Fetch-and-add.
Fetch-and-add is very different from CAS, it is not based on re-try loops.

Dave Dice compares CAS vs Fetch-and-add in atomic_fetch_and_add_vs blog, i can't explain better than this, so i will copy content from his blog

  1. CAS is "optimistic" and admits failure, whereas XADD does not. With XADD there's no explicit window of vulnerability to remote interference, and thus no need for a retry loop. Arguably, XADD has better progress properties, assuming the underlying XADD implementation doesn't have an implicit loop, but even in that case the window would be narrower than with Load;Φ;CAS.
  2. Lets say you were trying to increment a variable with the usual Load;INC;CAS loop. When the CAS starts failing with sufficient frequency you can find that the branch to exit the loop (normally taken under no or light contention) starts to predict toward the failure path. So when the CAS ultimately succeeds, you'll incur a branch mispredict, which can be quite painful on processors with deep pipelines and lots of out-of-order speculative machinery. Typically, this is in a piece of code where you don't want a long stall. There's no loop and no such issues with XADD.
Since fetch-and-add has predictable progress properties, so it is used for developing waiting free algorithms.
Unfortunately JDK 7 does not have support for fetch-and-add, one more reason why C++ people will be happy that C++ is great!

As we all know things do change & java community decided to added support for fetch-and-add in JDK8, one more good reason to migrate to JDK8.

In this blog i will compare performance of AtomicInteger from JDK7 & 8

Atomic Integer - JDK 7 vs JDK 8

In this test i increment counter 10 Million times with different number of threads. Thread numbers are increased to see how counter performs under contention.

X Axis - No of Threads
Y Axis - Ops/Second - Higher is better

JDK8 counter is winner in this case, best performance is when there is no contention , for JDK7 it is 80 MOPS but for JDK8 it close to 130 MOPS.
For single thread difference is not much , JDK8 is around 0.5 times faster but as contention increases performance JDK7 counter starts falling.

I will put another graph by removing 1 thread number, so that we can clearly see how these counter performs.

This gives better idea of how slow JDK7 atomic integer is, for 8 threads JDK8 counter is around 3.5X times faster.

Dive Into Code

JDK 8 - AtomicInteger
public final int getAndIncrement() {
        return unsafe.getAndAddInt(this, valueOffset, 1);

JDK7 - AtomicInteger
 public final int getAndIncrement() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;

JDK8 is using new function(getAndAddInt) from unsafe to do the magic. Unsafe has become more useful!

Dive in Assembly
To just confirm that all performance again is coming from fetch-and-add i had look at assembly generated.

JDK 8 
0x0000000002cf49c7: mov    %rbp,0x10(%rsp)
  0x0000000002cf49cc: mov    $0x1,%eax
  0x0000000002cf49d1: lock xadd %eax,0xc(%rdx)  ;*invokevirtual getAndAddInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@8 (line 186)


0x0000000002c207f5: lock cmpxchg %r8d,0xc(%rdx)
  0x0000000002c207fb: sete   %r11b
  0x0000000002c207ff: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)

Introduction of fetch-and-add type of feature in java will make it more suitable for high performance computing, we will see more wait free algorithm in java

Code used for testing is available @ AtomicCounterTest
Just compile for jdk7/8 and execute it.

Thursday, 16 January 2014

Java Queues - Bad Practices

Writing after long gap, looks like i lost the motivation or ran out of topic :-)

Recently while going through code of my current assignment, i noticed that java blocking/concurrent queue is used in worst possible way and that motivated me list down some of the bad practices to access queues.

 - Ignoring return value of "offer" function.
Blocking queues has various function that can be used to insert(add/offer/put) and offer is the special one
Details from java doc about offer

Inserts the specified element into this queue if it is possible to do so immediately without violating capacity restrictions, returning true upon success and false if no space is currently available. When using a capacity-restricted queue, this method is generally preferable to add(E), which can fail to insert an element only by throwing an exception.

Return value is very important for offer function. In many case returned value is ignored and it becomes mystery that why some message are not added and this makes application magical that some time it works.

Offer function is really useful function & java Executors framework is using for very good reason. It is used for rejecting task when task submission rate is higher than task processing, below is the code snippet from

So when you are using offer function never ignore return value. If you just want to throw some exception when ever queue is full then use "add" function, it has built in support to throws exception for capacity violation.

- Using IsEmpty/peek in spin loop
Blocking queue has lock based waiting strategy , it is used for blocking operation put/take , but it also has some function that are not blocking and if those are used in loop then it will cause heavy contention.

I found example where consumer was highly optimized , it was using isEmpty in loop to check whether some element is available or not and incase when there is an element then it will call take to retrieve element.

I don't know what developer was thinking , may be he was in very deep thought when such type of code was written.
Such consumer code caused dead lock, producer could not add element because it is unable to get lock and most of the time lock was acquired by isEmpty function.

For such type of code thread dump is also interesting, you will see producer blocked but no trace of consumer thread. Multiple thread dump are required to confirm this.

Blocking queue should be used in why it is designed to , in case if you use non blocking calls then you must have some backoff strategy to avoid dead lock.

- Single consumer calling take in loop

This is very common pattern, message are taken from queue one by one, since blocking queue is lock based it needs lock for each operation.
It is always better to get multiple items from queue in one call by using drainTo type of function, this will reduce contention will give significant performance gain.

- size function for heart beat
This is also very common case where size is use to print heartbeat message. Size is not cheap operation for blocking queue, it needs lock to get size.
Atomic variable can be used to keep track of queue size and read to atomic variable will not have any effect on queue performance.

- Using sleep waiting strategy with ConcurrentLinkedQueue
Code using sleep waiting strategy

I am sure you will have very good reason to use CLQ, but sleep spoils every thing because CLQ does't have blocking support , so many application will use sleep to avoid spin loop.

Sleep based approach fine in some scenario, but if you really want to build event based system then sleep does't help,you need some kind of notification mechanism, so when ever element is added to queue it will notify consumer about that.
It is pretty easy to add such support to CLQ. I created blocking concurrent queue to try few things,  executor-with-concurrentlinkedqueue blog has some details about it.

Each type of queue is designed for very specific use case and it should be used in proper way.
Some of the bad practice mention in blog makes application slow/unresponsive, bad usage pattern should be avoided.