Monday, 17 February 2014

AtomicInteger Java 7 vs Java 8

Atomic Integer is interesting class, it is used for building many lock free algorithm. Infact JDK locks are also build using ideas from Atomic datatypes.

As name suggest it is used for doing atomic increment/decremented, so you don't have to use locks, it will use processor level instruction to do so.
It is based on Compare-and-swap instruction.

Issue with CAS
CAS works on optimistic approach, it expects some failure, so it will retry operation, so in theory if there is no contention then it should work pretty fast.

There is another alternate way of doing same thing using Fetch-and-add.
Fetch-and-add is very different from CAS, it is not based on re-try loops.

Dave Dice compares CAS vs Fetch-and-add in atomic_fetch_and_add_vs blog, i can't explain better than this, so i will copy content from his blog


  1. CAS is "optimistic" and admits failure, whereas XADD does not. With XADD there's no explicit window of vulnerability to remote interference, and thus no need for a retry loop. Arguably, XADD has better progress properties, assuming the underlying XADD implementation doesn't have an implicit loop, but even in that case the window would be narrower than with Load;Φ;CAS.
  2. Lets say you were trying to increment a variable with the usual Load;INC;CAS loop. When the CAS starts failing with sufficient frequency you can find that the branch to exit the loop (normally taken under no or light contention) starts to predict toward the failure path. So when the CAS ultimately succeeds, you'll incur a branch mispredict, which can be quite painful on processors with deep pipelines and lots of out-of-order speculative machinery. Typically, this is in a piece of code where you don't want a long stall. There's no loop and no such issues with XADD.
Since fetch-and-add has predictable progress properties, so it is used for developing waiting free algorithms.
Unfortunately JDK 7 does not have support for fetch-and-add, one more reason why C++ people will be happy that C++ is great!


As we all know things do change & java community decided to added support for fetch-and-add in JDK8, one more good reason to migrate to JDK8.

In this blog i will compare performance of AtomicInteger from JDK7 & 8

Atomic Integer - JDK 7 vs JDK 8

In this test i increment counter 10 Million times with different number of threads. Thread numbers are increased to see how counter performs under contention.



X Axis - No of Threads
Y Axis - Ops/Second - Higher is better

JDK8 counter is winner in this case, best performance is when there is no contention , for JDK7 it is 80 MOPS but for JDK8 it close to 130 MOPS.
For single thread difference is not much , JDK8 is around 0.5 times faster but as contention increases performance JDK7 counter starts falling.

I will put another graph by removing 1 thread number, so that we can clearly see how these counter performs.


This gives better idea of how slow JDK7 atomic integer is, for 8 threads JDK8 counter is around 3.5X times faster.

Dive Into Code

JDK 8 - AtomicInteger
public final int getAndIncrement() {
        return unsafe.getAndAddInt(this, valueOffset, 1);
    }

JDK7 - AtomicInteger
 public final int getAndIncrement() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;
        }
    }

JDK8 is using new function(getAndAddInt) from unsafe to do the magic. Unsafe has become more useful!

Dive in Assembly
To just confirm that all performance again is coming from fetch-and-add i had look at assembly generated.

JDK 8 
0x0000000002cf49c7: mov    %rbp,0x10(%rsp)
  0x0000000002cf49cc: mov    $0x1,%eax
  0x0000000002cf49d1: lock xadd %eax,0xc(%rdx)  ;*invokevirtual getAndAddInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@8 (line 186)


JDK 7

0x0000000002c207f5: lock cmpxchg %r8d,0xc(%rdx)
  0x0000000002c207fb: sete   %r11b
  0x0000000002c207ff: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)

  
Conclusion
Introduction of fetch-and-add type of feature in java will make it more suitable for high performance computing, we will see more wait free algorithm in java

Code used for testing is available @ AtomicCounterTest
Just compile for jdk7/8 and execute it.

8 comments:

  1. All your blogs are interesting and informative. Attaching code snippets is a great add on.

    ReplyDelete
  2. Good to hear that you find is useful.
    I do add link to GitHub that has code, I avoid adding code snippets to keep blog small, but try to include some in future post.

    ReplyDelete
  3. It was very useful for me. Keep sharing such ideas in the future as well. Website Design Company Bangalore | Web Designing Bangalore

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. According to wiki https://en.wikipedia.org/wiki/Fetch-and-add a consensus number is equal to 2. My understanding of that fact is that only up to 2 threads can act on a shared AtomicInteger without some waiting/congestion.
    How would you explain significant drop in performance of fetch-and-add between 1 and 2 threads (and not between 2 and 3).
    Similarly, why fetch-and-add performance degradation while comparing 2 threads with 3 threads is of same magnitude as when comparing 3 and 4 threads?

    ReplyDelete