
Thursday, 1 October 2015

Lazy Array Initialization

Lazy object initialization is an important strategy for avoiding upfront allocation, and it can become a little tricky when multiple threads are involved.

Some languages like Scala have language-level support for lazy variables and the compiler generates the thread-safe code for you, but in Java some coding is required to achieve this.

ConcurrentHashMap in JDK 7 allocates its segment objects upfront, which adds to the memory footprint of an empty collection, but in JDK 8 a lazy initialization strategy is used to save some memory for empty collections.

So what code is used to do the allocation in a thread-safe way? It is simple, but it has some fine details that are worth looking at. Below is the kind of code snippet used in ConcurrentHashMap.
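The embedded snippet is not reproduced here, so here is a minimal sketch of the idea. The control field sizeCtl and the method initTable are modeled on JDK 8's ConcurrentHashMap, but the class itself and the use of AtomicIntegerFieldUpdater are illustrative assumptions.

import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class LazyArrayHolder {

    // volatile so a fully initialised array is safely published to other threads
    private volatile Object[] table;

    // 0 = not initialised, -1 = some thread is currently initialising
    private volatile int sizeCtl;

    private static final AtomicIntegerFieldUpdater<LazyArrayHolder> SIZE_CTL =
            AtomicIntegerFieldUpdater.newUpdater(LazyArrayHolder.class, "sizeCtl");

    Object[] initTable(int capacity) {
        Object[] tab;
        while ((tab = table) == null || tab.length == 0) {
            if (sizeCtl < 0) {
                Thread.yield();                      // another thread is allocating; back off
            } else if (SIZE_CTL.compareAndSet(this, 0, -1)) {
                try {
                    if ((tab = table) == null || tab.length == 0) {
                        tab = new Object[capacity];  // allocate once
                        table = tab;                 // publish only after allocation is complete
                    }
                } finally {
                    sizeCtl = 0;                     // release the initialisation "lock"
                }
                break;
            }
        }
        return tab;
    }
}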


It checks the array length because the length is written only after the memory allocation is done.

Code used in blog is available @ github

Wednesday, 30 September 2015

Wrap around design pattern in java8

The wrap-around pattern is not listed in the GOF book but is very useful for problems like the ones below:

 - Loop constructs, e.g. do-while/while/for loops
 - A stopwatch around some code
 - Wrapping checked exceptions in runtime exceptions
 - Initialization and cleanup, e.g. thread pool creation/destruction or file open/close
 - Adding context info to threads, e.g. request context for logging or passing a security context

A lot of plumbing code is required in Java to do such simple things. Java 8 added lambda support, and that provides an answer to such problems.

Using lambdas, behavior can be passed as an argument to any function, and that is a very powerful thing when you want to solve the problems above.

Wrap Around
The template of a wrap-around function looks something like this:

Pre Code
Actual behavior
Post Code

WrapAround for loop
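The original gist is not shown here, so below is a minimal sketch under the assumption of two hypothetical functional interfaces named Condition and Block and a loop helper; this matches the description that follows but is not the blog's original code.

public class Looper {

    @FunctionalInterface
    interface Condition {                // decides whether the loop should continue
        boolean check();
    }

    @FunctionalInterface
    interface Block {                    // the body to run on each iteration
        void execute();
    }

    // Wrap-around: the loop plumbing is written once, the condition and the
    // body are supplied as lambdas
    static void loop(Condition condition, Block block) {
        while (condition.check()) {
            block.execute();
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        loop(() -> counter[0] < 5,
             () -> System.out.println("value " + counter[0]++));
    }
}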



The above code is very straightforward: it has two functional interfaces, one for the condition and another for the block of code to execute, and those two behaviors are passed to the loop function using lambdas.
This allows us to introduce new constructs.

Let's look at some more examples.

WrapAround for time/stopwatch
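Again the gist is not reproduced here; a hedged sketch follows, where the measure helper and its use of Supplier are assumptions rather than the original code.

import java.util.function.Supplier;

public class StopWatchWrapper {

    static <T> T measure(String label, Supplier<T> block) {
        long start = System.nanoTime();                    // pre code
        try {
            return block.get();                            // actual behavior
        } finally {
            long elapsed = System.nanoTime() - start;      // post code
            System.out.println(label + " took " + elapsed / 1_000_000 + " ms");
        }
    }

    public static void main(String[] args) {
        int sum = measure("sum", () -> {
            int s = 0;
            for (int i = 0; i < 1_000_000; i++) s += i;
            return s;
        });
        System.out.println(sum);
    }
}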



WrapAround Closeable/Exception
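A minimal sketch of the idea, assuming a hypothetical ResourceBlock interface and a withReader helper that opens a file, runs the block, wraps any checked exception in a RuntimeException and closes the resource; this is not the original gist.

import java.io.BufferedReader;
import java.io.FileReader;

public class ResourceWrapper {

    @FunctionalInterface
    interface ResourceBlock<R, T> {
        T apply(R resource) throws Exception;    // the block may throw checked exceptions
    }

    static <T> T withReader(String file, ResourceBlock<BufferedReader, T> block) {
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) { // pre code: open
            return block.apply(reader);                                          // actual behavior
        } catch (Exception e) {
            throw new RuntimeException(e);       // post code: wrap checked exception
        }                                        // try-with-resources closes the reader
    }

    public static void main(String[] args) {
        // assumes a file named input.txt exists in the working directory
        String firstLine = withReader("input.txt", reader -> reader.readLine());
        System.out.println(firstLine);
    }
}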


Java 8 has tons of features that can make code concise, and I used just one of them to implement something really useful.

Code used in blog is available @ github

Wednesday, 14 January 2015

Java8 Parallel Streams


Recently I found a great article on when to use parallel streams by Doug Lea and team. In this blog I will share some performance results for different data structures.

Splittable data structure
This is one of the most important factors that needs to be considered. All parallel stream operations are executed on the default fork-join pool, and the fork-join pool uses a divide-and-conquer algorithm to split the stream into small chunks and apply the function to each chunk, so splitting is an important factor; if splitting takes too long then the whole computation will choke!

Types of data structures

-Array
Array-based data structures are the most efficient from a splitting perspective because each element can be randomly accessed, so splitting is a very quick operation.
Some examples of array-based collections are ArrayList and open-addressing-based Sets/Maps.

-Tree
Balanced tree-based collections like TreeMap or ConcurrentHashMap. It is easy to chop the collection into two parts, but if the tree is not balanced then splitting will not be that efficient because the task load will not be equal for each thread.
Another thing to consider is that the memory access pattern for tree-based data structures is random, so there will be a memory latency cost when elements are accessed.

-LinkedList
This type of data structure gives the worst splitting performance because each element must be visited to split it. Some examples from the JDK are LinkedList and the queues.

- Others
This covers all other types of data structures, e.g. I/O-based ones; BufferedReader.lines returns a stream, but its splitting operation is very expensive.

Some performance numbers
Every performance tuning guess must be backed by experiment, so let's have a look at some numbers.

Some code snippet
Array Stream
ArrayList Stream
Set Stream
LinkedList Stream
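The actual gists are not reproduced here; below is a hedged sketch of the four variants, where the element type, collection sizes and the summing operation are illustrative assumptions.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.Set;

public class ParallelStreamSamples {

    static long arrayStream(int[] values) {
        return Arrays.stream(values).parallel().asLongStream().sum();
    }

    static long arrayListStream(ArrayList<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    static long setStream(Set<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    static long linkedListStream(LinkedList<Integer> values) {
        return values.parallelStream().mapToLong(Integer::longValue).sum();
    }

    public static void main(String[] args) {
        ArrayList<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) data.add(i);
        System.out.println(arrayListStream(data));
    }
}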


Measurement Code
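The measurement gist is also not shown; here is a minimal sketch of a timing harness under that assumption, with the class and method names being illustrative.

import java.util.function.Supplier;

public class Measure {

    static <T> T time(String label, Supplier<T> task) {
        long start = System.currentTimeMillis();
        T result = task.get();                                   // run the work being measured
        System.out.println(label + " took " + (System.currentTimeMillis() - start) + " ms");
        return result;
    }

    public static void main(String[] args) {
        long sum = time("sum", () -> java.util.stream.LongStream.range(0, 1_000_000).sum());
        System.out.println(sum);
    }
}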


Result 

Array      - 31 sec
ArrayList  - 31 sec
Set        - 52 sec
LinkedList - 106 sec

The result confirms the guess that array-based collections are the fastest, so choose the right data structure before you start using parallel streams.

Code used in blog is available @ github

Thursday, 25 December 2014

What is new in Java 8 ConcurrentHashMap

Java 8 has lots of new features that help in writing code that expresses its intent more clearly and concisely.
ConcurrentHashMap has lots of nice features that reduce boilerplate code; most of the new features are parallel, so you don't have to worry about locks/threads etc. when using CHM.

Smart Cache
One of the most common usages of CHM is to build a cache, and that has a subtle concurrency requirement: the value for a key should be computed only once, and it should provide all the "happens-before" guarantees.

Pre JDK 8 you had to do something like the following.
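The original snippet is not reproduced here; a hedged sketch of the pre-JDK 8 approach follows, in the spirit of the well-known FutureTask-based memoizer, with the class and method names being assumptions.

import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class Memoizer<K, V> {

    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<K, Future<V>>();

    public V get(final K key, final Callable<V> loader)
            throws ExecutionException, InterruptedException {
        Future<V> future = cache.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<V>(loader);
            future = cache.putIfAbsent(key, task);   // only one task wins the race
            if (future == null) {
                future = task;
                task.run();                          // the winner computes the value
            }
        }
        return future.get();                         // everyone else waits for it
    }
}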


JDK 8 has a special function, computeIfAbsent, for exactly this kind of thing.
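A minimal sketch of the JDK 8 version, assuming a hypothetical expensiveLoad loader:

import java.util.concurrent.ConcurrentHashMap;

public class Jdk8Cache {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        // computeIfAbsent computes the value at most once per key and publishes it safely
        return cache.computeIfAbsent(key, k -> expensiveLoad(k));
    }

    private String expensiveLoad(String key) {   // placeholder loader, an assumption
        return "value-for-" + key;
    }
}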



The difference is obvious: pre JDK 8, a FutureTask and some other special handling are required to make sure it works correctly, but in JDK 8 you just call computeIfAbsent with a lambda expression and you are done.
There is also a computeIfPresent function to remap/refresh a value in the cache.

Merge values 
One of the most common problems with maps is when values for the same key need to be merged; one example is a map of word to count.
Every time a word is added to the map, if it already exists then the count needs to be incremented.

Example code pre JDK8
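The gist is not shown; a hedged sketch of the pre-JDK 8 style, using an optimistic putIfAbsent/replace retry loop (an assumption about the original code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class WordCountPreJdk8 {

    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();

    void add(String word) {
        for (;;) {
            Integer current = counts.get(word);
            if (current == null) {
                if (counts.putIfAbsent(word, 1) == null) {
                    return;                                   // first occurrence inserted
                }
            } else if (counts.replace(word, current, current + 1)) {
                return;                                       // optimistic increment succeeded
            }
            // another thread won the race; retry
        }
    }
}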

Example code in JDK8
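And a hedged sketch of the JDK 8 version using merge:

import java.util.concurrent.ConcurrentHashMap;

public class WordCountJdk8 {

    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    void add(String word) {
        counts.merge(word, 1, Integer::sum);   // insert 1, or add 1 to the existing count
    }
}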

All the noise that was there in the JDK 7 code is removed, and the code expresses its intent so nicely.

Aggregate/Reduce values

This is also a very common use case where a single value is derived from the values in the map, e.g. the total number of words.

sample code for pre JDK8
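A hedged pre-JDK 8 sketch (the original gist isn't shown): sum the counts with a plain loop.

import java.util.Map;

public class WordTotalPreJdk8 {

    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {      // manual iteration and accumulation
            total += count;
        }
        return total;
    }
}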

sample code for JDK8
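A hedged JDK 8 sketch using reduceValues on ConcurrentHashMap; the threshold value is illustrative.

import java.util.concurrent.ConcurrentHashMap;

public class WordTotalJdk8 {

    static Integer totalWords(ConcurrentHashMap<String, Integer> counts) {
        // the first argument is the parallelism threshold mentioned below
        return counts.reduceValues(1_000, Integer::sum);
    }
}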



The code is so much cleaner. An extra thing to note in the JDK 8 reduce function is the first parameter, which is the (estimated) number of elements needed before the operation is executed in parallel.
So this function supports parallel reduction.

Iteration
All JDK 8 collections have improved iteration methods, so application code never needs an explicit loop; it is all controlled by the underlying collection, and because of this iteration can be done in parallel.

CHM has two types of forEach.

Other interesting functions are the search variants.
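A hedged sketch of forEach and search on CHM; the threshold values and lambdas are illustrative.

import java.util.concurrent.ConcurrentHashMap;

public class ChmIteration {

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("java", 3);
        counts.put("stream", 1);

        // forEach over entries; a low threshold encourages parallel execution
        counts.forEach(1, (word, count) -> System.out.println(word + " -> " + count));

        // search returns the first non-null result produced by the function
        String found = counts.search(1, (word, count) -> count > 2 ? word : null);
        System.out.println("first word with count > 2: " + found);
    }
}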

One thing to note is that all the parallel functions are executed on the common fork-join pool; there is a way to provide your own pool to execute the tasks.

Monday, 17 February 2014

AtomicInteger Java 7 vs Java 8

AtomicInteger is an interesting class; it is used for building many lock-free algorithms. In fact, JDK locks are also built using ideas from the atomic datatypes.

As the name suggests, it is used for doing atomic increments/decrements, so you don't have to use locks; it uses processor-level instructions to do so.
It is based on the compare-and-swap (CAS) instruction.

Issue with CAS
CAS works on an optimistic approach: it expects some failures and retries the operation, so in theory, if there is no contention, it should work pretty fast.

There is an alternative way of doing the same thing using fetch-and-add.
Fetch-and-add is very different from CAS; it is not based on retry loops.

Dave Dice compares CAS vs fetch-and-add in his atomic_fetch_and_add_vs blog post. I can't explain it better than he does, so I will copy the content from his blog:


  1. CAS is "optimistic" and admits failure, whereas XADD does not. With XADD there's no explicit window of vulnerability to remote interference, and thus no need for a retry loop. Arguably, XADD has better progress properties, assuming the underlying XADD implementation doesn't have an implicit loop, but even in that case the window would be narrower than with Load;Φ;CAS.
  2. Lets say you were trying to increment a variable with the usual Load;INC;CAS loop. When the CAS starts failing with sufficient frequency you can find that the branch to exit the loop (normally taken under no or light contention) starts to predict toward the failure path. So when the CAS ultimately succeeds, you'll incur a branch mispredict, which can be quite painful on processors with deep pipelines and lots of out-of-order speculative machinery. Typically, this is in a piece of code where you don't want a long stall. There's no loop and no such issues with XADD.
Since fetch-and-add has predictable progress properties, it is used for developing wait-free algorithms.
Unfortunately JDK 7 does not have support for fetch-and-add, one more reason for C++ people to be happy that C++ is great!


As we all know, things do change, and the Java community decided to add support for fetch-and-add in JDK 8, one more good reason to migrate to JDK 8.

In this blog I will compare the performance of AtomicInteger from JDK 7 and JDK 8.

Atomic Integer - JDK 7 vs JDK 8

In this test I increment a counter 10 million times with different numbers of threads. The thread count is increased to see how the counter performs under contention.
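The full test is linked at the end of the post (AtomicCounterTest); below is a hedged, simplified sketch of such a benchmark, with the class name and the way the work is divided among threads being assumptions.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterBench {

    public static void main(String[] args) throws InterruptedException {
        int threads = Integer.parseInt(args.length > 0 ? args[0] : "4");
        int totalIncrements = 10_000_000;
        AtomicInteger counter = new AtomicInteger();

        Runnable work = () -> {
            int perThread = totalIncrements / threads;
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();           // the operation under test
            }
        };

        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        long elapsedNanos = System.nanoTime() - start;
        double opsPerSec = counter.get() / (elapsedNanos / 1e9);
        System.out.printf("%d threads: %.0f ops/sec%n", threads, opsPerSec);
    }
}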



X Axis - No of Threads
Y Axis - Ops/Second - Higher is better

The JDK 8 counter is the winner in this case; the best performance is when there is no contention, where JDK 7 reaches 80 MOPS but JDK 8 is close to 130 MOPS.
For a single thread the difference is not huge, JDK 8 is around 0.5 times faster, but as contention increases the JDK 7 counter's performance starts falling.

I will show another graph with the single-thread number removed, so that we can clearly see how these counters perform.


This gives a better idea of how slow the JDK 7 AtomicInteger is; for 8 threads the JDK 8 counter is around 3.5X faster.

Dive Into Code

JDK 8 - AtomicInteger
public final int getAndIncrement() {
    return unsafe.getAndAddInt(this, valueOffset, 1);
}

JDK7 - AtomicInteger
public final int getAndIncrement() {
    for (;;) {
        int current = get();
        int next = current + 1;
        if (compareAndSet(current, next))
            return current;
    }
}

JDK 8 uses a new function (getAndAddInt) from Unsafe to do the magic. Unsafe has become even more useful!

Dive in Assembly
To confirm that all the performance gain is coming from fetch-and-add, I had a look at the generated assembly.

JDK 8 
0x0000000002cf49c7: mov    %rbp,0x10(%rsp)
  0x0000000002cf49cc: mov    $0x1,%eax
  0x0000000002cf49d1: lock xadd %eax,0xc(%rdx)  ;*invokevirtual getAndAddInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@8 (line 186)


JDK 7

0x0000000002c207f5: lock cmpxchg %r8d,0xc(%rdx)
  0x0000000002c207fb: sete   %r11b
  0x0000000002c207ff: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt
                                                ; - java.util.concurrent.atomic.AtomicInteger::compareAndSet@9 (line 135)
                                                ; - java.util.concurrent.atomic.AtomicInteger::incrementAndGet@12 (line 206)

  
Conclusion
The introduction of fetch-and-add type features in Java will make it more suitable for high-performance computing; we will see more wait-free algorithms in Java.

Code used for testing is available @ AtomicCounterTest
Just compile for jdk7/8 and execute it.
