Are you ready: November 2012

Friday, 30 November 2012

Lock Free bounded queue

Lock free bounded queue

JDK 1.5 was first step towards adding serious concurrency support to java, it changed the way you write concurrent program and in other version 1.6, 1.7 etc more concurrency support.

One of the important building block for concurrent programming is Queue, java has lot of them ArrayBlockingQueue, ArrayDeque, ConcurrentLinkedQueue, DelayQueue, LinkedBlockingDeque, LinkedBlockingQueue, PriorityBlockingQueue, PriorityQueue, SynchronousQueue.
I am sure you will find one Queue that will fit your need!

Locks are used in most of the queue for managing concurrent access to data structure because of obvious reason of simplicity to use, lock based strategy makes it difficult to develop low latency system, it could result in priority inversion, context switching, deadlock etc
Alternate way is using Non Blocking Algorithm , which is bit difficult to develop, one of the non blocking algorithm is lock free algorithm

Java added support for look free via java.util.concurrent.atomic package, so now time for some action with lock free api.

Java 1.5+ has lock free queue ConcurrentLinkedQueue, which is very fast but it is unbounded and such type queue can be very risky to use real application because most of the time queue is not balanced, it is either full or empty and if it is unbounded then system can run out of resource.

I will implement lock free bounded queue and measure it performance against lock based bounded queue(ArrayBlocking), lock free unbounded queue(ConcurrentLinkedQueue). I chose these queue because they are fastest queue available in JDK.

Number Game

Test Envinorment Details
OS : Windows 7, 64 bit
Processor : Intel (R) Core(TM) i7-2820QM CPU @ 2.30 GHz
Processor Arch : Sandy Bridge , 4 physical cores/8 threads

For this test 10 Million message was added to queue using single producer and consumer number are in range of 1 to 5, non blocking call(offer/poll) call is used to produce/consume message from queue, performance is best when there is 1 consumer and as number of consumer increase performance degrades, so it is evident that contention is the culprit.

Cpu Usage

Cpu usage for LockFreeBoundedQueue

Cpu usage for ArrayBlockingQueue

From CPU usage it is clear that contention is causing drop in cpu usage and with lock free queue CPU usage is in acceptable range.

Test Code of LockFreeBoundedQueue

Link To Code

Above code is just an idea to demonstrate that how powerful non blocking algorithm can be. This code can be enhanced to add more feature like multiple producer, support for blocking call put/take.
Before adding blocking support(put/take) to queue we have to think about blocking strategy, there are couple of option spinning, hybrid spinning etc.

Saturday, 3 November 2012

Latency number that you should know

Many of you work on low latency & high throughput system. Key to developing such system is understanding latency be it of cpu cache, Ram, disk or network. I found some interesting latency number, understanding these number is very important because all these are based on speed of light and we all know that we will never be able to get faster than light.

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
Round trip within same datacenter ...... 500,000 ns  = 0.5 ms
Read 1 MB sequentially from SSD* ..... 1,000,000 ns  =   1 ms
Disk seek ........................... 10,000,000 ns  =  10 ms
Read 1 MB sequentially from disk .... 20,000,000 ns  =  20 ms
Send packet CA->Netherlands->CA .... 150,000,000 ns  = 150 ms

CPU Cache latency cost

L1 cache is nearest to core and as you move away you can see that latency is taking hit and if you are doing it billion times then it is going to get converted to human noticeable delay, here is what it will look like.

Minute

L1 cache reference                  0.5 s         One heart beat (0.5 s)

Branch mispredict                   5 s           Yawn

L2 cache reference                  7 s           Long yawn

Mutex lock/unlock                   25 s          Taking couple of deep breath.

Hour

Main memory reference               100 s         Answering nature call

Week

Read 1 MB sequentially from memory 2.9 days Welcome long weekend

So to get better performance you have to try to come close to L1/L2 cache which is really difficult things to do. You have develop Cache oblivious algorithm to get super performance.

I/O latency cost

Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms

Read 1 MB sequentially from SSD(~1GB/sec) ..... 1,000,000 ns  =   1 ms

So for normal disk on an average can read 50 MB/Sec

For SSD 1000 MB/Sec

50 MB/Sec is pretty fast and there are many techniques by which you can increase sequential read more by adjusting buffer size of read, so before making decision on what type of disk you use, you should make sure are you able to process data at the rate it is read from normal Disk. If you can’t process as fast as that then no point in getting SSD disk.

Reference

https://gist.github.com/2841832

Friday, 2 November 2012

Power of Java MemoryMapped File

In JDK 1.4 interesting feature of Memory mapped file was added to java, which allow to map any file to OS memory for efficient reading. Memory mapped file can be used to developed IPC type of solution. This article is experiment with memory mapped file to create IPC.

Some details about Memory Mapped File, definition from WIKI

A memory-mapped file is a segment of virtual memory which has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. This resource is typically a file that is physically present on-disk, but can also be a device, shared memory object, or other resource that the operating system can reference through a file descriptor. Once present, this correlation between the file and the memory space permits applications to treat the mapped portion as if it were primary memory.

Sample Program

There are two java program one is writer and other is reader. Writer is producer and tries to write to Memory Mapped file , reader is consumer and it reads message from memory mapped file. This is just a sample program to show to idea, it does't handle many edge case but good enough to build something on top of memory mapped file.

MemoryMapWriter

import java.io.File;

import java.io.FileNotFoundException;

import java.io.IOException;

import java.io.RandomAccessFile;

import java.nio.MappedByteBuffer;

import java.nio.channels.FileChannel;

public class MemoryMapWriter {

public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {

File f = new File("c:/tmp/mapped.txt");

f.delete();

FileChannel fc = new RandomAccessFile(f, "rw").getChannel();

long bufferSize=8*1000;

MappedByteBuffer mem =fc.map(FileChannel.MapMode.READ_WRITE, 0, bufferSize);

int start = 0;

long counter=1;

long HUNDREDK=100000;

long startT = System.currentTimeMillis();

long noOfMessage = HUNDREDK * 10 * 10;

for(;;)

{

if(!mem.hasRemaining())

{

start+=mem.position();

mem =fc.map(FileChannel.MapMode.READ_WRITE, start, bufferSize);

}

mem.putLong(counter);

counter++;

if(counter > noOfMessage )

break;

}

long endT = System.currentTimeMillis();

long tot = endT - startT;

System.out.println(String.format("No Of Message %s , Time(ms) %s ",noOfMessage, tot)) ;

}

What is your weakness

What is your weakness

I am basically java developer and thought that first blog that i will write will be on java, but their are many more interesting thing than java in world, so here is my first blog on "What is your weakness"

I am sure you have heard this question countless time in HR interview. You will find many article explaining on how to answer such question, i don't understand most of them and find it difficult to give convincing answer to this. So i thought to put down some examples that can help in HR rounds!

Become more effective .
You can be productive but being effective is all together different ball game. I am sure you will have many real example to justify this

Learn to do better prioritization.
This is such a big topic that you can spend whole life in learning this art.

Respecting your prioritization.
If you know what is important then make sure you give it that attention.

Handle interruption.
I am sure most us suffer in personal & professional life because we don't handle interruption in better way. This is art that you will master only if get better focus on work.

Personal retrospective
Most of us don't do this, i am sure about my self, we should do this every week, this will help us in finding the direction in which we are going.

Reduce backlog.
This is very difficult to do but then you know practice makes everybody perfect, so you have to practice to reduce your personal/professional backlog

I am sure by using some of these answer you can impress HR.

Friday, 30 November 2012

Lock Free bounded queue

Saturday, 3 November 2012

Latency number that you should know

Latency number that you should know

CPU Cache latency cost

Minute

Hour

Week

I/O latency cost

Reference

Friday, 2 November 2012

Power of Java MemoryMapped File

Power of Java MemoryMapped File

Some details about Memory Mapped File, definition from WIKI

Sample Program

MemoryMapWriter

Thursday, 1 November 2012

What is your weakness