Thursday 25 July 2013

Which memory is faster: Heap, ByteBuffer or Direct?

Java is becoming the new C/C++: it is used extensively to build high-performance systems.
That is good news for millions of Java developers like me :-)

In this post I will share my experiments with the different types of memory allocation available in Java, and the benefits you get from each one.

Memory Allocation In Java
What kind of memory allocation does Java support?

 - Heap Memory
I don't think I need to explain this one; every Java application starts with it. All objects allocated with the "new" keyword go on the heap.

- Non Direct ByteBuffer
It is a wrapper over a byte array, just another flavor of heap memory.
ByteBuffer.allocate() creates this type of buffer. It is very useful when you want to work in terms of bytes rather than objects.
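A minimal sketch of a non-direct buffer, using only the standard java.nio.ByteBuffer API (the class and method names are illustrative). It shows that ByteBuffer.allocate() is backed by an ordinary byte[] and that multi-byte values are stored big-endian by default.

```java
import java.nio.ByteBuffer;

// Illustrative demo: a non-direct ByteBuffer is a view over a plain Java byte[].
class HeapByteBufferDemo {

    // Allocates a heap buffer, writes one int at index 0 and returns the backing array.
    static byte[] writeAndExpose() {
        ByteBuffer buf = ByteBuffer.allocate(16); // 16 zeroed bytes on the Java heap
        buf.putInt(0, 0x01020304);                // absolute put: position is unchanged
        return buf.array();                       // allowed because hasArray() is true
    }

    // Reports whether ByteBuffer.allocate() produced a direct buffer (it does not).
    static boolean allocateIsDirect() {
        return ByteBuffer.allocate(16).isDirect();
    }

    public static void main(String[] args) {
        System.out.println("isDirect = " + allocateIsDirect());
        byte[] backing = writeAndExpose();
        // Default order is BIG_ENDIAN, so 0x01020304 is stored as 01 02 03 04.
        System.out.println(backing[0] + " " + backing[1] + " " + backing[2] + " " + backing[3]);
    }
}
```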

 - Direct ByteBuffer
This is the real stuff, added to Java in JDK 1.4.
Here is how the Javadoc describes direct byte buffers:

"A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance."

The important things to note about direct buffers are:
 - Their contents live outside the Java heap, in native memory.
 - The garbage collector does not scan or move their contents.
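A small sketch of a direct buffer created with ByteBuffer.allocateDirect() (class and method names are illustrative). One nuance worth knowing: the DirectByteBuffer object itself is an ordinary heap object, and the native memory behind it is released only after that object becomes unreachable and is collected, so the GC is not completely out of the picture. The total amount of direct memory is also capped by the -XX:MaxDirectMemorySize option. The sketch switches the buffer to the platform's native byte order, a common choice when the data never leaves the machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative demo: a direct buffer has no accessible backing byte[].
class DirectByteBufferDemo {

    // Allocates `capacity` bytes of native memory wrapped in a heap-resident buffer object.
    static ByteBuffer newDirect(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer direct = newDirect(1024);
        System.out.println("isDirect = " + direct.isDirect()); // true for allocateDirect()
        System.out.println("hasArray = " + direct.hasArray()); // false: contents are off-heap

        direct.putLong(0, 123456789L);                         // absolute write at byte 0
        System.out.println(direct.getLong(0));
    }
}
```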

These properties matter a lot if you care about performance.
Memory-mapped files are also a flavor of direct byte buffer; I have shared some of my findings about them in other blog posts.
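A file is mapped through FileChannel.map(), which returns a MappedByteBuffer, itself a direct buffer. A minimal sketch using a temporary file; the helper name and the 4 KB region size are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative demo: a memory-mapped file is exposed to Java as a direct ByteBuffer.
class MappedFileDemo {

    // Maps `size` bytes of a fresh temp file read-write, writes `value` at offset 0, and returns
    // {value read back, file size after mapping, 1 if the buffer is direct else 0}.
    static long[] roundTrip(int size, long value) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            file.toFile().deleteOnExit(); // best effort; a live mapping can block deletion on some OSes
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // map() extends the file to `size` bytes because the region lies beyond its end.
                MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
                mbb.putLong(0, value);
                mbb.force(); // write the change through to the file
                return new long[] { mbb.getLong(0), Files.size(file), mbb.isDirect() ? 1 : 0 };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = roundTrip(4096, 42L);
        System.out.println("read back = " + r[0] + ", file size = " + r[1] + ", direct = " + (r[2] == 1));
    }
}
```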


- Off Heap or Direct Memory
This is almost the same as a direct ByteBuffer, with one difference: it is allocated with sun.misc.Unsafe.allocateMemory. Because it is direct memory, it creates no GC overhead, but it must be released manually (Unsafe.freeMemory).
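Unsafe is not a public API: it has no public constructor, and the usual (unsupported) way to obtain it is to read its private theUnsafe field reflectively. The sketch below assumes a JDK where sun.misc.Unsafe is still reachable (it lives in the jdk.unsupported module on JDK 9+, javac warns about it, and recent JDKs deprecate its memory-access methods for removal). Class and method names are illustrative.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative demo of raw off-heap allocation with sun.misc.Unsafe (unsupported internal API).
class OffHeapDemo {

    // Reads the private static "theUnsafe" field, the common workaround for the missing constructor.
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    // Allocates raw native memory, writes and reads an int, then frees the memory explicitly.
    static int roundTrip(int value) {
        Unsafe u = unsafe();
        long address = u.allocateMemory(16); // raw address; contents are not zeroed, GC never sees it
        try {
            u.putInt(address, value);
            return u.getInt(address);
        } finally {
            u.freeMemory(address);           // nothing else will free it: forgetting this leaks
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```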

In theory Java programmers are not supposed to do such allocations, and I think the reasons could be:
 - It is complex to manipulate this kind of memory because you are dealing only with bytes, not objects
 - The C/C++ community would not like it :-)

Let's take a deep dive into memory allocation.

For the memory allocation test I will use a 13-byte message, broken down into:
 - int - 4 bytes
 - long - 8 bytes
 - byte - 1 byte
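One way to lay these 13-byte messages out in a buffer is to pack them back to back, with record number i at byte offset i * 13. This is a sketch under my own assumptions (field order int, long, byte; no padding; the buffer's default big-endian order); the benchmark code may use a different layout, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size record: int (4) + long (8) + byte (1) = 13 bytes, no padding.
class Message13 {
    static final int INT_OFFSET = 0;
    static final int LONG_OFFSET = 4;
    static final int BYTE_OFFSET = 12;
    static final int SIZE = 13;

    // Writes record number `index` at byte offset index * SIZE using absolute puts.
    static void write(ByteBuffer buf, int index, int i, long l, byte b) {
        int base = index * SIZE;
        buf.putInt(base + INT_OFFSET, i);
        buf.putLong(base + LONG_OFFSET, l);
        buf.put(base + BYTE_OFFSET, b);
    }

    static int readInt(ByteBuffer buf, int index)   { return buf.getInt(index * SIZE + INT_OFFSET); }
    static long readLong(ByteBuffer buf, int index) { return buf.getLong(index * SIZE + LONG_OFFSET); }
    static byte readByte(ByteBuffer buf, int index) { return buf.get(index * SIZE + BYTE_OFFSET); }

    public static void main(String[] args) {
        int count = 1_000;
        // The same code works for ByteBuffer.allocate(...) and ByteBuffer.allocateDirect(...).
        ByteBuffer buf = ByteBuffer.allocateDirect(count * SIZE);
        for (int n = 0; n < count; n++) {
            write(buf, n, n, n * 10L, (byte) n);
        }
        System.out.println(readInt(buf, 999) + " " + readLong(buf, 999) + " " + readByte(buf, 999));
    }
}
```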

I will only test write/read performance; I am not testing memory consumption or allocation speed.

Write Performance

(Chart: write throughput for 5 million objects; X axis: reading number, Y axis: million operations per second)

5 million 13-byte objects are written using the 4 types of allocation.
Direct ByteBuffer and off-heap are the best here, with throughput close to 350 million/sec.
The normal ByteBuffer is very slow, with throughput of just 85 million/sec.
Direct/off-heap is around 1.5x faster than heap.
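A hand-rolled timing loop in the spirit of the write test can be sketched as follows. This is not the code behind the charts: it is a hypothetical harness that times 13-byte record writes into a heap ByteBuffer and a direct ByteBuffer and reports million operations per second. The record count is a parameter, set lower than in the post to keep memory use modest.

```java
import java.nio.ByteBuffer;

// Hypothetical harness, not the code behind the charts: times 13-byte record writes.
class WriteThroughputSketch {
    static final int SIZE = 13; // int (4) + long (8) + byte (1)

    // Writes `count` records back to back and returns the elapsed time in nanoseconds.
    static long timeWrites(ByteBuffer buf, int count) {
        long start = System.nanoTime();
        for (int n = 0; n < count; n++) {
            int base = n * SIZE;
            buf.putInt(base, n);
            buf.putLong(base + 4, n);
            buf.put(base + 12, (byte) n);
        }
        return System.nanoTime() - start;
    }

    // Converts a record count and elapsed nanoseconds into millions of operations per second.
    static double millionOpsPerSec(int count, long elapsedNanos) {
        return count / (elapsedNanos / 1e9) / 1e6;
    }

    public static void main(String[] args) {
        int count = 2_000_000; // the post used 5 and 50 million records
        ByteBuffer heap = ByteBuffer.allocate(count * SIZE);
        ByteBuffer direct = ByteBuffer.allocateDirect(count * SIZE);
        // Several runs: the first ones include JIT warm-up and are not representative.
        for (int run = 1; run <= 5; run++) {
            long h = timeWrites(heap, count);
            long d = timeWrites(direct, count);
            System.out.printf("run %d: heap ByteBuffer %.0f M ops/s, direct ByteBuffer %.0f M ops/s%n",
                    run, millionOpsPerSec(count, h), millionOpsPerSec(count, d));
        }
    }
}
```

Numbers from a loop like this are sensitive to JIT warm-up, inlining and dead-code elimination; a dedicated harness such as JMH gives more trustworthy results.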

I ran the same test with 50 million objects to check how it scales; below is the graph.

(Chart: write throughput for 50 million objects; X axis: reading number, Y axis: million operations per second)

The numbers are almost the same as for 5 million.

Read Performance

Let's look at read performance.

(Chart: read throughput for 5 million objects; X axis: reading number, Y axis: million operations per second)

This number is interesting: off-heap reads are blazing fast, with throughput of about 12,000 million/sec :-)
The only one that comes close is heap, which is around 6x slower than off-heap.

Look at direct ByteBuffer: it tanks at just 400 million/sec, and I am not sure why it is so slow.
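A read-side sketch, again hypothetical and not the code behind the charts. It accumulates a checksum so the JIT cannot discard reads whose results are never used (a common pitfall in hand-written benchmarks), and it compares a direct buffer in its default big-endian order with one switched to ByteOrder.nativeOrder(). Byte order is only one candidate worth checking when direct-buffer reads look slow; the sketch does not claim to explain the numbers above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical read harness: compares default-order and native-order direct buffers.
class ReadThroughputSketch {
    static final int SIZE = 13; // int (4) + long (8) + byte (1)

    // Writes `count` records back to back in the buffer's current byte order.
    static void fill(ByteBuffer buf, int count) {
        for (int n = 0; n < count; n++) {
            int base = n * SIZE;
            buf.putInt(base, n);
            buf.putLong(base + 4, n);
            buf.put(base + 12, (byte) n);
        }
    }

    // Reads every field and folds it into a checksum; returning it keeps the reads alive.
    static long readChecksum(ByteBuffer buf, int count) {
        long sum = 0;
        for (int n = 0; n < count; n++) {
            int base = n * SIZE;
            sum += buf.getInt(base);
            sum += buf.getLong(base + 4);
            sum += buf.get(base + 12);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 2_000_000;
        ByteBuffer bigEndian = ByteBuffer.allocateDirect(count * SIZE); // default BIG_ENDIAN
        ByteBuffer nativeOrder = ByteBuffer.allocateDirect(count * SIZE).order(ByteOrder.nativeOrder());
        fill(bigEndian, count);
        fill(nativeOrder, count);
        for (int run = 1; run <= 5; run++) {
            long t0 = System.nanoTime();
            long s1 = readChecksum(bigEndian, count);
            long t1 = System.nanoTime();
            long s2 = readChecksum(nativeOrder, count);
            long t2 = System.nanoTime();
            // count / microseconds == million operations per second
            System.out.printf("run %d: big-endian %.0f M ops/s, native order %.0f M ops/s, same checksum: %b%n",
                    run, count / ((t1 - t0) / 1e3), count / ((t2 - t1) / 1e3), s1 == s2);
        }
    }
}
```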

Let's look at the numbers for 50 million objects.

(Chart: read throughput for 50 million objects; X axis: reading number, Y axis: million operations per second)

Not much difference.

Off-heap via Unsafe is blazing fast, at 330/11,200 million/sec (write/read).
Every other type of allocation is good for either reads or writes; none of them is good for both.
A special note about ByteBuffer: it is pathetic, and I am sure you will not use it after seeing these numbers.
Direct ByteBuffer is poor at reads, and I am not sure why it is so slow.

So if memory reads/writes are becoming a bottleneck in your system, off-heap is definitely the way to go. Remember it is a highway, so drive with care.

Code is available on GitHub.


  1. What are you doing for GC?

    I find it very hard to believe these results for the simple reason that off heap BECOMES on heap before you can actually use it...

  3. Once you allocate objects off heap they are out of scope for the GC;
    the GC doesn't care about them. That holds for both allocation and read operations.
    During reads you will see a small jump in transient memory, but it is not big enough to cause any side effects.
    The best way to verify these things is to use a profiler. I did not see significant GC activity during the off-heap tests, but for the other types of allocation there was a GC impact.
    You can take a scientific approach to check whether these numbers are correct: run it on your own test PC.

  4. Hi Ashkrit, I need to transfer a double precision array between Java and C++.
    Which way would be better in my case: a direct DoubleBuffer or my own Unsafe-based implementation?

  5. If you are looking for the fastest solution, JNI is better, but it is also the more difficult one.

    Other options are:
    - Use Protocol Buffers/Avro to marshal the message, and your C/C++ client can read it over a socket. This is a clean/simple solution because marshalling/unmarshalling is taken care of by those frameworks.

    - You can also use a memory-mapped file: the Java part writes messages to the memory-mapped file and the C/C++ part reads from it.

    1. And how would you do that? I mean, how would you map a file in Java so that it can then be read from C++? Can this be done without JNI?

  6. Hi Ashkrit, could you provide some information like OS, JDK version and so on?

  7. OS : Windows 8
    JDK : 1.7 build 45+
    Processor : i7-3632QM @ 2.20 GHz

    What kind of results are you seeing on your test box?

  8. I'm getting somewhat different numbers. (Java 1.8_40, OS X 10.10.5, Haswell i7-4771 CPU @ 3.50GHz).

    My data are something like W/R = 240/120 for HEAP, 110/70 for OFFHEAP, 120/100 for DBB (same as for MappedByteBuffer BTW), and 270/300 for BB.

  9. It seems heap and ByteBuffer perform better on your test machine.
    BB was the worst in my test, but it looks like a lot has improved in JDK 8.
    I will try to benchmark JDK 8 on my hardware and post the results.
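Comments 4 and 5 ask how a double array could go from Java to C++ through a memory-mapped file without JNI. A minimal sketch of the Java side, under my own assumptions: a hypothetical layout of an 8-byte count followed by the doubles, native byte order (both processes on the same machine), and illustrative helper names. The C/C++ side would open and mmap the same file and read an int64_t followed by the double values; it is not shown here. Coordinating a reader that runs while the writer is still writing (for example with a flag or sequence number) is also outside this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical file layout: an 8-byte count followed by the doubles, all in native byte order,
// so a C/C++ process mapping the same file could read it as int64_t followed by double[count].
class SharedDoubleArray {

    static void write(Path file, double[] values) {
        long size = 8L + 8L * values.length;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            mbb.order(ByteOrder.nativeOrder()); // match what a native reader on this machine expects
            mbb.putLong(0, values.length);
            for (int i = 0; i < values.length; i++) {
                mbb.putDouble(8 + 8 * i, values[i]);
            }
            mbb.force(); // flush to the file so other processes see the data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static double[] read(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mbb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            mbb.order(ByteOrder.nativeOrder());
            int count = (int) mbb.getLong(0);
            double[] out = new double[count];
            for (int i = 0; i < count; i++) {
                out[i] = mbb.getDouble(8 + 8 * i);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes `values` to a fresh temp file and reads them back through a second mapping.
    static double[] roundTripViaTempFile(double[] values) {
        try {
            Path file = Files.createTempFile("shared-doubles", ".bin");
            file.toFile().deleteOnExit(); // best effort; a live mapping can block deletion on some OSes
            write(file, values);
            return read(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTripViaTempFile(new double[] {1.5, -2.25, 3.0})));
    }
}
```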