Wednesday 6 August 2014

Compact String List

Whenever application memory profile is analyzed string is one of the most common object that comes right on top .

Java has String pool that solves problem to some extent and lot of interesting optimization has been done in string pool for JDK 6/7/8

Whatever is given by string pool can be easily implemented by WeakHashmap or Concurrent hash map but JVM implementation is very good,so no point in reinventing it. One of the overhead associated with string object is header of char array, each array has basic header cost and extra 4 byte for size of array

On 32 bit for char array - 8(header) + 4(length) = 12
On 64 bit for char array - 12(header) + 4(length) = 16

For each string value 12 to 16 bits is wasted.
Quick memory optimization that can be done is to allocate one big array and store values of multiple string in that array.

Just to visualized how it will look


With above approach we save array header cost but another overhead is introduced that we need another sets of variable to know which part of array belong to value1 or value 2.
Int array can be used to maintain index of different value in big char array,so we save 12 bits per string and that is significant saving when you have lots of string.

In this blog i will share experiment with such approach.
Lets get into code

Storage
First thing is storage of multiple string values in single char array.



Pretty straight forward code two array is required one char[] and one int[]
Add function will expand char & int array if required and add values to it.

Iteration
Iteration over element is another tricky thing that needs to be handled for such compact structure, trove style foreach looks good fit for this.


Iteration code looks like

---

Memory Usage
Compact list trades off add speed for memory/search, lets have look at memory gain.
In this test text from ALICE'S ADVENTURES IN WONDERLAND url is split by space and it is added to ArrayList<String> and CompactStringList.

"Alice Adventure" book has 32.5K words.

For memory test i used Heinz Kabutz Determining Memory Usage in Java approach and it gave me consistent output so i stick with it for this test.

ArrayList takes around 1755 KB, CompactList takes around 355 KB.
So for this particular example CompactList takes around 80% less memory, gain is very significant.



Detail memory usage
Lets have look at detail memory usage. I used jmap to get top contributor for this test.

















This gives better understanding of gain.
Char[] in compactlist takes around 60% less memory and String object is like almost nothing with minor overhead for int[].

So it seems good trade off for memory!

What next's
- One usage is building string pool using compactlist
- CompactList is append only structure any changes done for existing element like delete/update will require rebuilding CompactList
 - Iteration using traditional style will result in garbage creation because it has to build CharSequence, but that can be overcome by using foreach approach that gives access to chars of element.

Code is available @ github