Thursday, 18 June 2015

Integer Encoding Magic

Question was asked to me on serializable/externalizable and why externalizable is fast , my answer was not convincing , so i spend some time on google to get more details and web has tons of information on it.

I thought of spending some time on open source serialization framework to check what type of optimization are done to reduce size.

One of the technique for integer type is use encoding, this technique is based on very simple rule.

Integer is of 4 Byte and if value is small then we don't have to use 4 byte to store it, so that way for small input values this technique can reduce size by great extent.

Some bitwise magic is required to achieve this.
I did quick compare of Raw vs Encoding and results were very good. In this test i write 1 Million integer values starting from 0 .

Without encoding it took 4019 KB for 1 Million and with byte encoding it took 2991 KB, encoding gives around 25% gain for this sample test.

Integer Encoding
Code snippet for integer encoding, it is taken from Apache Avro


Conclusion 
 Integer encoding technique gives good gain for size , but it adds overhead on write/read side, so it should be used by knowing the  side effect , such approach is very good for serialization framework because you want to reduce size of data that is sent over wire or written to some persistent store.

Coded used for this blog is available @ github



3 comments: