Thursday, 16 July 2015

BitMap sorting

Data structure is most important building block of any algorithm and it makes algorithm efficient.

Bitmap is very powerful data structure and is good fit for many problems, it is used for indexing in many database.

BitMap can be also used for sorting and for some specific case bitmap sorting can out perform may famous sorting algorithm with very low memory footprint.

Take a case that you have to sort input file that has lots of number between 1 to 10 million by and using low memory footprint and for simplicity sake assume that their are no duplicates.

What are the ways to solve this problem ?

- Classic MergeSort
This is the first solution that comes to mind whenever sorting is discussed, this will produced many intermediate files and then those will be merged.

- Muti Read & Sort
In this approach input file is read multiple times for range of data, for eg first pass will read values starting from 0 to 250K and then 250K to 500K and so on.

This solution will take less memory but too much of I/O will make it very slow .

- Amazing BitSort
Bit Sorting is really good fit for this problem, allocate bit array and turn on the bits at value index and that's it, values will get sorted as it is added to this data structure .

For eg  
{1,2,5,6,7,11} value can be represented as below

Implementation of such type of data structure is simple in C/C++ but for java bitwise manipulation is required, so to keep it simple i have implemented it using boolean array.

Code snippet 


  
It is very simple to implement this powerful data structure and little trick is required for iteration over value.

Code snippet for iteration


Conclusion
Very useful data structure with low memory foot print & very fast sorting, but it has some trade off  and you should be aware of it

- If numbers of items are less then this might not give any benefit.
- If input values are sparse then lot of space is left unused because of fragmentation.

Code is available @ github

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete