
Sunday, 19 June 2016

Java Arrays Sort decoded


Sorting is one of the first algorithms we learn in computer science. It is such an interesting area that there are 20+ sorting algorithms, and it is always difficult to decide which one is best.

The efficiency of a sorting algorithm is measured in terms of time taken and space required.

Sometimes bubble sort is best: it has no extra space requirement, so for devices where space is constrained or random access to elements is not possible, it can be a good fit.

These days we tend to use library sort functions. Most language libraries provide an adaptive sort that picks the best algorithm depending on the type and size of the data.

In this blog post I will share how those decisions are made in Java's Arrays.sort function.

Decisions are based on Data Type and Size

 - Byte 
For byte, the Java API decides between counting sort and insertion sort.

If the size of the input array is less than 29 then insertion sort is used (see the visualization of insertion sort).
For larger arrays counting sort is used; it exploits the fact that the range of a byte is -128 to 127, and uses that to its advantage to make the counting sort fast.

Counting sort has a small, fixed memory requirement and insertion sort is in-place, so overall not much allocation is done and the garbage collector stays happy when a byte array is sorted.
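A minimal sketch of that counting-sort idea for bytes (illustrative only, not the actual JDK code, which lives in java.util.DualPivotQuicksort):

// Counting sort for a byte[], exploiting the fixed range -128..127.
static void countingSortBytes(byte[] a) {
    int[] count = new int[256];               // one counter per possible byte value
    for (byte b : a) {
        count[b + 128]++;                     // shift -128..127 into 0..255
    }
    int i = 0;
    for (int v = 0; v < 256; v++) {           // rewrite the array in ascending order
        for (int c = count[v]; c > 0; c--) {
            a[i++] = (byte) (v - 128);
        }
    }
}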

- Char
For char the decision is between counting sort and dual-pivot quicksort.

If the size of the input is greater than 3200 then counting sort is used; it allocates a count array of 65,536 entries (one per possible char value) to implement the sort.

For smaller arrays a quicksort variant using two pivots is used (see the visualization of quicksort).


The quicksort used is also in-place, so memory-wise there is not much load on the garbage collector.
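A rough sketch of that size-based choice for char arrays (the 3200 threshold comes from the section above; the method shape itself is illustrative, not the JDK source):

// Large char arrays get a counting sort over the full 0..65535 range,
// smaller ones go down the dual-pivot quicksort path.
static void sortChars(char[] a) {
    if (a.length > 3200) {
        int[] count = new int[65536];          // one counter per possible char value
        for (char c : a) {
            count[c]++;
        }
        int i = 0;
        for (int v = 0; v < 65536; v++) {      // rewrite the array in ascending order
            for (int n = count[v]; n > 0; n--) {
                a[i++] = (char) v;
            }
        }
    } else {
        java.util.Arrays.sort(a);              // small input: library quicksort path
    }
}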


- Integer/Long

For integer/long things get interesting, as merge sort makes its entry.

For input smaller than 286 elements, two options are available:
 - If the input has fewer than 47 elements then insertion sort is used; otherwise dual-pivot quicksort is used.

For large input arrays there are some good edge-case checks:
 - If the array is already sorted, either ascending or descending, a single loop is enough to detect that (a small sketch of this check appears at the end of this section).

 - If all the elements of the array are the same then quicksort is used, as it works best in that case.

 - If the elements are really jumbled, e.g. every even-indexed element is greater than the odd-indexed one next to it, then quicksort is used as well.

Only when all these checks fail is merge sort used: a new array of the same size is allocated and the sort is performed (quick refresher of merge sort).



The important thing to note about the integer sort is that if the array is already sorted then no extra memory is allocated, and even when quicksort kicks in, memory allocation stays under check.
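The "already sorted ascending" part of that check is essentially one linear scan with no extra memory. A minimal sketch of the idea (the JDK's actual code counts runs and also handles descending runs by reversing them):

// One linear pass, no allocation: bail out at the first out-of-order pair.
static boolean isAscending(int[] a) {
    for (int i = 1; i < a.length; i++) {
        if (a[i - 1] > a[i]) {
            return false;
        }
    }
    return true;
}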


- Float/Double

Float/double sorting has a special optimization for NaN: all NaN values are moved to the end of the array and are skipped from sorting.

Once the NaN values are handled, sorting goes through the same checks as the integer data types.
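A rough illustration of that NaN handling for a double[] (a sketch of the idea, not the JDK source): swap every NaN to the tail of the array, then sort only the NaN-free prefix.

// Move all NaN values to the end of the array and return the count of
// non-NaN elements; only that prefix then needs to be sorted.
static int pushNaNsToEnd(double[] a) {
    int n = a.length;
    for (int i = 0; i < n; ) {
        if (Double.isNaN(a[i])) {
            n--;
            double tmp = a[i]; a[i] = a[n]; a[n] = tmp;   // swap NaN to the tail
        } else {
            i++;
        }
    }
    return n;   // a[0..n) is NaN-free, a[n..length) holds the NaNs
}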

- Sorting on Objects

Sorting objects (and collections) follows slightly different rules: here the choice is only between merge sort and TimSort.
By default TimSort is used, which is a mix of merge sort and insertion sort.

Merge sort is more or less decommissioned; it is only used when the "java.util.Arrays.useLegacyMergeSort" flag is switched on.
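A small usage example of the default object path (the class name and data are illustrative; the legacy flag is shown as a JVM option in the comment):

import java.util.Arrays;
import java.util.Comparator;

public class ObjectSortDemo {
    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "banana"};

        // Object arrays are sorted with TimSort (stable, merge + insertion sort).
        Arrays.sort(words, Comparator.comparingInt(String::length));

        // Running the JVM with -Djava.util.Arrays.useLegacyMergeSort=true
        // would switch back to the old merge sort implementation instead.
        System.out.println(Arrays.toString(words));
    }
}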


In JDK 8 a parallel sort option was also added. The choice is based on the size of the input array: for arrays larger than 8K elements the parallel version of the sort is used.
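A quick usage example of that parallel variant (the array contents here are just for illustration):

import java.util.Arrays;
import java.util.Random;

public class ParallelSortDemo {
    public static void main(String[] args) {
        int[] data = new Random(42).ints(100_000).toArray();

        // For large arrays Arrays.parallelSort splits the work across the
        // common ForkJoinPool; small arrays fall back to the sequential sort.
        Arrays.parallelSort(data);

        System.out.println("first=" + data[0] + ", last=" + data[data.length - 1]);
    }
}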



Thursday, 16 July 2015

BitMap sorting

Data structures are the most important building blocks of any algorithm, and the right data structure is what makes an algorithm efficient.

A bitmap is a very powerful data structure and a good fit for many problems; it is used for indexing in many databases.

A bitmap can also be used for sorting, and in some specific cases bitmap sorting can outperform many famous sorting algorithms with a very low memory footprint.

Take a case where you have to sort an input file containing lots of numbers between 1 and 10 million, using a low memory footprint; for simplicity's sake assume that there are no duplicates.

What are the ways to solve this problem?

- Classic MergeSort
This is the first solution that comes to mind whenever sorting is discussed: it produces many sorted intermediate files, which are then merged.

- Multi Read & Sort
In this approach the input file is read multiple times, once per range of data. For example, the first pass reads only the values from 0 to 250K, the next pass 250K to 500K, and so on.

This solution takes less memory, but the amount of I/O makes it very slow.
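A rough sketch of that multi-pass approach, assuming one integer per line in the input file (the file name, range size and class name are hypothetical, just to make the idea concrete):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MultiPassSort {
    public static void main(String[] args) throws IOException {
        final int max = 10_000_000;
        final int rangeSize = 250_000;

        // One pass over the whole file per range: memory stays small,
        // but the file is read max/rangeSize times, which is very slow.
        for (int lo = 0; lo <= max; lo += rangeSize) {
            int hi = lo + rangeSize;
            List<Integer> chunk = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(new FileReader("numbers.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    int value = Integer.parseInt(line.trim());
                    if (value >= lo && value < hi) {
                        chunk.add(value);
                    }
                }
            }
            Collections.sort(chunk);
            chunk.forEach(System.out::println);   // emit this range in order
        }
    }
}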

- Amazing BitSort
Bit sorting is a really good fit for this problem: allocate a bit array and turn on the bit at each value's index, and that's it; the values come out sorted as they are added to this data structure.

For example, the values {1,2,5,6,7,11} can be represented as below (a set bit at each value's index):

index: 0  1  2  3  4  5  6  7  8  9  10 11
bit:   0  1  1  0  0  1  1  1  0  0  0  1

Implementing this type of data structure is simple in C/C++, but in Java bitwise manipulation is required, so to keep it simple I have implemented it using a boolean array.

Code snippet 
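(The embedded snippet is not reproduced here; a minimal sketch of a boolean-array based version, assuming values in a known 0..maxValue range, with class and method names that are mine rather than the author's:)

// A "bit" sort backed by a boolean array: setting a flag at the value's
// index is all that is needed, because the indices are inherently ordered.
public class BooleanArraySorter {
    private final boolean[] present;

    public BooleanArraySorter(int maxValue) {
        present = new boolean[maxValue + 1];   // one flag per possible value
    }

    public void add(int value) {
        present[value] = true;                 // assumes no duplicates in the input
    }
}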


  
It is very simple to implement this powerful data structure, and only a little trick is required for iterating over the values.

Code snippet for iteration
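(Again a hedged sketch rather than the author's gist, continuing the BooleanArraySorter class above: walk the flags in index order and report every set position.)

// Iterating in sorted order is just a linear walk over the flags:
// every index whose flag is set is the next value in ascending order.
public void forEachSorted(java.util.function.IntConsumer action) {
    for (int value = 0; value < present.length; value++) {
        if (present[value]) {
            action.accept(value);
        }
    }
}

// Usage: sorter.forEachSorted(System.out::println);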


Conclusion
A very useful data structure with a low memory footprint and very fast sorting, but it has some trade-offs that you should be aware of:

- If the number of items is small then this might not give any benefit.
- If the input values are sparse then a lot of space is left unused because of fragmentation.

Code is available @ GitHub.