Sunday 19 June 2016

Java Arrays Sort decoded


Sorting is one the first algorithm that we learn in computer science. Sorting is such an interesting area that it has around 20+ algorithm and it is always difficult to decided which one is best.

Sorting algorithm efficiency is measured in terms of time taken & space required.

Some time bubble sort is best because it has no space requirement and for device where space is constraint or random access of element is not possible then it can be good fit .

These days we tend to use library sort functions, most of language library sort function is adaptive and it uses best algorithm depending on size of data.

In the blog i will share how those decision are made in java Arrays.sort function.

Decision are based on Data Type and Size

 - Byte 
For byte java API decides between Counting Sort or Insertion Sort.

If size of input array is less than 29 then insertion sort is used, visualization of insertion sort
For large arrays Counting sort is used and it is based on fact that range of byte is -128 to 128 and it used as advantage to do quick sort.

Counting sort has little memory requirement and insertion is in place, so over all not much allocation is done and it will keep garbage collector happy when byte array is sorted.

- Char
For char decision is between Counting Sort & Dual Pivot QuickSort

If size of input is greater than 3.2K then counting sort it used and it allocates array of 65K size to implement sorting.

For smaller array Quick Sort variant using Dual pivot is used , visualization of quick sort.


QuickSort used is also in-place , so memory wise not much load on Garbage Collector.


- Integer/Long

For integer/long things gets interesting as Merge Sort makes entry.

For input less than 256 , two options are available
 - If input is less than 47 then Insertion Sort and for other case Dual Pivot QuickSort is used.

For large input array there are some good edge case checks
 - If arrays is already sorted either Ascending or Descending then it is single loop to check that.

 - If element of array are same then Quick Sort is used as it works best in such case.

 - Or if element are really jumbled like for e.g like each even element is greater then odd element then it will use Quick Sort.

and at last all this checks fails then Merge Sort is used and new array of same size is allocated and sorting is performed. Quick refresher of Merge Sort



Important thing to note about Integer sort is that if Array is already sorted then no memory is allocated and if QuickSort kicks in memory allocation is under check.


- Float/Double

Float has special optimization for NAN, all the NAN are moved to end of array and they are skipped from sorting.

Once NAN values are handled then sorting goes through same checks as INTEGER data type.

- Sorting on Objects

Collection sorting has little different rules, for collections it is only between Merge Sort & Timsort.
By default Timsort is used which is mix of merge & insertion sort .

Merge sort is more over decommissioned and it is only used when "java.util.Arrays.useLegacyMergeSort" flag is switched on.


In JDK 8 parallel sort options are also added which are based on input size of array, for arrays with size greater than 8K then parallel version of sort is used.

Resources
Declare Array Java Example
How to create an array in java