Thursday, 23 July 2015

Efficiency with Algorithms

Recently had look at excellent talk on Efficiency with Algorithms, Performance with Data Structures , this talk has really some good content on performance.

In this blog i will share some of ideas from above talk and few things that i have learned.

Pre processing 
This is very common trick, this is trade off between processing required when actual requests comes vs time taken to pre compute some thing, some of the good examples are.

IndexOf
This is very common string operation, almost all application needs this algorithm, java have brute force algorithm to solve this but this can become bottle neck very soon , so it is better to use Knuth–Morris–Pratt_algorithm algorithm, which pre computes table to get better performance.

Search
Sequential search vs binary search is trade off between search time or pre processing (i.e sorting) time, it starts to give return if number of compare required goes over 0(log n)

Binary search is heart of so many algorithm, this reduces expensive search operation.

Many String permutation search problems are solve using this.

Hashmap in java has nice optimization when many keys with same hashcode are added, to ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties

Indexes
Lookup is expensive operation and binary search can't answer all types of query,so you should build specialized index for fast search, some of the options are key/value, prefix/suffix , bitmap index, inverted index etc

Pre Allocate
This is classic one, so if you know how big your data structure will be then it is better to pre allocate it rather than keep it expanding multiple times and incur cost of allocation & copy.

Array based data structure doubles capacity every time re-allocation is required and this is done to amortized allocation cost, so they do big and less frequent allocation, in java world ArrayList, HashMap etc are good example of that.

Many design pattern like Object Pooling/Fly Weight etc are based on this.

Reduce Expensive/Duplicate operation
Each algorithm has some expensive operation and to make it efficient those expensive operation should be reduced, some of the common example are

HashCode
Hash code computation is expensive operation, so this should be not be computed multiple times for given key. One way is compute it once and pass it to other functions , so recomputation is not required or pre-compute it for eg String class in java pre compute hash code to save some time.

Modulus /Divide 
This is one of the expensive arithmetic operation, bit map operation can be used to do same thing but at lower cost.
If data structure is circular for eg Circular array and want to put value in free slot then Modulus operation is required and if capacity of array is power of 2 then bit wise operation ( i.e index & ( capacity-1) ) can be used to get Modulus value and this will give significant performance gain at the cost of extra memory.  This technique is used by HashMap in java

Similarly right shift ( >>) operation can be used for divide to get better performance, but now a days compiler are smart, so you get this one for free, no need to write it.

Reduce copy overhead
Array based data structure amortized re-sizing cost by increasing capacity by 2 times , this is good but overhead of copy also comes along with it, another approach is chain of arrays, so this way you only allocate one big chunk but don't have to do copy of old value, just add this new block of memory to chain of allocated blocks.
gs-collection has CompositeFastList which is build using this approach.

Batching
Expensive operation like write to file/network/database etc should be batched to get best performance.

Co-locate data
Keeping all required data together gives very good performance based on how processor works, but this is one thing that is broken in many application.
Mike Acton talk about this in Data-Oriented Design and C++ talk in details.

Column based storage are very good analytic/aggregation use case because most of the time single column data for all rows are required to answer analytic/aggregation request.

Linear probing hash tables are also good example of data co-location.

Bit Packing is another option to keep required data very close. but should be used with extra caution because this can make code complex.

Unnecessary Work
Some of algorithm suffer from this problem most common are recursive algorithm like factorial or Fibonacci etc, both of this has duplicate work problem, which can be fixed by Memoization technique.


Simplicity & readability of code should not be lost during efficiency ride because next guy has to maintain it :-)

Thursday, 16 July 2015

BitMap sorting

Data structure is most important building block of any algorithm and it makes algorithm efficient.

Bitmap is very powerful data structure and is good fit for many problems, it is used for indexing in many database.

BitMap can be also used for sorting and for some specific case bitmap sorting can out perform may famous sorting algorithm with very low memory footprint.

Take a case that you have to sort input file that has lots of number between 1 to 10 million by and using low memory footprint and for simplicity sake assume that their are no duplicates.

What are the ways to solve this problem ?

- Classic MergeSort
This is the first solution that comes to mind whenever sorting is discussed, this will produced many intermediate files and then those will be merged.

- Muti Read & Sort
In this approach input file is read multiple times for range of data, for eg first pass will read values starting from 0 to 250K and then 250K to 500K and so on.

This solution will take less memory but too much of I/O will make it very slow .

- Amazing BitSort
Bit Sorting is really good fit for this problem, allocate bit array and turn on the bits at value index and that's it, values will get sorted as it is added to this data structure .

For eg  
{1,2,5,6,7,11} value can be represented as below

Implementation of such type of data structure is simple in C/C++ but for java bitwise manipulation is required, so to keep it simple i have implemented it using boolean array.

Code snippet 


  
It is very simple to implement this powerful data structure and little trick is required for iteration over value.

Code snippet for iteration


Conclusion
Very useful data structure with low memory foot print & very fast sorting, but it has some trade off  and you should be aware of it

- If numbers of items are less then this might not give any benefit.
- If input values are sparse then lot of space is left unused because of fragmentation.

Code is available @ github

Tuesday, 7 July 2015

Joy Inc is really Joyful

I heard closing keynote talk by Richard-Sheridan at agile singapore conference where he shared how he build joyful company.

Richard has published book Joy, Inc.: How We Built a Workplace People Love to share his experience of building amazing organization.

I got copy of book few days back and started reading it, it is very difficult to put it down once you pick it up.
I have read few chapters in last 2 days and want to share few things that i liked .

Book starts with "Let's try something new, let't run an experiment", this is very good approach to plant seed for change.

Tangible Joy
Book define tangible joy as "delivering a product or service to world that's so enjoyed , in fact , that people stop you on the street and say,"really, you did that ? i love it"

By reading this i realized that how crappy product i have build :-(

Stupid Users
It has section on stupid user and context it that software industry think users are stupid and they need Dummies book to help them successfully use our beautifully designed technology because they are dump and don't understand our master piece.

Joy will lift you and your team
He gives reason why Wright brothers were able to fly and Samuel Pierpont Langley was not

Langley was trying to build an airplan. The Wright brothers wanted to fly.

This is so much applicable to our industry, we are so much after bleeding edge technology that we forget the real problem that is supposed to be solved by technology

Lots of experiment
This book is full of experiment he did to build Joy Inc.

Pair Programming
Everybody knows benefit of pairing but still many organization are reluctant to try

Wide Open space
Wide open space with no office,cubes is very good for collaborative work, but organization has Cube culture and with every promotion individual moves away from team and get locked in private office.

Rewards System
With culture change rewards system should also change , but most of the organization never change rewards system.
I think this is one of the main topic of book, few things mention in book are.

 Now you have fair idea about what rewards system should look like

Risk of staying same
This was eye opening "The risk of staying the same had been far greater than the risk of change"

Big-Dog Discussion
Big dogs in organization are special people and by looking at them voice echos in your ear
"You don't matter as much as I do"
@menlo big decision are not made in close door conference room so that people don't feel that they are not important.

Tower of knowledge
Book as very nice section about "Mr Dave" - person on team who has vast technical knowledge that no one else have.
This is such a common problem and most of the organization falls in this trap and it is very common in Banks.
Dave is bottleneck of entire organization or department and he can't go on vacation!
Pair programming is the recommended solution for Dave and organization


Books has lots of practical advise, i highly recommend this book.

Some talks/podcast by Richard
A CEO’S JOURNEY TO CREATING A CULTURE WHERE PEOPLE CAN THRIVE
Joy, Inc. | Rich Sheridan