Recently I was tuning the startup time of an application. Startup was taking close to 30 minutes, which made a cold restart a real pain.
In this post I will share the story of that performance tuning exercise.
Current state of application
The application loads data from the database at startup and keeps it in memory for fast response times. To make things interesting, all of the loading is done by multiple threads :-)
The current loading logic is described below.
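A minimal Java sketch of loading logic with this shape: several threads, one lock, and one database query per record made while the lock is held. Everything in it is an assumption made for illustration. `BaselineLoader`, `queryDetail`, and the `"detail-"` values are hypothetical names, and the database is replaced by an in-memory stand-in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical baseline; all names are illustrative, not the original code.
class BaselineLoader {
    private final Object lock = new Object();
    private final Map<Integer, String> cache = new HashMap<>();

    // Stand-in for a per-record database round trip.
    static String queryDetail(int id) {
        return "detail-" + id;
    }

    // Each loader thread processes its own slice of record ids.
    void load(List<Integer> ids) {
        for (int id : ids) {
            synchronized (lock) {                // lock taken once per record
                String detail = queryDetail(id); // query runs while the lock is held
                cache.put(id, detail);
            }
        }
    }

    int size() {
        synchronized (lock) {
            return cache.size();
        }
    }

    // Loads `records` ids with `threads` loader threads; returns the cache size.
    static int run(int records, int threads) {
        BaselineLoader loader = new BaselineLoader();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            List<Integer> slice = new ArrayList<>();
            for (int id = t; id < records; id += threads) {
                slice.add(id);
            }
            pool.execute(() -> loader.load(slice));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loader.size();
    }

    public static void main(String[] args) {
        System.out.println("records cached: " + run(1000, 4));
    }
}
```

Because the simulated query sits inside the `synchronized` block, the loader threads effectively run one at a time, and every record pays for its own round trip.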
Looking at the logic above, it is clear that taking a lock and running a database query for every record must be causing the problem, and profiling confirmed it.
The code went through a few rounds of improvement before startup time became acceptable.
Round 1 - Remove nested query
In this round the per-record database query was replaced with a single query that fetches all the data the records need. Per-record requests were then served from that master data set.
After that change, the code looked something like this.
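As a rough illustration of this round, the sketch below issues one bulk query and serves every per-record lookup from the in-memory result. It is not the original code: `BulkQueryLoader` and `queryAllDetails` are hypothetical names, the database is simulated, and the query counter exists only to show that a single round trip is made.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round 1; names are illustrative, not the original code.
class BulkQueryLoader {
    private final AtomicInteger queries = new AtomicInteger(); // simulated round trips
    private final Object lock = new Object();
    private final Map<Integer, String> cache = new HashMap<>();

    // Stand-in for one query that returns the rows for all ids at once.
    Map<Integer, String> queryAllDetails(List<Integer> ids) {
        queries.incrementAndGet();
        Map<Integer, String> master = new HashMap<>();
        for (int id : ids) {
            master.put(id, "detail-" + id);
        }
        return master;
    }

    void load(List<Integer> ids) {
        synchronized (lock) { // lock scope is unchanged in this round
            // One query builds the master data set (extra transient memory).
            Map<Integer, String> master = queryAllDetails(ids);
            for (int id : ids) {
                // Per-record lookups now hit memory, not the database.
                cache.put(id, master.get(id));
            }
        }
    }

    int size() {
        synchronized (lock) {
            return cache.size();
        }
    }

    int queries() {
        return queries.get();
    }

    public static void main(String[] args) {
        BulkQueryLoader loader = new BulkQueryLoader();
        loader.load(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(loader.size() + " records, " + loader.queries() + " query");
    }
}
```

The master map is the extra transient memory in this trade: it lives only for the duration of `load`.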
This gave a 30% improvement, a good starting point at the cost of a little extra transient memory.
Round 2 - Reduce scope of lock
Since this code is multi-threaded, profiling now showed a hotspot on the lock. The way to avoid that is either to remove the lock or to reduce its scope.
The scope of the lock was reduced, which allowed the logic to be split into 2 steps:
- Read from database
- Update cache
Earlier, the database query ran after the lock was acquired. With the new approach the query runs before the lock is taken, so all parallel requests can query the database with no contention on the cache.
The code looked something like this.
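The two steps can be sketched as follows, again with a simulated database and hypothetical names (`NarrowLockLoader`, `queryAllDetails`). Taking the lock once per record in the update step is an assumption about how the cache was written at this stage, not something the original code confirms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of round 2; names are illustrative, not the original code.
class NarrowLockLoader {
    private final Object lock = new Object();
    private final Map<Integer, String> cache = new HashMap<>();

    // Stand-in for the bulk database query; needs no lock.
    static Map<Integer, String> queryAllDetails(List<Integer> ids) {
        Map<Integer, String> rows = new HashMap<>();
        for (int id : ids) {
            rows.put(id, "detail-" + id);
        }
        return rows;
    }

    void load(List<Integer> ids) {
        // Step 1: read from the database with no lock held,
        // so loader threads can query in parallel.
        Map<Integer, String> rows = queryAllDetails(ids);

        // Step 2: update the cache; the lock guards only the write.
        // Assumption: the lock is still taken once per record here.
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            synchronized (lock) {
                cache.put(row.getKey(), row.getValue());
            }
        }
    }

    int size() {
        synchronized (lock) {
            return cache.size();
        }
    }

    // Loads `records` ids with `threads` loader threads; returns the cache size.
    static int run(int records, int threads) {
        NarrowLockLoader loader = new NarrowLockLoader();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            List<Integer> slice = new ArrayList<>();
            for (int id = t; id < records; id += threads) {
                slice.add(id);
            }
            pool.execute(() -> loader.load(slice));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loader.size();
    }

    public static void main(String[] args) {
        System.out.println("records cached: " + run(1000, 4));
    }
}
```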
This gave another 40% gain at the cost of a little more transient memory. Memory was not an issue, though: the Oracle result set only releases its memory after it is closed, so there is no significant difference memory-wise.
A 70% improvement was great, but there was still room for more, so one more improvement was made.
Round 3 - Single Writer Batch update
Now the whole bottleneck was the "write to cache" step, because of multiple writers. It was reduced by using a single writer that does batch updates to the cache.
The database query readers and the cache writer were connected by a queue. After this change, the code looked something like this.
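One way to sketch the reader/queue/writer arrangement is with a `BlockingQueue` from `java.util.concurrent`. This is an assumption-laden illustration rather than the original code: `SingleWriterLoader`, `queryAllDetails`, and the `DONE` sentinel are hypothetical, and the database is simulated. Reader threads enqueue whole result batches, and one writer thread drains the queue and writes everything it finds under a single lock acquisition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of round 3; names are illustrative, not the original code.
class SingleWriterLoader {
    // Sentinel batch telling the writer to stop; compared by identity.
    private static final Map<Integer, String> DONE = new HashMap<>();

    private final Object lock = new Object();
    private final Map<Integer, String> cache = new HashMap<>();
    private final BlockingQueue<Map<Integer, String>> queue = new LinkedBlockingQueue<>();

    // Stand-in for the bulk database query done by each reader.
    static Map<Integer, String> queryAllDetails(List<Integer> ids) {
        Map<Integer, String> rows = new HashMap<>();
        for (int id : ids) {
            rows.put(id, "detail-" + id);
        }
        return rows;
    }

    // Single writer: waits for a batch, grabs anything else already queued,
    // then writes all of it while holding the lock once.
    void writeLoop() throws InterruptedException {
        List<Map<Integer, String>> pending = new ArrayList<>();
        boolean done = false;
        while (!done) {
            pending.add(queue.take());
            queue.drainTo(pending);
            synchronized (lock) {
                for (Map<Integer, String> batch : pending) {
                    if (batch == DONE) {
                        done = true;
                    } else {
                        cache.putAll(batch);
                    }
                }
            }
            pending.clear();
        }
    }

    // Loads `records` ids with `readers` reader threads; returns the cache size.
    static int run(int records, int readers) {
        SingleWriterLoader loader = new SingleWriterLoader();
        Thread writer = new Thread(() -> {
            try {
                loader.writeLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(readers);
        for (int t = 0; t < readers; t++) {
            List<Integer> slice = new ArrayList<>();
            for (int id = t; id < records; id += readers) {
                slice.add(id);
            }
            // Readers only query and enqueue; they never touch the cache lock.
            pool.execute(() -> loader.queue.add(queryAllDetails(slice)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
            loader.queue.add(DONE); // enqueued last, after all readers finished
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (loader.lock) {
            return loader.cache.size();
        }
    }

    public static void main(String[] args) {
        System.out.println("records cached: " + run(1000, 4));
    }
}
```

With only one thread ever writing, the lock is uncontended during loading; it remains in the sketch on the assumption that other code reads the cache under the same lock.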
Now the lock was acquired only a few times, and as much data as possible was written each time it was held. This gave around a 25% gain.
With the improvements above, startup time improved by 95% in total (30% + 40% + 25%), which was enough to stop experimenting.
Conclusion
- Avoid making lots of small queries to the database in a loop.
- Never do I/O or make a network call while holding a lock.
- Reduce the scope of locks.
- Batch expensive operations.