Showing posts with label Streams. Show all posts

Saturday, 14 March 2020

Hands on Optional value

Optional is in the air due to coronavirus; everything is becoming optional, like optional public gatherings, optional work from home, optional travel, etc.


I thought it was a good time to talk about the real "Optional" in software engineering, the one that deals with NULL references.

Tony Hoare confessed that he made a billion-dollar mistake by inventing null. If you have not seen his talk, I suggest having a look at Null-References-The-Billion-Dollar-Mistake.

I will share some of the anti-patterns around null and how they can be solved using an abstraction like Optional or Maybe.

For this example we will use a simple value object that can have some null values.
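The original snippet isn't reproduced here, so this is a minimal sketch of such a value object (class and field names are assumed):

```java
// A simple Person value object; email and phone may legitimately be null.
public class Person {
    private final String name;
    private final String email; // may be null
    private final String phone; // may be null

    public Person(String name, String email, String phone) {
        this.name = name;
        this.email = email;
        this.phone = phone;
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
    public String getPhone() { return phone; }

    public static void main(String[] args) {
        Person p = new Person("John", null, "12345");
        System.out.println(p.getName() + ", email=" + p.getEmail());
    }
}
```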


This value object can have null values for email & phone number.

Scenario: Contact a person on both email and phone number

Not using Optional
The first attempt is based on null checks, like below:
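The original code isn't shown, so here is a hedged sketch of what the null-check version typically looks like (method and message strings are assumptions):

```java
// Classic null-check style: guard every field before using it.
public class ContactWithNullChecks {
    static String contact(String email, String phone) {
        StringBuilder out = new StringBuilder();
        if (email != null) {
            out.append("emailed ").append(email);
        }
        if (phone != null) {
            if (out.length() > 0) out.append(", ");
            out.append("called ").append(phone);
        }
        return out.length() == 0 ? "unreachable" : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(contact("john@example.com", null)); // emailed john@example.com
        System.out.println(contact(null, null));               // unreachable
    }
}
```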

This is how it has been done for years. One more common pattern involves collection results, where null is returned instead of an empty collection.
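For instance (a sketch; the repository lookup is hypothetical):

```java
import java.util.Collections;
import java.util.List;

// Returning null for "no results" forces every caller to null-check;
// returning an empty collection keeps the calling code uniform.
public class CollectionResult {
    // anti-pattern: null signals "nothing found"
    static List<String> findEmailsBad(boolean found) {
        return found ? List.of("john@example.com") : null;
    }

    // better: an empty list means "nothing found"
    static List<String> findEmailsGood(boolean found) {
        return found ? List.of("john@example.com") : Collections.emptyList();
    }

    public static void main(String[] args) {
        // caller of the bad version must remember the null check
        List<String> bad = findEmailsBad(false);
        int badCount = (bad == null) ? 0 : bad.size();

        // caller of the good version just uses the list
        int goodCount = findEmailsGood(false).size();

        System.out.println(badCount + " " + goodCount);
    }
}
```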

Using Optional the wrong way

This is a little better, but all the goodness of Optional is thrown away by adding if/else blocks to the code.
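A sketch of this anti-pattern (the original snippet isn't shown; names are assumed):

```java
import java.util.Optional;

// Optional used like a glorified null check: isPresent()/get() just
// re-creates the if/else shape it was meant to replace.
public class WrongOptional {
    static String contact(Optional<String> email) {
        if (email.isPresent()) {          // same shape as (email != null)
            return "emailed " + email.get();
        } else {
            return "unreachable";
        }
    }

    public static void main(String[] args) {
        System.out.println(contact(Optional.of("john@example.com")));
        System.out.println(contact(Optional.empty()));
    }
}
```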

Always-happy Optional


It is good to be happy, but when you try that with Optional you are either making a big assumption or you don't need Optional at all.
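The "always happy" shape, sketched (names assumed): get() is called straight away, assuming the value is always there.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

// "Always happy" Optional: calling get() without a check assumes the
// value is always present, which defeats the purpose of Optional.
public class HappyOptional {
    static String contact(Optional<String> email) {
        return "emailed " + email.get(); // throws NoSuchElementException when empty
    }

    public static void main(String[] args) {
        System.out.println(contact(Optional.of("john@example.com")));
        try {
            contact(Optional.empty());
        } catch (NoSuchElementException e) {
            System.out.println("blew up at runtime: " + e.getMessage());
        }
    }
}
```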

Nested property Optional
For this scenario we will extend the Person object and add a Home property. Not everyone owns a home, so it is a good candidate for a value that may not be available.
Let's see how the contact-person scenario works in this case.
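A sketch of the nested-null-check version (the Home/address fields are assumed, as the original listing isn't shown):

```java
// Nested null checks pile up as soon as a property like Home is itself nullable.
public class NestedNullChecks {
    static class Home {
        private final String address; // may be null
        Home(String address) { this.address = address; }
        String getAddress() { return address; }
    }

    static class Person {
        private final Home home; // may be null; not everyone owns a home
        Person(Home home) { this.home = home; }
        Home getHome() { return home; }
    }

    static String contactAtHome(Person person) {
        if (person != null) {
            if (person.getHome() != null) {
                if (person.getHome().getAddress() != null) {
                    return "sent letter to " + person.getHome().getAddress();
                }
            }
        }
        return "unreachable";
    }

    public static void main(String[] args) {
        System.out.println(contactAtHome(new Person(new Home("221B Baker Street"))));
        System.out.println(contactAtHome(new Person(null)));
    }
}
```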

This is where it starts to get worse: the code ends up with tons of nested null checks.

Priority-based default
For this scenario we first try to contact the person at the home address, and if that is not available we contact the office address.
This type of scenario requires extra control flow for early returns, which makes the code hard to understand and maintain.
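A sketch of the early-return style (addresses and messages are assumptions):

```java
// Priority-based default with nulls: early returns creep in to pick the
// first available address, scattering the control flow.
public class PriorityWithNulls {
    static String contact(String homeAddress, String officeAddress) {
        if (homeAddress != null) {
            return "sent to home: " + homeAddress;     // early return #1
        }
        if (officeAddress != null) {
            return "sent to office: " + officeAddress; // early return #2
        }
        return "unreachable";
    }

    public static void main(String[] args) {
        System.out.println(contact(null, "1 Office Park"));
        System.out.println(contact("221B Baker St", "1 Office Park"));
    }
}
```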

These are some of the common patterns where Optional is not used, or is used in the wrong way.

Optional usage patterns
Let's look at some good ways of using Optional.

Make a property Optional based on domain knowledge
It is very easy to make a property Optional.
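For example, the getter can declare the absence in its return type (a sketch; field names assumed):

```java
import java.util.Optional;

// The getter itself declares, via its return type, that email may be absent.
public class PersonWithOptional {
    private final String name;
    private final String email; // may be null internally

    public PersonWithOptional(String name, String email) {
        this.name = name;
        this.email = email;
    }

    public String getName() { return name; }

    // domain knowledge encoded in the signature: email is optional
    public Optional<String> getEmail() {
        return Optional.ofNullable(email);
    }

    public static void main(String[] args) {
        PersonWithOptional p = new PersonWithOptional("John", null);
        System.out.println(p.getEmail().isPresent());
    }
}
```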

Yes, it is allowed to make a getter return Optional; no one will hang you for that, so feel free to do it without fear. Once that change is done, we can write something like below:
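A sketch of the calling code (the helper here is hypothetical):

```java
import java.util.Optional;

// With an Optional-returning getter, the caller expresses "contact if
// present" without writing an explicit if/else.
public class ContactWithOptional {
    static Optional<String> email(String value) {
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        email("john@example.com").ifPresent(e -> System.out.println("emailed " + e));
        email(null).ifPresent(e -> System.out.println("emailed " + e)); // prints nothing
    }
}
```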

It looks neat: the first step towards code without explicit if/else at the application layer.

Use some power of Optional

Optional is just like a Stream: we get the functional operations like map, filter, etc. In the above example we check for opt-in before contacting.
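A sketch of that check (the optIn flag and Subscriber type are assumptions, since the original listing isn't shown):

```java
import java.util.Optional;

// Optional supports map/filter like a one-element Stream; here we only
// contact a subscriber who has opted in.
public class OptionalFilter {
    record Subscriber(String email, boolean optIn) {}

    static Optional<String> contact(Optional<Subscriber> s) {
        return s.filter(Subscriber::optIn)     // drop subscribers who opted out
                .map(Subscriber::email)        // extract the address
                .map(e -> "emailed " + e);
    }

    public static void main(String[] args) {
        System.out.println(contact(Optional.of(new Subscriber("a@b.c", true))));
        System.out.println(contact(Optional.of(new Subscriber("a@b.c", false))));
    }
}
```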

Always-happy Optional
An always-happy Optional that calls "get" without a check will cause a runtime error at midnight on a Sunday, so it is advised to use ifPresent instead.
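A minimal sketch of the safe pattern:

```java
import java.util.Optional;

// ifPresent runs the action only when the value exists; no unchecked get().
public class IfPresentDemo {
    public static void main(String[] args) {
        Optional<String> email = Optional.empty();

        // email.get() here would throw NoSuchElementException at runtime
        email.ifPresent(e -> System.out.println("emailed " + e)); // safe: does nothing

        // Java 9+: handle both branches in one call
        email.ifPresentOrElse(
                e -> System.out.println("emailed " + e),
                () -> System.out.println("no email on file"));
    }
}
```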

Nested Optional
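A sketch of the chained version (the Home/Insurance shapes are assumed from the surrounding text):

```java
import java.util.Optional;

// flatMap flattens Optional<Optional<...>> so nested optional properties
// (home, and an insurance on the home) chain without null checks.
public class NestedOptional {
    record Insurance(String company) {}
    record Home(Optional<Insurance> insurance) {}
    record Person(Optional<Home> home) {}

    static String insurerOf(Person person) {
        return person.home()                  // Optional<Home>
                .flatMap(Home::insurance)     // Optional<Insurance>, not Optional<Optional<...>>
                .map(Insurance::company)
                .orElse("unknown");
    }

    public static void main(String[] args) {
        Person insured = new Person(Optional.of(new Home(Optional.of(new Insurance("Acme")))));
        Person homeless = new Person(Optional.empty());
        System.out.println(insurerOf(insured) + " / " + insurerOf(homeless));
    }
}
```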

flatMap does the magic here: it handles the null check for home and also converts the insurance object.

Priority based default
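A sketch using Optional.or (Java 9+); names are assumptions:

```java
import java.util.Optional;

// Pick the first address that has a value: Optional.or chains the
// fallbacks declaratively, with no early returns.
public class PriorityWithOptional {
    static String pickAddress(Optional<String> home, Optional<String> office) {
        return home.or(() -> office)   // office is only consulted if home is empty
                   .orElse("unreachable");
    }

    public static void main(String[] args) {
        System.out.println(pickAddress(Optional.empty(), Optional.of("1 Office Park")));
        System.out.println(pickAddress(Optional.of("221B Baker St"), Optional.of("1 Office Park")));
    }
}
```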

This example takes both the home & office addresses and picks the first one that has a value for sending the notification. This pattern avoids lots of nested null checks.

Else branch
Optional has lots of ways to handle the else part of a scenario, like returning a default value (orElse), a lazy default value (orElseGet), or throwing an exception (orElseThrow).
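The three flavours side by side (a minimal sketch):

```java
import java.util.Optional;

// The three flavours of "else": eager default, lazy default, and exception.
public class ElseBranches {
    public static void main(String[] args) {
        Optional<String> empty = Optional.empty();

        // eager default: the argument is evaluated even when not needed
        System.out.println(empty.orElse("default@example.com"));

        // lazy default: the supplier only runs when the Optional is empty
        System.out.println(empty.orElseGet(() -> "lazy-default@example.com"));

        // fail fast when absence is a bug
        try {
            empty.orElseThrow(() -> new IllegalStateException("email is required"));
        } catch (IllegalStateException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```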

What is not good about Optional
Every design choice has trade-offs, and Optional has some too. It is important to know what they are so that you can make a careful decision.
Memory overhead
Optional is a container that holds a value, so an extra object is created for every value. Special consideration is required when it holds a primitive value; the OptionalInt, OptionalLong, and OptionalDouble variants avoid the boxing cost. If performance-sensitive code will be impacted by the extra object creation, it is better to use null.

Memory indirection
As Optional is a container, every access to the value needs an extra jump to reach the real value. Optional is not a good choice for the elements of an array or collection.

No serialization
Optional does not implement Serializable. I think this is a good decision by the JDK team, as it discourages people from making instance variables Optional. You can wrap an instance variable in an Optional at runtime, or when required for processing.

All the examples used in this post are available @ optionals github repo

If you like the post then you can follow me on twitter.

Saturday, 10 November 2018

SQL is Stream


The Stream API of any language looks like writing SQL.

map is SELECT columns
filter is WHERE
count is COUNT(1)
limit is LIMIT X
collect fetches all the results on the client side

So it is very easy to map the functions of the Streams API to parts of SQL.
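The analogy in plain Streams (an illustrative sketch; the Stock type is made up):

```java
import java.util.List;

// The pipeline below reads like:
// SELECT UPPER(symbol) FROM stocks WHERE volume > 1000 LIMIT 2
public class StreamAsSql {
    record Stock(String symbol, long volume) {}

    public static void main(String[] args) {
        List<Stock> stocks = List.of(
                new Stock("aapl", 5000), new Stock("goog", 800),
                new Stock("msft", 3000), new Stock("ibm", 2000));

        List<String> result = stocks.stream()
                .filter(s -> s.volume() > 1000)     // WHERE volume > 1000
                .map(s -> s.symbol().toUpperCase()) // SELECT UPPER(symbol)
                .limit(2)                           // LIMIT 2
                .toList();                          // fetch results client-side

        System.out.println(result); // [AAPL, MSFT]
    }
}
```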

Object-relational mapping frameworks (Hibernate, MyBatis, JPA, TopLink, ActiveRecord, etc.) give a good abstraction over SQL, but they add a lot of overhead, don't give much control over how the SQL is built, and many times you have to write native SQL anyway.


ORMs never made writing SQL easy, and if you don't trust me then take a quick look at how the code looks.

Sometimes I feel that engineers are writing more annotations than real algorithms!

To implement any feature we have to keep switching between the SQL API and the non-SQL API; this makes the code hard to maintain, and many times it is not optimal either.

This problem can be solved by a library that is based on the Streams API and generates SQL; then we don't have to switch, and it becomes a unified programming experience.

With such a library, testing also becomes easy, because the source of the stream can be changed as needed: in a real environment it is the database, and in tests it is an in-memory data structure.

In this post I will share a toy example of how such a library could look.

Code Snippet

Stream<StocksPrice> rows = stocksTable.stream();
long count = rows
                .filter(Where.GT("volume", 1467200))
                .filter(Where.GT("open_price", 1108d))
                .count();

The above code generates:
Select Count(1) From stocks_price where volume > 1467200 AND open_price > 1108

Look at another example, with Limit:

stocksTable.stream()
                .filter(Where.GT("volume", 1467200))
                .filter(Where.GT("open_price", 1108d))
                .limit(2)
                .collect(Collectors.toList());

Select stock_symbol,open_price,high_price,trade_date FROM stocks_price WHERE volume > 1467200 AND open_price > 1108.0 LIMIT 2

These APIs can also use code generation to provide compile-time safety, like checking column names, types, etc.

Benefits

The Streams API gives some other benefits, like:
 - Parallel execution
 - Joins between database data and non-database data can easily be done using map.
 - It allows a pure streaming approach, which is good when dealing with huge data.
 - It opens up the option of generating native optimized queries, because multiple phases of the pipeline can be merged.

This programming model is not new; it is very common in distributed computing frameworks like Spark, Kafka, and Flink.

Spark's Dataset is based on this approach: it generates optimized queries by pushing filters down to storage, reducing reads by looking at partitions, reading only selected columns, etc.

Conclusion

Database drivers should provide stream-based APIs; this would help reduce the dependency on ORM frameworks.
It is a very powerful programming model and opens up lots of options.

The code used in this post is available @ streams github repo.

Wednesday, 14 January 2015

Java8 Parallel Streams


I recently found a great article on when to use parallel streams, by Doug Lea and team. In this blog I will share some performance results for different data structures.

Splittable data structures
This is one of the most important factors to consider. All parallel stream operations are executed using the common fork/join pool, which uses a divide-and-conquer algorithm to split the stream into small chunks and apply the function to each chunk. Splitting is therefore an important factor: if splitting takes too long, it will choke the whole computation!

Types of data structure

- Array
Array-based data structures are the most efficient from a splitting perspective, because each element can be randomly accessed, so splitting is a very quick operation.
Some examples of array-based collections are ArrayList and open-addressing-based Sets/Maps.

- Tree
Balanced tree-based collections like TreeMap or ConcurrentHashMap are easy to chop into two parts, but if the tree is not balanced then splitting will not be that efficient, because the task load will not be equal for each thread.
Another thing to consider is that the memory access pattern of tree-based data structures is random, so there will be a memory latency cost when elements are accessed.

- LinkedList
This type of data structure gives the worst splitting performance, because each element must be visited to split it. Some examples from the JDK are LinkedList and the Queue implementations.

- Others
This covers all other types of data structures, e.g. I/O-based ones: BufferedReader.lines returns a stream, but its splitting operation is very expensive.

Some performance numbers
All performance tuning guesses must be backed by experiment, so let's have a look at some numbers.

Some code snippet
Array Stream
ArrayList Stream
Set Stream
LinkedList Stream


Measurement Code
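The original snippets were images; here is a rough, self-contained sketch of the measurement idea: the same parallel reduction over collections with different splitting characteristics. Absolute numbers will differ by machine; only the relative ordering matters.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.stream.IntStream;

// Time a parallel sum over collections with different split performance.
public class ParallelSplitBenchmark {
    static long parallelSum(Collection<Integer> data) {
        return data.parallelStream().mapToLong(Integer::longValue).sum();
    }

    static long timeItMillis(Collection<Integer> data) {
        long start = System.nanoTime();
        parallelSum(data);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        ArrayList<Integer> arrayList = new ArrayList<>();
        HashSet<Integer> set = new HashSet<>();
        LinkedList<Integer> linkedList = new LinkedList<>();
        IntStream.range(0, n).forEach(i -> { arrayList.add(i); set.add(i); linkedList.add(i); });

        System.out.println("ArrayList  " + timeItMillis(arrayList) + " ms");
        System.out.println("HashSet    " + timeItMillis(set) + " ms");
        System.out.println("LinkedList " + timeItMillis(linkedList) + " ms");
    }
}
```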


Result

Array      31 sec
ArrayList  31 sec
Set        52 sec
LinkedList 106 sec

The results confirm the guess that array-based collections are the fastest, so choose the right data structure before you start using parallel streams.

The code used in this blog is available @ github