
Friday, 19 October 2018

When microservices become darkservices

Microservices are great, and many companies talk about how they use them to scale teams, products, and more.

Microservices have a dark side too, and as a programmer you should know about it before going on the ride.
In this post I will share some of the myths and dark sides of microservices.

  • We need lots of microservices
Before you create any new microservice, think about distributed computing, because most microservices are remote processes. First define what "micro" means in your problem context: it could be lines of code, features, deployment units, etc.

  • Naming microservices will be easy
Computer science has only two hard problems, and one of them is naming; you will run out of options very soon when you have hundreds of services.

  • Non-functional requirements can be done later
Suddenly non-functional requirements (latency, throughput, security, reliability, etc.) become very important from day one.
  • Polyglot programming/persistence or something poly...
Software engineers like to try the latest cutting-edge tools, so they get carried away by the myth that any language, any framework, or any persistence store can be used anywhere.
Think about the skills and maintenance overhead each poly... thing adds: with more than two or three of them, the system no longer fits in anyone's head, and you will be on pager duty.
  • Monitoring is easy
This is one of the most ignored facts about microservices, and monitoring usually comes as an afterthought.
For a simple investigation you have to log in to many machines, look through logs, make sure you line up the timings across servers, etc.

Without proper monitoring tools you can't do this; you need something like ELK or DataDog.
  • Reads and writes are easy
This also gets ignored: now you are in the world of distributed transactions, which is not a good place to be, and to handle it you need an eventually consistent system or a system that gives up availability.

  • Everything is secure
Now one service talks to another through an API, so you need a good auth system to make sure the whole thing is secure. If you work on a financial system, you will spend even more time answering security-related questions.
  • My service will always be up
That will never happen, no matter how good your programmers or infrastructure are. Services will go down, and now you are in middleware land (Kafka, ActiveMQ, ZeroMQ, etc.) so that requests can be queued while a service is unavailable.

  • I can add a breakpoint to debug it
This is just not possible, because you are now in a remote process and don't know how many microservices are involved in a single request.
  • Testing will be the same
Testing is never the same as with a monolith; you need better automated tests to stay out of testing hell.
  • No code duplication
As you add more services, code sharing becomes hard because any change to common code requires thorough testing, and to avoid that many teams start duplicating code.
  • JSON over HTTP
This is one of the biggest myths: that every microservice must speak JSON over HTTP and is user-facing.
It has resulted in an explosion of REST APIs, one for every microservice, and it is the reason many systems are slow: they use a text-based protocol with no type information.

The one thing to take away from this anti-pattern: rethink whether you really need JSON/REST for every service, or whether an optimized protocol and encoding would do better; see the encoding sketch after this list.
  • Versioning is my grandfather's job
Since most microservices are remote processes, you have to come up with a request/response spec and manage versions for backward compatibility.
  • Team communication remains the same
This is the hidden elephant in the room: with more services, more team communication is required to keep everyone posted on the current version, where it is running, what is broken, etc.
You can also end up with more silos, because no one knows the whole system.

  • Your product is at Google/Facebook/Netflix scale
This is like buying a lottery ticket that you are never going to win.
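To make the JSON-over-HTTP point concrete, here is a small illustrative sketch (my own, with made-up field names) comparing the same record encoded as JSON text and as a typed binary layout. Real services would use an IDL-based format such as Protobuf, Avro, or Thrift rather than raw ByteBuffers:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodingCompare {
    public static void main(String[] args) {
        // Text protocol: field names and digits travel on every message,
        // and the receiver must parse strings to recover the types.
        byte[] json = "{\"accountId\":42,\"balance\":1050.75}"
                .getBytes(StandardCharsets.UTF_8);

        // Typed binary layout: fixed-width fields, schema lives in code.
        ByteBuffer binary = ByteBuffer.allocate(Long.BYTES + Double.BYTES);
        binary.putLong(42L);
        binary.putDouble(1050.75);

        System.out.println("JSON bytes   : " + json.length);       // 34
        System.out.println("Binary bytes : " + binary.capacity()); // 16
    }
}

The byte counts are small here, but the parsing cost and the lack of type information are what hurt at scale.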


If you can't write a decent modular monolith, don't try microservices, because it is all about getting coupling and cohesion right: modules should be loosely coupled and highly cohesive.

There is no free lunch with microservices, and if you get it wrong you will be paying a premium price :-)




Saturday, 26 May 2018

Spark Microservices

As a continuation of the big data query system blog, I want to share more techniques for building an analytics engine.

Take a problem where you have to build a system for analyzing customer data at scale.

What options are available to solve this problem?
 - Load the data into your favorite database and create the right indexes.
   This works when the data is small; by small I mean less than 1 TB, often much less.

 - Another option is to use something like Elasticsearch.
Elasticsearch works, but it comes with the overhead of managing another cluster and shipping data into it.

 - Use Spark SQL or Presto.
Using these for interactive queries is tricky because the minimum overhead required to execute a query can exceed the latency budget for the query, which could be 1 or 2 seconds.

 - Use a distributed in-memory database.
This looks like a good option, but it has issues too: many solutions are proprietary, and the open source ones carry overhead similar to Elasticsearch.

 - Use Spark SQL, but remove the job start overhead.
I will deep dive into this option. Spark has become the number one choice for building ETL pipelines because of its simplicity and big community support, and Spark SQL can connect to almost any data source (JDBC, Hive, ORC, JSON, Avro, etc.).

Analytics queries generate a different type of load: they need only a few columns out of the whole set and execute some aggregate function over them, so a column-based store is a good choice for analytics.

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.
So data can be converted to Parquet using Spark, and then Spark SQL can be used on top of it to answer analytics queries.
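A minimal sketch of that conversion step; the HDFS paths and the JSON input format are hypothetical placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParquetConverter {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parquet-converter")
                .getOrCreate();

        // Read the raw data; Spark SQL can read JSON, CSV, Avro, JDBC, etc.
        Dataset<Row> raw = spark.read().json("hdfs:///data/customers/raw");

        // Rewrite as Parquet so analytics queries scan only the columns they need
        raw.write().mode("overwrite").parquet("hdfs:///data/customers/parquet");

        spark.stop();
    }
}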

To put it all in context: convert the HDFS data to Parquet (i.e. a column store), then have a microservice that opens a SparkSession, pins the data in memory, and keeps the Spark session open forever, just like a database connection pool.

Connection pooling is a more-than-decade-old trick, and the same idea can be applied to the Spark session to build an analytics engine.
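A sketch of the pinning step, assuming the Parquet output from the previous snippet; the table and column names are illustrative:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AnalyticsEngine {
    private final SparkSession spark;

    public AnalyticsEngine(SparkSession spark) {
        this.spark = spark;
        // Register the Parquet data as a view and pin it in memory once,
        // at service startup, just like warming up a connection pool.
        spark.read().parquet("hdfs:///data/customers/parquet")
             .createOrReplaceTempView("customers");
        spark.sql("CACHE TABLE customers");
    }

    public Dataset<Row> totalSpendByRegion() {
        // Typical analytics shape: a few columns plus an aggregate
        return spark.sql(
            "SELECT region, SUM(spend) AS total FROM customers GROUP BY region");
    }
}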

[High-level diagram: a REST microservice keeping long-lived Spark sessions over Parquet data]

SparkSession is thread safe, so there is no need to add any locks/synchronization.
Depending on the use case, a single Spark context or multiple Spark contexts can be created in a single JVM.



Spark 2.x has a simple API to create a singleton SparkContext instance, and it handles thread-based SparkSessions as well.
A snippet for creating the Spark session:
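The original snippet was embedded from a gist; a minimal sketch of the same idea looks like this (the app name and local-mode master are my assumptions):

import org.apache.spark.sql.SparkSession;

public final class SparkSessionFactory {
    private SparkSessionFactory() {}

    public static SparkSession getSession() {
        // getOrCreate() returns the session already created in this JVM,
        // so every caller shares one SparkContext (a singleton per JVM).
        return SparkSession.builder()
                .appName("spark-analytics-service")
                .master("local[*]")   // assumption: local mode for the demo
                .getOrCreate();
    }

    public static SparkSession newThreadSession() {
        // newSession() gives a thread its own temp views and SQL config
        // while still sharing the underlying SparkContext.
        return getSession().newSession();
    }
}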


Caution
All this works fine if the microservice runs on a single machine, but if the microservice is load balanced, then each instance will have its own context.
If a single Spark context requests thousands of cores, then some strategy is required to load balance Spark context creation. This is the same as the database pool issue: you can only request resources that are physically available.

Another thing to remember is that the driver is now running inside the web container, so allocate proper memory to the process so that the web server does not blow up with an out-of-memory error.

I have created a microservices application using Spring Boot; it hosts the Spark session via a REST API.

This code has two types of query:
 - Single query per HTTP thread (sketched below)
 - Multiple queries per HTTP thread. This model is very powerful and can be used for answering complex queries.
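As an illustration of the first model, here is a hedged Spring Boot sketch; the endpoint shape is my own, not necessarily what the repo does, and it reuses the hypothetical SparkSessionFactory from the earlier snippet:

import java.util.List;
import org.apache.spark.sql.SparkSession;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class QueryController {

    private final SparkSession spark = SparkSessionFactory.getSession();

    @PostMapping("/query")
    public List<String> runQuery(@RequestBody String sql) {
        // One Spark SQL query per HTTP thread, executed on the shared,
        // long-lived session; don't expose raw SQL like this outside a demo.
        return spark.sql(sql).toJSON().collectAsList();
    }
}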

Code is available on GitHub @ sparkmicroservices