Saturday, 26 May 2018

Spark Microservices

As a continuation of the big data query system post, I want to share more techniques for building an analytics engine.

Take a problem where you have to build a system that will be used for analyzing customer data at scale.

What options are available to solve this problem?
 - Load the data into your favorite database and create the right indexes.
   This works when the data is small; by small I mean less than 1 TB.

 - Use something like Elasticsearch.
   Elasticsearch works, but it comes with the overhead of managing another cluster and shipping data to it.

 - Use Spark SQL or Presto.
   Using these for interactive queries is tricky because the minimum overhead of executing a query can exceed the latency required for the query, which could be 1 or 2 seconds.

 - Use a distributed in-memory database.
   This looks like a good option, but it has issues too: many solutions are proprietary, and the open source ones carry overhead similar to Elasticsearch.

 - Use Spark SQL with the job start overhead removed.
   I will deep dive into this option. Spark has become the number one choice for building ETL pipelines because of its simplicity and big community support, and Spark SQL can connect to a wide range of data sources (JDBC, Hive, ORC, JSON, Avro, etc.).

Analytics queries generate a different type of load: they need only a few columns from the whole set and execute some aggregate function over them, so a column-based store is a good choice for analytics queries.
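To make the point about columns concrete, here is a minimal sketch in plain Java (no Spark or Parquet involved). The CustomerRow and CustomerColumns classes and their fields are made up for illustration. It only shows that, with one array per column, an aggregate over a single column never touches the others.

```java
import java.util.List;

/** Row-oriented record: all fields of one customer are stored together (hypothetical schema). */
final class CustomerRow {
    final long id;
    final String country;
    final double spend;

    CustomerRow(long id, String country, double spend) {
        this.id = id;
        this.country = country;
        this.spend = spend;
    }
}

/** Column-oriented layout: one array per column, same index means same customer. */
final class CustomerColumns {
    final long[] id;
    final String[] country;
    final double[] spend;

    CustomerColumns(List<CustomerRow> rows) {
        id = new long[rows.size()];
        country = new String[rows.size()];
        spend = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            CustomerRow r = rows.get(i);
            id[i] = r.id;
            country[i] = r.country;
            spend[i] = r.spend;
        }
    }

    /** Aggregates a single column; the id and country arrays are never read. */
    double totalSpend() {
        double total = 0;
        for (double s : spend) {
            total += s;
        }
        return total;
    }
}
```

A columnar file format applies the same idea on disk, so a query that needs two columns out of fifty can skip reading the rest.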

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
So using Spark, data can be converted to Parquet, and then Spark SQL can be used on top of it to answer analytics queries.

To put it all in context: convert the HDFS data to Parquet (i.e. a column store), then have a microservice that opens a SparkSession, pins the data in memory, and keeps the Spark session open forever, just like a database connection pool.

Connection pooling is a trick that is more than a decade old, and it can be applied to the Spark session to build an analytics engine.
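The pooling idea can be sketched without Spark on the classpath. The SharedSession class below is a hypothetical helper, not part of Spark and not necessarily what the sample project does. It creates an expensive object once, on first use, and hands the same instance to every caller. The comment shows where the Spark builder call would go.

```java
import java.util.function.Supplier;

/**
 * Holds one long-lived, expensive-to-create object and returns the same
 * instance to every caller. Hypothetical helper for illustration only.
 *
 * With Spark on the classpath the factory could be, for example:
 *   () -> SparkSession.builder().appName("analytics").getOrCreate()
 */
final class SharedSession<T> {
    private final Supplier<T> factory;
    // volatile so that other threads see the fully constructed instance
    private volatile T session;

    SharedSession(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Creates the session on first use, then returns the cached instance. */
    T get() {
        T local = session;
        if (local == null) {
            synchronized (this) {
                local = session;
                if (local == null) {
                    local = factory.get();
                    session = local;
                }
            }
        }
        return local;
    }
}
```

Every request handler calls get() and pays the start-up cost at most once, which is the same trade a database connection pool makes.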

High-level diagram of how this will look:

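SparkSession is thread safe, so there is no need to add any locks or synchronization.
Depending on the use case, a single or multiple Spark sessions can be created in a single JVM; multiple sessions share one underlying SparkContext, since Spark supports only one active context per JVM.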



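Spark 2.x has a simple API that creates a singleton SparkContext and also handles thread-based SparkSession instances: the builder's getOrCreate() call returns the existing session if there is one and creates it otherwise.
Code snippet for creating the Spark session: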


Caution
All this works fine if the microservice runs on a single machine, but if the microservice is load balanced then each instance will have its own context.
If a single Spark context requests thousands of cores, then some strategy is required to balance Spark context creation across instances. This is the same as the database pool issue: you can only request resources that are physically available.

Another thing to remember is that the driver now runs inside the web container, so allocate enough memory to the process so that the web server does not blow up with an out-of-memory error.

I have created a microservices application using Spring Boot, and it hosts the Spark session behind a REST API.

This code has 2 types of queries:
 - Single query per HTTP thread
 - Multiple queries per HTTP thread. This model is very powerful and can be used for answering complex queries.
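One way to read the second model, assuming the queries are independent of each other, is that a single request fans out several queries on the shared session and combines the answers. The sketch below is plain Java. MultiQuery and its runQuery parameter are hypothetical stand-ins for whatever actually executes SQL, and the sample project may do this differently.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Fans several queries out from one request and collects the answers in input order. */
final class MultiQuery {
    private MultiQuery() {}

    /**
     * @param queries  SQL strings to execute
     * @param runQuery hypothetical stand-in for running one query on the shared session
     * @param pool     executor the queries run on, so they can overlap
     */
    static <R> List<R> runAll(List<String> queries, Function<String, R> runQuery, Executor pool) {
        // Submit every query first so they can run concurrently.
        List<CompletableFuture<R>> pending = queries.stream()
                .map(q -> CompletableFuture.supplyAsync(() -> runQuery.apply(q), pool))
                .collect(Collectors.toList());
        // Then wait for each result, preserving the order of the input list.
        return pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

In a Spark-backed service, runQuery might be something like sql -> spark.sql(sql).collectAsList(), where spark is the shared SparkSession. This relies on the session being safe to use from several threads at once.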

Code is available on GitHub @ sparkmicroservices
