Thursday, 22 November 2018

Spark Run local design pattern

Many Spark applications have now become legacy applications, and they are very hard to enhance, test, and run locally.

Spark has very good testing support, but many Spark applications are still not testable.
I will share one common error that you see when you try to run an old Spark application locally.




When you see such an error, you have two options:
 - Accept that it can't be run locally and keep working with that frustration.
 - Fix it so it runs locally, and show your team an example of the Boy Scout Rule.


I will show a very simple pattern that will save you from this frustration.

This code uses an isLocalSpark function to decide whether to run in local mode. You can use any technique to make that decision, such as an environment variable, a command-line parameter, or anything else.
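As a rough illustration of that decision, here is a minimal Python sketch. The `--local` flag and the `RUN_LOCAL` environment variable are hypothetical conventions chosen for this example; they are not part of Spark, and any other signal would work just as well.

```python
import os
import sys


def is_local_spark(argv=None, env=None):
    """Return True when the job should run with a local master.

    Assumed conventions (hypothetical, not defined by Spark):
    a --local command-line flag, or RUN_LOCAL set to 1/true/yes.
    """
    # Fall back to the real process arguments and environment,
    # but allow both to be injected so the function is easy to test.
    argv = sys.argv[1:] if argv is None else argv
    env = os.environ if env is None else env

    if "--local" in argv:
        return True
    return env.get("RUN_LOCAL", "").strip().lower() in ("1", "true", "yes")
```

Passing the arguments and environment in as parameters keeps the decision itself testable without touching Spark at all.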

Once you know the application is running locally, create the Spark context accordingly.
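One way this step could look in PySpark is sketched below. The helper names `spark_settings` and `create_spark_context` are made up for this example, and `local[*]` is just one reasonable choice of local master. The idea is to set `spark.master` only in local mode and leave it untouched otherwise.

```python
def spark_settings(app_name, local):
    """Build the Spark properties for this run (hypothetical helper)."""
    settings = {"spark.app.name": app_name}
    if local:
        # Only force a master when running locally; on a cluster the
        # master is expected to come from spark-submit or its defaults.
        settings["spark.master"] = "local[*]"
    return settings


def create_spark_context(app_name, local):
    """Create (or reuse) a SparkContext from the settings above."""
    # Imported lazily so the settings logic can be tested without pyspark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(list(spark_settings(app_name, local).items()))
    return SparkContext.getOrCreate(conf)
```

Leaving the master unset outside local mode matters because properties set directly on the SparkConf take precedence over the flags passed to spark-submit, so a hardcoded master would override whatever the cluster launch specified.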

Now this code can run either locally or via spark-submit.

Happy Spark Testing.

The code used in this post is available in the runlocal repo.
