Many Spark applications have become legacy applications that are very hard to enhance, test, and run locally.
Spark has very good testing support, but many Spark applications are still not testable.
I will share one common error that you see when you try to run an old Spark application:
```
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
	at scala.Option.getOrElse(Option.scala:121)
```
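This error usually comes from a job that never sets a master URL because it is always launched via spark-submit, which supplies one. A minimal sketch of such a legacy main (the object and app names here are hypothetical) reproduces the failure when run directly from an IDE:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical legacy job: no master URL is configured because
// spark-submit normally injects it. Running this main directly from
// an IDE throws "A master URL must be set in your configuration".
object LegacyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Legacy Job")
      .getOrCreate() // fails here when no master URL is supplied

    val numbers = spark.sparkContext.parallelize(1 to 1000)
    println(s"Total Value ${numbers.sum()}")

    spark.stop()
  }
}
```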
When you see such an error, you have two options:
- Forget about it, accept that it can't be run locally, and continue working with this frustration.
- Fix it so it runs locally and show your team an example of The Boy Scout Rule.
I will show a very simple pattern that will save you from such frustration.
```scala
def main(args: Array[String]): Unit = {
  // Decide once whether this is a local run (IDE, tests) or a cluster run.
  val localRun = SparkContextBuilder.isLocalSpark
  // Build the SparkSession based on that decision.
  val sparkSession = SparkContextBuilder.newSparkSession(localRun, "Happy Local Spark")
  val numbers = sparkSession.sparkContext.parallelize(Range.apply(1, 1000))
  val total = numbers.sum()
  println(s"Total Value ${total}")
}
```
This code uses the isLocalSpark function to decide how to handle local mode; you can use any technique to make that decision, such as an environment variable, a command-line parameter, or anything else.
Once you know it is a local run, create the Spark context based on that decision.
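For illustration, here is a minimal sketch of what such a SparkContextBuilder helper could look like. The SPARK_LOCAL environment variable and the exact method signatures are assumptions; the actual helper in the runlocal repo may use a different mechanism.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the helper, not the exact code from the repo.
// The SPARK_LOCAL environment variable is an assumption; any signal
// (command-line flag, config file, etc.) works just as well.
object SparkContextBuilder {

  // True when the job is launched outside a cluster, e.g. from an IDE.
  def isLocalSpark: Boolean = sys.env.get("SPARK_LOCAL").contains("true")

  // Set a master URL only for local runs, so that spark-submit can
  // still supply the master on a real cluster.
  def newSparkSession(localRun: Boolean, appName: String): SparkSession = {
    val builder = SparkSession.builder().appName(appName)
    val withMaster = if (localRun) builder.master("local[*]") else builder
    withMaster.getOrCreate()
  }
}
```

With something like this in place, a local run only needs SPARK_LOCAL=true in the environment, while the spark-submit path keeps working unchanged because no master URL is hard-coded for the cluster case.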
Now this code can run locally as well as via spark-submit.
Happy Spark Testing.

Code used in this blog is available @ runlocal repo