Sunday, 10 February 2019

Adaptive scheduling of Spark Job using YARN API

In last blog poorman spark monitoring i shared approach on how to figure out how long Spark Job is waiting for resource.

This post covers some more details on how to be proactive when Spark Job is stuck due to resource constraint.

Little recap of Yarn.
Apache Yarn is resource management and job scheduling framework for hadoop distributed processing framework.

MapReduce NextGen Architecture

Hadoop cluster are shared between teams and for proper utilization of cluster teams/projects are allocated some capacity of cluster.

One of the popular scheduling approach is "Capacity Scheduler" for multi tenant cluster and it is based on Queues.
Yarn allows to define min & max resource for Queue and it is hierarchical, it looks something like below

Capacity Scheduler


One of the issues that can happen in Capacity Scheduler is that your job is submitted to overloaded queue and it gets stuck in Accepted state for long time although other queues has some capacity which is just left unused.
Another common issue is Job started running but did not got all the resource(cores/memory) and will run forever because queue is overloaded.

Yarn gives REST API to query state of cluster/queues/application and that can be used to solve issues where resource is available in cluster but application is not using it :-)

Yarn API to build adaptive job submission
Yarn API comes very handy in solving both of the above issue, some of the strategy using yarn API.

 - Submit job to queue that has capacity.
This type of strategy will select queue at run-time and submit application to least loaded queue.

 - Move Job to queue that has capacity.
This type of strategy will monitor job status and if it is not moving or get stuck in "Accepted" state then will move it to queue that has some capacity.

Abstraction of Yarn API to get minimum details that will allow adaptive job submission.

Once we get all the metrics required for making decision then it becomes straight forward to submit/move the job.
Below code snippet try to move the job based on simple strategy of max wait time for Accepted status App.

Yarn exposes lots of metrics that can used to building adaptive system. You can refer to ResourceManagerRest for full set of API.

Word of caution that be fair when you are using this strategy, don't use whole cluster alone.
Image result for greedy


Code used in post is available @ yarn github project

Sunday, 3 February 2019

Golang control statements

We are going to explore Go lang control structure, this is not covering all the control statement but you can refer control-structures from effective go to get all the details.

Refer to index page for all the content written so far.

One thing that i like about control structure is that it is very easy to understand.
Focus on readability is clearly seen .


If statement
Go lang author managed to removed extra bracket in if statement, it looks something like 

if x > 10 {
fmt.Println("I am gt ", x)
} else {
fmt.Println("I am lt ", x)
}

Another variation that includes initialization and condition both  

if value := time.Now().Weekday(); value == time.Sunday {
fmt.Println("Yahoooo.. today is sunday")
} else {
fmt.Println("Lets get back to work. I hate", value)

}

Switch Statement
Switch case has few variations 

Simple one
value := 10
switch value {
case 10:
fmt.Println("Value is 10")
default:
fmt.Println("Some other value than 10")

}

With No expression
switch {
case value >= 10:
fmt.Println("Value is gt 10")
case value >= 20:
fmt.Println("Value is gt 20")

}

Switch with multiple condition in single case

specialValue := '@'
switch specialValue {
case '@', '!', '#':
fmt.Println("This is special value")
default:
fmt.Println("This is normal value")

}

Switch with type assertion 
Type assertion can be only done using switch case using variable.(type) expression.

var t interface{}
t = "James"
switch t.(type) {
case int:
fmt.Println("Int value", t)
case string:
fmt.Println("String value", t)

}

Loops
Has only one type of loop(while) and it can be used for all the purpose.

C/Java like
It has init, condition,post section.

value := 0
for counter := 0; counter < 10; counter++ {
value++
}

fmt.Println(value)

Just condition
value = 0
for value < 10 {
value++
}

fmt.Println(value)

Infinite (with no condition)

for {
value++
if value > 10 {
break
}

}

Smart loops
This is useful when dealing with arrays/map/channels

days := []string{"Sunday", "Monday"}
for index, value := range days {
fmt.Println("Index ", index, "Value ", value)

}

range keyword is very power full it works with all the collections types.
Another thing i like about golang is that compiler helps with lot of common error for e.g unused variable are compiler error, so below example is error because index is not used.

days := []string{"Sunday", "Monday"}
for index, value := range days {
fmt.Println("Value ", value)

}

It is possible to ignore the value by using "_" for eg

days := []string{"Sunday", "Monday"}
for _, value := range days {
fmt.Println("Value ", value)

}

Sample used in this post is available @ 003-statement github repo