Saturday, 26 May 2018

Custom Logs in Apache Spark

Have you ever felt the frustration of a Spark job that runs for hours and then fails due to an infra issue?
You learn about the failure very late, waste a couple of hours on it, and it hurts even more when the Spark UI logs are not available for a postmortem.

You are not alone!

In this post I will go over how to enable your own custom logger that works well with the Spark logger.
This custom logger will collect whatever information is required to go from reactive to proactive monitoring.
No need to set up extra logging infra for this.

Spark 2.X logs through the SLF4J abstraction; the setup in this post uses the Logback binding.

Let's start with a logging basic: how to get a logger instance in a Spark job or application.

import org.slf4j.LoggerFactory

val _LOG = LoggerFactory.getLogger(this.getClass.getName)

It is that simple, and now your application is using the same logging library and settings that Spark itself uses.

Now, to do something more meaningful, we have to inject a custom logger that collects information and writes it to Elasticsearch, posts it to a REST endpoint, or sends alerts.

Let's go step by step.

Build a custom log appender
Since we are on the Logback binding, we have to write a Logback appender.

Code snippet for the custom Logback appender

This is a very simple appender that counts messages per thread; all you have to do is override the append function.
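A minimal sketch of what such an appender can look like. The package and class name are chosen to match the logback.xml entry shown later in this post; the per-thread counter is an illustration of the idea, not the repo's exact code, and it assumes the Logback dependency is on the classpath:

```java
package micro.logback;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

// Counts log messages per thread; any custom sink (database, REST
// endpoint, alerting) would replace or extend the body of append().
public class MetricsLogbackAppender extends AppenderBase<ILoggingEvent> {

    // Static so counts survive across appender instances and can be
    // read from anywhere in the JVM (e.g. a reporting thread).
    public static final Map<String, LongAdder> messagesPerThread =
            new ConcurrentHashMap<>();

    @Override
    protected void append(ILoggingEvent event) {
        messagesPerThread
                .computeIfAbsent(event.getThreadName(), t -> new LongAdder())
                .increment();
    }
}
```

Logback instantiates this class itself from logback.xml, so no application code has to construct it.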

Such an appender can do anything: write to a database, send to a REST endpoint, raise alerts, and so on.

Enable the logger
To use the new appender, create a logback.xml file and add an entry for it.
This file can be packed in the shaded jar or specified as a runtime parameter.

Sample logback.xml
This config file adds MetricsLogbackAppender as METRICS:
<appender name="METRICS" class="micro.logback.MetricsLogbackAppender"/>

Next, enable it for the packages/classes that should use it:
<logger level="info" name="micro" additivity="true">
    <appender-ref ref="METRICS" />
</logger>
<logger level="info" name="org.apache.spark.scheduler.DAGScheduler" additivity="true">
    <appender-ref ref="METRICS" />
</logger>
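Putting the pieces together, a complete logback.xml can look like the sketch below. The STDOUT console appender and its pattern are illustrative additions, included only so regular logging keeps flowing alongside the metrics appender:

```xml
<configuration>
    <!-- Normal console logging, so ordinary log output still appears -->
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- The custom metrics appender from this post -->
    <appender name="METRICS" class="micro.logback.MetricsLogbackAppender"/>

    <logger level="info" name="micro" additivity="true">
        <appender-ref ref="METRICS" />
    </logger>
    <logger level="info" name="org.apache.spark.scheduler.DAGScheduler" additivity="true">
        <appender-ref ref="METRICS" />
    </logger>

    <root level="info">
        <appender-ref ref="STDOUT" />
    </root>
</configuration>
```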

You are done!

Any message logged from the 'micro' package or from the DAGScheduler class will use the new appender.
With this technique executor logs can also be captured, which becomes very useful when a Spark job is running on hundreds or thousands of executors.
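If you go the runtime-parameter route rather than packing logback.xml into the shaded jar, one common way to get both the driver and every executor to pick up the config is to ship the file with the job. The application class and jar names below are placeholders:

```shell
spark-submit \
  --files logback.xml \
  --conf "spark.driver.extraJavaOptions=-Dlogback.configurationFile=logback.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml" \
  --class com.example.MyApp \
  my-app-shaded.jar
```

`--files` distributes logback.xml to each executor's working directory, and `logback.configurationFile` is the standard Logback system property for pointing at a config file; in client mode the driver-side path may need to be absolute.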

This opens up lots of options, such as BI dashboards that show all these messages in real time, letting the team ask interesting questions or subscribe to alerts when things are not going well.

Caution: make sure that this new appender is not slowing down application execution; making it asynchronous is recommended.
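One standard way to do that with Logback is to wrap the custom appender in an AsyncAppender and point the `<logger>` entries at the wrapper instead of METRICS; the queue size here is an illustrative value:

```xml
<appender name="ASYNC-METRICS" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="METRICS" />
    <queueSize>512</queueSize>
    <!-- drop events instead of blocking the logging thread when the queue is full -->
    <neverBlock>true</neverBlock>
</appender>
```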

Get the insight at the right time and turn it into action.

The code used in this blog is available in the sparkmicroservices repo on GitHub.

I am interested in knowing what logging patterns you are using for Spark.
