Saturday, 26 May 2018

Custom Logs in Apache Spark

Have you ever felt the frustration of Spark job that runs for hours and it fails due to infra issue.
You know about this failure very late and waste couple of hours on it and it hurts more when Spark UI logs are also not available for postmortem.

You are not alone!

In this post i will go over how to enable your own custom logger that works well with Spark logger.
This custom logger will collect what ever information is required to go from reactive to proactive monitoring.
No need to setup extra logging infra for this.

Spark 2.X is based using Slf4j abstraction and it is using logback binding.

Lets start with logging basic, how to get logger instance in Spark jobs or application.

val _LOG = LoggerFactory.getLogger(this.getClass.getName)

It is that simple and now your application is using same log lib and settings that Spark is based on.

Now to do something more meaningful we have to inject our custom logger that will collect info and write it to Elastic search or Post to some REST endpoint or sends alerts.

lets go step by step to do this

Build custom log appender
Since spark 2.X is based on logback, so we have to write logback logger.

Code snippet for custom logback logger

This is very simple logger which is counting message per thread and all you have to do it override append function.

Such type of logger can do anything like writing to database or sending to REST endpoint or alerting .

Enable logger
For using new logger, create logback.xml file and add entry for new logger.
This file can be packed in Shaded jar or can be specified as runtime parameter.

Sample logback.xml
This config file adding MetricsLogbackAppender as METRICS
<appender name="METRICS" class="micro.logback.MetricsLogbackAppender"/>

Next enabling it for package/classes that should use this
<logger level="info" name="micro" additivity="true">    <appender-ref ref="METRICS" /></logger>
<logger level="info" name="org.apache.spark.scheduler.DAGScheduler" additivity="true">    <appender-ref ref="METRICS" /></logger>

You are done!

Any message logged from 'micro' package or from DAGScheduler class will be using new logger .
Using this technique executor logs can be also capture and this becomes very useful when spark job is running on hundred or thousands of executor.

Now it opens up lots of option of having BI that shows all these message at real time, allow team to ask interesting questions or subscribe to alters when things are not going well.

Caution : Make sure that this new logger is slowing down application execution, making it asynchronous is recommended.

Get the insight at right time and turn it to action

Code used in this blog is available @ sparkmicroservices repo in github.

I am interested in knowing what logging patterns you are using for Spark.

16 comments:

  1. Good job! Fruitful article. I like this very much. It is very useful for my research. It shows your interest in this topic very well. I hope you will post some more information about the software. Please keep sharing!!
    Hadoop Training in Chennai
    Big Data Training in Chennai
    Devops Training in Chennai
    Digital Marketing Course in Chennai
    RPA Training in Chennai
    SEO Training in Chennai
    Hadoop Training in Tambaram
    Hadoop Training in Porur

    ReplyDelete
  2. Nice and good article. It is very useful for me to learn and understand easily. Thanks for sharing your valuable information and time. Please keep updating big data online training

    ReplyDelete
  3. Great Article. Thank you for sharing! Really an awesome post for every one.

    IEEE Final Year projects Project Centers in Chennai are consistently sought after. Final Year Students Projects take a shot at them to improve their aptitudes, while specialists like the enjoyment in interfering with innovation. For experts, it's an alternate ball game through and through. Smaller than expected IEEE Final Year project centers ground for all fragments of CSE & IT engineers hoping to assemble. Final Year Project Domains for IT It gives you tips and rules that is progressively critical to consider while choosing any final year project point.

    Spring Framework has already made serious inroads as an integrated technology stack for building user-facing applications. Spring Framework Corporate TRaining the authors explore the idea of using Java in Big Data platforms.
    Specifically, Spring Framework provides various tasks are geared around preparing data for further analysis and visualization. Spring Training in Chennai

    ReplyDelete
  4. Can you explain it briefly?.

    ReplyDelete
  5. After research a couple of the weblog posts in your web site now, and I actually like your way of blogging. I bookmarked it to my bookmark website list and shall be checking back soon. Pls check out my web page as well and let me know what you think. buy bank logs

    ReplyDelete
  6. Learn Amazon Web Services for excellent job opportunities from Infycle Technologies, the best AWS training center in Chennai. Infycle Technologies gives the most trustworthy AWS course in Chennai, with full hands-on practical training from professional trainers in the field. Along with that, the placement interviews will be arranged for the candidates, so that, they can meet the job interviews without missing them. To transform your career to the next level, call 7502633633 to Infycle Technologies and grab a free demo to know more.No.1 AWS Training Institute in Chennai | Infycle Technologies

    ReplyDelete
  7. Worth reading! Our experts also have given detailed inputs about these trainings & courses! Presenting here for your reference. Do checkout
    Aws training in chennai & enjoy learning more about it.

    ReplyDelete
  8. career towards a sky-high with Infycle Technologies. Infycle Technologies provides the top Oracle DBA Training in Chennai and offering programs in Oracle such as Oracle PL/SQL, Oracle Programming, etc., in the 200% hands-on practical training with professional specialists in the field. In addition to that, the interviews will be arranged for the candidates to set their careers without any struggle. Of all that, Cen percent placement assurance will be given here. To learn Oracle DBA for making the best job for your life, call 7502633633 to Infycle Technologies and grab a free demo to know more.
    No.1 Oracle DBA Training in Chennai | Infycle Technologies

    ReplyDelete
  9. wonderful article contains lot of valuable information. Very interesting to read this article.
    I would like to thank you for the efforts you had made for writing this awesome article.
    Reactjs Training in Chennai |
    Best Reactjs Training Institute in Chennai |
    Reactjs course in Chennai |

    ReplyDelete
  10. Infycle Technologies, the top software training institute and placement center in Chennai offers the Oracle course in Chennai for freshers, students, and tech professionals at the best offers. In addition to the Oracle training, other in-demand courses such as DevOps, Data Science, Python, Selenium, Big Data, Java, Power BI, Digital Marketing will also be trained with 100% practical classes. After the completion of training, the trainees will be sent for placement interviews in the top MNC's. Call 7504633633 to get more info and a free demo. Get Oracle Course in Chennai | Infycle Technologies

    ReplyDelete
  11. Infycle Technologies, the top software training institute and placement center in Chennai offers the Best Digital Marketing course in Chennai for freshers, students, and tech professionals at the best offers. In addition to Digital Marketing, other in-demand courses such as DevOps, Data Science, Python, Selenium, Big Data, Java, Power BI, Oracle will also be trained with 100% practical classes. After the completion of training, the trainees will be sent for placement interviews in the top MNC's. Call 7504633633 to get more info and a free demo.

    ReplyDelete
  12. Finish the Selenium Training in Chennai from Infycle Technologies, the best software training institute in Chennai which is providing professional software courses such as Data Science, Artificial Intelligence, Java, Hadoop, Big Data, Android, and iOS Development, Oracle, etc with 100% hands-on practical training. Dial 7502633633 to get more info and a free demo and to grab the certification for having a peak rise in your career.

    ReplyDelete