
Reader's Guide

The sole purpose of this documentation is to describe the approaches used in tuning one of our Spark ETL (Extract, Transform, Load) jobs, share them with a broader audience who could benefit from them, and, most importantly, receive feedback. The findings are by no means set guidelines, as they are limited to addressing one of the performance issues we faced.

The need for tuning Spark jobs

We have all heard the phrase "Data is King." As more and more organizations realize the power of data, big data and analytics have played a vital role in their digital transformation. As the need for companies to move their legacy data from traditional in-house systems to the cloud grows, cloud platforms such as AWS, Microsoft Azure, and Google Cloud are becoming more prominent in organizations' data-driven journeys. While these platforms offer on-demand, pay-as-you-go pricing and publish guidelines to help reduce overall cost, organizations are still required to optimize their own usage.

Additional comments

This documentation is built with MkDocs, a static site generator primarily used for project documentation and configured with a YAML file.

Use the toggle next to the search bar to switch between dark and light mode.
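
For reference, below is a minimal sketch of what the site's `mkdocs.yml` might look like. The site name and navigation entries are illustrative, and the dark/light toggle is assumed to come from the Material for MkDocs theme, which the text above does not name explicitly.

```yaml
# Minimal illustrative mkdocs.yml (file name is the MkDocs default;
# the site name and nav entries below are hypothetical)
site_name: Spark ETL Tuning Notes

# Assumption: the dark/light toggle next to the search bar is provided by
# the Material for MkDocs theme via its palette toggle feature.
theme:
  name: material
  palette:
    - scheme: default            # light mode
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    - scheme: slate              # dark mode
      toggle:
        icon: material/brightness-4
        name: Switch to light mode

nav:
  - Home: index.md
```

With a configuration along these lines, running `mkdocs serve` previews the site locally and `mkdocs build` generates the static pages.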