Data Engineering | Medium | Asked at: Databricks, Amazon, Google, LinkedIn, Netflix

Explain the Spark driver/executor model and what each component does.

For: Data Engineer, MLOps Engineer, ML Engineer

The short answer

The driver is a single JVM process that hosts the SparkContext, builds the DAG, schedules tasks, and coordinates results. Executors are JVM processes on worker nodes that actually run tasks and cache data. The cluster manager (YARN, Kubernetes, or Spark standalone) allocates the resources that executors run on; once executors are up, tasks flow directly between the driver and the executors.

How to think about it

Every Spark application follows a master/worker topology. Getting this model wrong is the root cause of many misconfigurations.

The driver

The driver is the process that runs your main() function (or the PySpark shell / notebook kernel). Its responsibilities:

Hosts SparkContext / SparkSession
Turns user code into a logical plan, optimizes it with Catalyst, and produces a physical plan that is split into a DAG of stages and tasks
Negotiates resources with the cluster manager
Sends task specifications to executor slots
Collects results from actions like collect() and show()

Because all coordination flows through the driver, it must stay alive for the lifetime of the application, not just a single job. If the driver runs out of memory, the whole application fails.
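
To make the division of labor concrete, here is a toy model in plain Python rather than Spark itself. A driver function splits the input into partitions, hands one task per partition to a thread pool standing in for executor slots, and combines the results. The names run_task and driver, and the use of threads, are illustrative assumptions; real Spark serializes tasks and ships them over the network to separate executor JVMs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(partition):
    """Executor side: process one partition and return a small partial result."""
    return sum(x * x for x in partition)


def driver(data, num_partitions=4, executor_slots=2):
    """Driver side: plan the work, dispatch tasks, then combine the results."""
    # Plan: split the input into partitions, one task per partition.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]

    # Schedule: at most `executor_slots` tasks run at the same time.
    with ThreadPoolExecutor(max_workers=executor_slots) as pool:
        partials = list(pool.map(run_task, partitions))

    # Collect: only the small partial results come back to the driver.
    return sum(partials)


total = driver(list(range(10)))
print(total)
```

The same shape suggests why collect() on a large dataset is risky: instead of one small number per partition, every partition's full contents would land in the driver's memory.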

Executors

Executors are launched by the cluster manager on worker nodes at application startup (or on demand when dynamic allocation is enabled). Each executor:

Receives serialized tasks from the driver and runs them in thread pools
Caches RDD/DataFrame partitions in memory or disk when cache() / persist() is called
Writes shuffle output to local storage for downstream tasks
Sends task results and status back to the driver

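from pyspark.sql import SparkSession

# Executor sizing must be set before the session (and its SparkContext) is created.
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "8g")     # JVM heap per executor
    .config("spark.executor.cores", "4")       # concurrent task slots per executor
    .config("spark.executor.instances", "20")  # executors requested at startup
    .getOrCreate()
)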
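
A quick back-of-the-envelope calculation shows what a configuration like the one above asks the cluster for. The sketch below is plain Python, not a Spark API; the helper name executor_footprint is hypothetical. It assumes one task per core (spark.task.cpus = 1) and the documented default memory overhead of 10% of executor memory with a 384 MiB floor. Both vary by Spark version and cluster manager, so treat the numbers as an estimate.

```python
def executor_footprint(instances, cores, memory_mib,
                       overhead_factor=0.10, overhead_min_mib=384):
    """Estimate task slots and requested memory for a set of executors."""
    # Each executor core is one task slot (assuming spark.task.cpus = 1).
    task_slots = instances * cores
    # Non-heap overhead requested on top of the heap, per executor (assumed defaults).
    overhead_mib = max(int(memory_mib * overhead_factor), overhead_min_mib)
    container_mib = memory_mib + overhead_mib
    return {
        "task_slots": task_slots,
        "container_mib": container_mib,
        "total_mib": instances * container_mib,
    }


footprint = executor_footprint(instances=20, cores=4, memory_mib=8 * 1024)
print(footprint)
```

Under these assumptions, 20 executors with 4 cores each give 80 concurrent task slots, and each 8 GiB executor requests roughly 8.8 GiB from the cluster manager once overhead is included.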

Cluster manager

YARN, Kubernetes, and Spark Standalone all play the same role: they allocate containers/pods for executors and report available resources to the driver. The driver requests N executors with X cores and Y memory; the cluster manager decides which nodes provide them.
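
The request/placement split can be sketched with a toy first-fit allocator in plain Python. This is only an illustration of "the driver says how many and how big, the cluster manager picks the nodes"; the function place_executors and the node names are hypothetical, and real schedulers in YARN or Kubernetes apply far richer policies (queues, priorities, locality, preemption).

```python
def place_executors(nodes, num_executors, cores, memory_gb):
    """Toy first-fit placement: returns the node chosen for each executor."""
    # Copy the capacities so the caller's view of the cluster is not mutated.
    free = {name: dict(cap) for name, cap in nodes.items()}
    placements = []
    for _ in range(num_executors):
        for name, cap in free.items():
            if cap["cores"] >= cores and cap["memory_gb"] >= memory_gb:
                cap["cores"] -= cores
                cap["memory_gb"] -= memory_gb
                placements.append(name)
                break
        else:
            break  # no node can host another executor; the request is partly unmet
    return placements


nodes = {
    "node-a": {"cores": 8, "memory_gb": 32},
    "node-b": {"cores": 4, "memory_gb": 16},
}
granted = place_executors(nodes, num_executors=4, cores=4, memory_gb=8)
print(granted)
```

In this example the cluster has only 12 cores, so three of the four requested 4-core executors are placed and the fourth request goes unmet.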

The driver negotiates resources with the cluster manager; executors receive serialized tasks directly from the driver.

Dynamic allocation

With spark.dynamicAllocation.enabled = true, Spark asks the cluster manager for more executors when tasks are backlogged and releases executors that have been idle longer than a timeout (spark.dynamicAllocation.executorIdleTimeout). This reduces wasted cloud resources for jobs with uneven parallelism.
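
A minimal configuration sketch for dynamic allocation follows. The property names are real Spark settings, but the values are illustrative assumptions, and the helper to_submit_args is a hypothetical convenience for rendering them as spark-submit arguments. Shuffle tracking (available from Spark 3.0) is included because dynamic allocation needs some way to keep shuffle data reachable when executors are removed; an external shuffle service is the other common option.

```python
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",    # illustrative floor
    "spark.dynamicAllocation.maxExecutors": "20",   # illustrative ceiling
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}


def to_submit_args(conf):
    """Render a config dict as spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(dynamic_allocation_conf)
print(" ".join(submit_args))
```

Setting explicit minimum and maximum executor counts keeps a bursty job from either starving at the start or claiming the whole cluster during a spike.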