How are multiple executors managed on the worker nodes in a Spark standalone cluster?

MetallicPriest

Until now, I have only used Spark on a Hadoop cluster with YARN as the resource manager. In that type of cluster, I know exactly how many executors to run and how the resource management works. However, now that I am trying to use a standalone Spark cluster, I have gotten a little confused. Correct me where I am wrong.

From this article, by default a worker node uses all the memory of the node minus 1 GB. But I understand that by using SPARK_WORKER_MEMORY we can make it use less memory. For example, if the total memory of the node is 32 GB but I specify 16 GB, the Spark worker is not going to use more than 16 GB on that node?
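For reference, this is typically set in conf/spark-env.sh on each worker node before starting the worker (a minimal sketch using the 16 GB figure from the example above):

export SPARK_WORKER_MEMORY=16g
export SPARK_WORKER_CORES=16    # the core count can be capped the same way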

But what about executors? Say I want to run 2 executors per node: can I do that by specifying the executor memory during spark-submit to be half of SPARK_WORKER_MEMORY? And if I want to run 4 executors per node, by specifying the executor memory to be a quarter of SPARK_WORKER_MEMORY?

If so, besides executor memory, I would also have to specify the executor cores correctly, I think. For example, if I want to run 4 executors on a worker, I would have to specify executor cores to be a quarter of SPARK_WORKER_CORES. What happens if I specify a bigger number than that? I mean, if I specify executor memory to be a quarter of SPARK_WORKER_MEMORY but executor cores to be half of SPARK_WORKER_CORES, would I get 2 or 4 executors running on that node?

MetallicPriest

So, I experimented with the Spark Standalone cluster myself a bit, and this is what I noticed.

  1. My intuition that multiple executors can be run inside a worker by tuning executor cores was indeed correct. Let us say your worker has 16 cores. Now, if you specify 8 cores per executor, Spark will run 2 executors per worker (see the sketch after this list).

  2. How many executors run inside a worker also depends on the executor memory you specify. For example, if the worker memory is 24 GB and you want to run 2 executors per worker, you cannot specify the executor memory to be more than 12 GB.

  3. A worker's memory can be limited when starting a slave by specifying a value for the optional parameter --memory, or by changing the value of SPARK_WORKER_MEMORY. The same goes for the number of cores (--cores / SPARK_WORKER_CORES).
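Putting the three points together, here is a rough sketch of the commands involved (spark://master:7077 is a placeholder master URL; adjust it for your cluster):

# on each worker node: cap the worker at 16 cores and 24 GB
./sbin/start-slave.sh spark://master:7077 --cores 16 --memory 24g

# at submission time: 8 cores and 12 GB per executor,
# which yields 2 executors per worker
spark-submit --master spark://master:7077 \
  --executor-cores 8 \
  --executor-memory 12g \
  <other parameters>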

If you want to be able to run multiple jobs on the standalone Spark cluster, you can use the spark.cores.max configuration property when doing spark-submit. For example, like this:

spark-submit <other parameters> --conf spark.cores.max=16 <other parameters>

So, if your standalone Spark cluster offers 64 cores in total and you give only 16 cores to your program, other Spark jobs can use the remaining 48 cores.
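The same limit can also be made the default for every application by putting it in conf/spark-defaults.conf (a sketch reusing the value from the example above):

spark.cores.max    16

Note that if spark.cores.max is not set, a standalone cluster falls back to spark.deploy.defaultCores, which is unlimited by default, so the first job submitted will grab all available cores.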
