How does Spark choose nodes to run executors? (Spark on YARN)

MengQi

We use Spark on YARN mode with a cluster of 120 nodes. Yesterday one Spark job created 200 executors: 11 executors landed on node1, 10 on node2, and the remaining executors were distributed evenly across the other nodes.

Because so many executors were concentrated on node1 and node2, the job ran slowly.

How does spark select the node to run executors? according to yarn resourceManager?

Yehor Krivokon

The cluster manager (YARN) allocates resources across all running applications, so executor placement is ultimately decided by the ResourceManager, not by Spark itself. I think the issue here is a poorly tuned configuration. You should enable dynamic allocation: Spark will then scale the number of executors up and down based on the workload and the cluster's available resources.

You can find more information about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
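As a sketch of what enabling dynamic allocation looks like, here is an example `spark-submit` invocation. The property names (`spark.dynamicAllocation.*`, `spark.shuffle.service.enabled`) are standard Spark configuration keys; the min/max executor counts and resource sizes are illustrative values you would tune for your own 120-node cluster, not recommendations:

```shell
# Submit a job on YARN with dynamic allocation enabled.
# The external shuffle service must be running on the NodeManagers
# so executors can be removed without losing shuffle data.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=10 \
  --conf spark.dynamicAllocation.maxExecutors=200 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=8g \
  your-application.jar
```

Note that even with dynamic allocation, which node each executor container lands on is still chosen by the YARN scheduler, so an even spread also depends on how much free capacity each NodeManager reports.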

