Spark with yarn-client on an HDP multi-node cluster only starts executors on a single node

tricky

I have installed a multi-node HDP cluster with Spark and YARN on EC2.

All nodes are DataNodes.

Node3 is the only Spark client node.

Every time I run a Spark job in yarn-client or yarn-cluster mode, it initializes the Spark executors on node3 only, whereas I want the job to use every node.

What configs am I missing?

I set MASTER="yarn-client" in Ambari, for example, but this doesn't solve the problem.

Thanks for your help.

EDIT: When I run a Spark shell asking for 30 executors, it starts only 12 executors, all on node3, and that takes 95% of the cluster's capacity. So my guess is that node1 and node2 aren't taken into account by YARN when it allocates resources such as Spark containers/executors.

I don't know which configuration I should modify to add node1 and node2 to the cluster's resources.
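One way to narrow this down is to ask YARN which nodes it actually knows about. A minimal sketch, assuming a standard HDP install where the `yarn` CLI is on the path (the node id shown is a hypothetical placeholder):

```shell
# List the NodeManagers registered with the ResourceManager.
# If node1 and node2 do not appear here, YARN cannot schedule
# containers (and therefore Spark executors) on them.
yarn node -list -all

# Show the capacity and current usage of one node, using an id
# taken from the list above (placeholder value shown).
yarn node -status node3.example.com:45454
```

If the list only contains node3, the problem is YARN-side resource registration, not a Spark setting.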

tricky

Okay, this was a simple oversight on my part.

I had to add every node as a YARN NodeManager. With this, my Spark jobs are well distributed across all nodes of the cluster.

Sorry, this was an obvious mistake.
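For example, once every node runs a NodeManager, an explicit executor request like the one below lets YARN place containers on any registered node rather than only node3. A hedged sketch; the executor counts and sizes are illustrative, not tuned values:

```shell
# Launch a Spark shell in yarn-client mode and request 30 executors.
# With NodeManagers on node1, node2, and node3, YARN can now spread
# these executors across the whole cluster.
spark-shell --master yarn-client \
  --num-executors 30 \
  --executor-memory 2g \
  --executor-cores 2
```

You can confirm the placement afterwards in the YARN ResourceManager UI, which shows which node hosts each container.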

