I have installed a multi-node HDP cluster with Spark and YARN on EC2.
Every node is a DataNode.
Node3 is the only Spark Client node.
Every time I run a Spark job in yarn-client or yarn-cluster mode, it initializes the Spark executors on node3 only, whereas I want the job to use every node.
What configs am I missing?
For example, I set MASTER="yarn-client" in Ambari, but this doesn't solve the problem.
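For reference, here is roughly how I submit the job (the class and jar names below are placeholders for my actual application):

    # Submitted from node3, the Spark client node.
    # com.example.MyApp / my-app.jar are placeholders.
    spark-submit \
      --master yarn-client \
      --num-executors 6 \
      --executor-cores 2 \
      --executor-memory 2g \
      --class com.example.MyApp \
      my-app.jar

Even with --num-executors set, all the executor containers still land on node3.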
Thanks for your help.
EDIT: When I run a Spark shell with 30 executors, it starts only 12 executors, all on node3, and they take up 95% of the cluster's resources. So my guess is that node1 and node2 aren't taken into account by YARN when allocating resources such as Spark containers/executors.
I don't know which config I should modify to add node1 and node2 to the cluster's resources.
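One way to check this guess is to ask the ResourceManager which NodeManagers it has registered:

    # Lists the NodeManagers registered with the ResourceManager
    yarn node -list

If this only reports node3 as RUNNING, that would confirm that node1 and node2 aren't part of YARN's resource pool.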
Okay, this turned out to be an embarrassingly simple oversight: I had to add every node as a YARN NodeManager. With this, my Spark jobs are distributed properly across every node of the cluster.
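In my case (Ambari-managed HDP), I added the NodeManager component to node1 and node2 from each host's page in Ambari and restarted YARN. Afterwards, all three nodes show up, and the executors spread out as expected (the output below is illustrative, with my hostnames):

    # All nodes should now be registered as RUNNING
    yarn node -list
    #  Node-Id       Node-State  Node-Http-Address  Number-of-Running-Containers
    #  node1:45454   RUNNING     node1:8042         3
    #  node2:45454   RUNNING     node2:8042         4
    #  node3:45454   RUNNING     node3:8042         5

    # Re-running the shell now spreads the 30 executors across all nodes
    spark-shell --master yarn-client --num-executors 30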