How to submit a spark job on a remote master node in yarn client mode?

Mnemosyne

I need to submit Spark apps/jobs to a remote Spark cluster. I currently have Spark on my machine, along with the IP address of the master node; I want to run in YARN client mode. Note that my machine is not part of the cluster. I submit my job with this command:

./spark-submit --class SparkTest --deploy-mode client /home/vm/app.jar 

I have the address of my master hardcoded into my app in the form

val spark_master = "spark://IP:7077"

Yet all I get is this error:

16/06/06 03:04:34 INFO AppClient$ClientEndpoint: Connecting to master spark://IP:7077...
16/06/06 03:04:34 WARN AppClient$ClientEndpoint: Failed to connect to master IP:7077
java.io.IOException: Failed to connect to /IP:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /IP:7077

Or instead if I use

./spark-submit --class SparkTest --master yarn --deploy-mode client /home/vm/test.jar

I get

Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Do I really need to have Hadoop configured on my workstation as well? All the work will be done remotely, and this machine is not part of the cluster. I am using Spark 1.6.1.

Pranav Shukla

First of all, if you call conf.setMaster(...) in your application code, it takes the highest precedence (over the --master argument to spark-submit). If you want to run in YARN client mode, do not hardcode spark://MASTER_IP:7077 in your application code. Instead, supply the Hadoop client configuration files to your driver as follows.
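As a minimal sketch of the driver side (the class name matches the one in your spark-submit command; the job logic is a placeholder), create the SparkConf without calling setMaster so that the value passed via --master takes effect. This uses the Spark 1.6-era SparkContext API to match your version:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkTest {
  def main(args: Array[String]): Unit = {
    // No setMaster(...) here: the master URL comes from
    // spark-submit's --master argument (e.g. --master yarn).
    val conf = new SparkConf().setAppName("SparkTest")
    val sc = new SparkContext(conf)

    // ... your job logic goes here ...

    sc.stop()
  }
}
```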

Set the environment variable HADOOP_CONF_DIR or YARN_CONF_DIR to point to the directory that contains the client configurations.
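For example (the directory path below is an assumption; point it at wherever you copied the cluster's client configs on your workstation):

```shell
# Point Spark at the Hadoop/YARN client configuration
# copied from the cluster (directory path is an example).
export HADOOP_CONF_DIR=/home/vm/hadoop-conf

# Now --master yarn can locate the ResourceManager and HDFS,
# so no master URL needs to be hardcoded in the application.
./spark-submit \
  --class SparkTest \
  --master yarn \
  --deploy-mode client \
  /home/vm/app.jar
```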

http://spark.apache.org/docs/latest/running-on-yarn.html

Depending on which Hadoop features your Spark application uses, some of these config files are consulted to look up configuration. For example, if you use Hive (through HiveContext in spark-sql), Spark looks for hive-site.xml; hdfs-site.xml is used to look up the NameNode coordinates when your job reads from or writes to HDFS.
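A typical way to get these files is to copy them from a cluster node into the directory HADOOP_CONF_DIR points at (hostname, username, and paths below are assumptions; on many distributions the client configs live under /etc/hadoop/conf):

```shell
# Copy the Hadoop client configs from a cluster node
# (hostname and source path are examples).
mkdir -p /home/vm/hadoop-conf
scp user@cluster-node:/etc/hadoop/conf/core-site.xml \
    user@cluster-node:/etc/hadoop/conf/hdfs-site.xml \
    user@cluster-node:/etc/hadoop/conf/yarn-site.xml \
    /home/vm/hadoop-conf/
# Add hive-site.xml as well if your job uses HiveContext.
```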

