Why does the code for initializing Spark Context vary widely between different sources?

Iterator516

I know that I need to initialize Spark Context to create resilient distributed datasets (RDDs) in PySpark. However, different sources give different code for how to do so. To resolve this once and for all, what is the right code?

1) Code from Tutorials Point: https://www.tutorialspoint.com/pyspark/pyspark_sparkcontext.htm

from pyspark import SparkContext
sc = SparkContext("local", "First App")

2) Code from Apache: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#resilient-distributed-datasets-rdds

from pyspark import SparkContext, SparkConf

Then, later down the page, there is:

conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

These are just two examples. I can list more, but the main problem for me is the lack of uniformity for something so simple and basic. Please help and clarify.

pissall

1)

In local[N], N is the maximum number of cores that can be used on the node at any point in time. This uses the resources of your local host. Plain local, as in your first snippet, runs with a single worker thread.

In cluster mode (when you specify a master node IP) you can set --executor-cores N, which means that each executor can run a maximum of N tasks at the same time.
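To make the local[N] convention concrete, here is a small pure-Python sketch of how the Spark documentation describes the simple local master strings. Note that local_worker_threads is a hypothetical helper written for illustration, not part of PySpark, and it deliberately ignores other master forms (it returns None for them).

```python
import os
import re


def local_worker_threads(master):
    """Hypothetical helper (not a PySpark API): threads implied by a local master string.

    Returns None for anything that is not one of the three simple local forms.
    """
    if master == "local":
        return 1  # plain "local" means a single worker thread
    match = re.fullmatch(r"local\[(\d+|\*)\]", master)
    if match is None:
        return None  # e.g. a spark:// or yarn master; parallelism is decided elsewhere
    value = match.group(1)
    if value == "*":
        return os.cpu_count() or 1  # "*" means one thread per logical core
    return int(value)


for master in ("local", "local[4]", "local[*]"):
    print(master, "->", local_worker_threads(master))
```

The string you pass as the master (for example "local[4]") goes in the same place as "local" in either of the snippets from the question.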

2)

When you don't specify an app name, it could be left blank, or Spark could be creating a random name. I tried to find the source code for setAppName() but have not been able to find anything useful.
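As far as I can tell, the two snippets in the question set the same two things, a master and an app name; the first passes them positionally to SparkContext and the second carries them in a SparkConf. A minimal sketch to check this on your own machine, assuming PySpark is installed (describe_context and main are names made up for this example, and the import sits inside main only so the file loads without PySpark):

```python
def describe_context(sc):
    """Return the (master, app name) pair a SparkContext was started with."""
    return sc.master, sc.appName


def main():
    # Imported here so the sketch can be loaded without PySpark installed.
    from pyspark import SparkConf, SparkContext

    # Form 1: master and app name passed positionally (Tutorials Point style).
    sc = SparkContext("local", "First App")
    first = describe_context(sc)
    sc.stop()  # stop it before creating another context in the same process

    # Form 2: the same two settings carried by a SparkConf (Apache guide style).
    conf = SparkConf().setAppName("First App").setMaster("local")
    sc = SparkContext(conf=conf)
    second = describe_context(sc)
    sc.stop()

    print(first, second)
    return first == second
```

Calling main() should show the same master and app name for both forms. The SparkConf style mostly pays off when you have more settings to pass than just these two.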
