I know that I need to initialize a SparkContext to create resilient distributed datasets (RDDs) in PySpark. However, different sources give different code for doing so. To resolve this once and for all, what is the correct code?
1) Code from Tutorials Point: https://www.tutorialspoint.com/pyspark/pyspark_sparkcontext.htm
from pyspark import SparkContext
sc = SparkContext("local", "First App")
2) Code from Apache: https://spark.apache.org/docs/2.2.0/rdd-programming-guide.html#resilient-distributed-datasets-rdds
from pyspark import SparkContext, SparkConf
Then, further down the page, there is:
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
These are just two examples. I can list more, but the main problem for me is the lack of uniformity for something so simple and basic. Please help and clarify.
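Both snippets do the same thing. The two positional arguments in SparkContext("local", "First App") are just shorthand for setting the master and the app name on a SparkConf, so the following sketch (the master and app name values are only examples) is interchangeable with either form:

from pyspark import SparkContext, SparkConf

# Shorthand form: positional master and app name
# sc = SparkContext("local", "First App")

# Explicit form: the same two settings placed on a SparkConf
conf = SparkConf().setMaster("local").setAppName("First App")
sc = SparkContext(conf=conf)

Pick whichever reads better to you; there is no functional difference between the two.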
In local[N]:
- N is the maximum number of cores that can be used on the node at any point in time. This mode uses your local host's resources.
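For example, a minimal sketch (the core count of 4 is arbitrary) that runs everything on the local machine with at most four cores:

from pyspark import SparkContext, SparkConf

# Run locally, using at most 4 cores on this machine;
# local[*] would instead use every available core
conf = SparkConf().setMaster("local[4]").setAppName("Local Example")
sc = SparkContext(conf=conf)

# Default number of partitions for operations like parallelize()
print(sc.defaultParallelism)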
In cluster mode (when you specify a master node IP) you can set --executor-cores N, which means that each executor can run at most N tasks at the same time.
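A sketch of the cluster case (the master URL and core count below are placeholders, not values from the question); the same setting can also be passed as --executor-cores 4 on the spark-submit command line:

from pyspark import SparkContext, SparkConf

# Point at a standalone cluster master (placeholder host/port) and
# allow each executor to run up to 4 tasks concurrently
conf = (SparkConf()
        .setMaster("spark://master-host:7077")
        .setAppName("Cluster Example")
        .set("spark.executor.cores", "4"))
sc = SparkContext(conf=conf)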
As for the app name: when you don't specify one, it may be left blank or Spark may generate a name for you. I tried to look at the source code for setAppName() but couldn't find much detail.
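One way to see what name actually ends up being used is to read it back from the running context; a small sketch, assuming a local master:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setMaster("local").setAppName("Named App")
sc = SparkContext(conf=conf)

# Read back the effective application name from the context and its config
print(sc.appName)                          # "Named App"
print(sc.getConf().get("spark.app.name"))  # same value, via the config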