I have a pyspark.sql.dataframe.DataFrame with 1300 rows and 5 columns. I am using the following to export the DataFrame to C:/temp:
c5.toPandas().to_csv("C:/temp/colspark.csv")
But I get the following error:
<ipython-input-4-2c57938dba1e> in <module>
----> 1 c5.toPandas().to_csv("C:/temp/colspark.csv")
S:\tdv\ab\ecp\Spark\spark\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\dataframe.py in toPandas(self)
2141
2142 # Below is toPandas without Arrow optimization.
(...)
Py4JJavaError: An error occurred while calling o689.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 50.0 failed 1 times, most recent failure: Lost task 0.0 in stage 50.0 (TID 2190, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last)
What I have tried so far:
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
But:
Py4JJavaError Traceback (most recent call last)
<ipython-input-5-92bc22b46531> in <module>
1 spark.conf.set("spark.sql.execution.arrow.enabled", "true")
----> 2 c5.toPandas().to_csv("C:/temp/colspark.csv")
S:\tdv\ab\ecp\Spark\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\dataframe.py in toPandas(self)
2120 _check_dataframe_localize_timestamps
2121 import pyarrow
-> 2122 batches = self._collectAsArrow()
2123 if len(batches) > 0:
2124 table = pyarrow.Table.from_batches(batches)
S:\tdv\ab\ecp\Spark\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\dataframe.py in _collectAsArrow(self)
2182 return list(_load_from_socket((port, auth_secret), ArrowStreamSerializer()))
2183 finally:
-> 2184 jsocket_auth_server.getResult() # Join serving thread and raise any exceptions
I even followed some solutions from
https://stackoverflow.com/questions/31937958/how-to-export-data-from-spark-sql-to-csv
But I cannot figure out how to proceed. Is there any way to avoid the Arrow optimization? Or do I have to use another method to save the CSV file?
I understand that you are trying to save the Spark DataFrame as a CSV file in your local directory. If so, write it as below:
dfname.write.csv("c:\\temp\\csvfoldername")