Can I change the filename generated by aws glue job?

Leandro

I can't find a way to change the file name generated by Glue jobs. It creates files named like 'run-xxxxx', but I want to use a specific name instead. Is this possible? PS: I'm using a Python script (not Scala).

botchniaque

Spark (and the other tools in the Hadoop ecosystem) use filenames as a means of parallelising reads and writes; a Spark job produces as many files in a folder as there are partitions in its RDD/DataFrame (the files are often named part-XXX). When pointing Spark at a datasource (be it S3, a local FS, or HDFS), you always point to a folder containing all the part-xxx files.

I don't know what kind of tool you're using, but if it depends on a filename convention, then you'll have to rename your files (using your FS client) after the Spark session has finished (this can be done in the driver code). Be aware that Spark may (and usually does) produce multiple files. You can overcome that by calling coalesce(1) on your DataFrame/RDD before writing, so only a single part file is produced.
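As a sketch of that rename-after-write step (the Glue/Spark write itself is omitted; the `part-*` naming follows Spark's default convention, but `output_dir`, `target_name`, and the helper function are illustrative assumptions). This version uses a local filesystem; on S3 there is no atomic rename, so the equivalent is a boto3 `copy_object` followed by `delete_object`:

```python
import glob
import os
import shutil
import tempfile

def rename_part_file(output_dir: str, target_name: str) -> str:
    """Rename the single part-* file Spark left in output_dir to target_name.

    Assumes the DataFrame was coalesced to one partition, so exactly one
    part file exists. On S3 the same effect needs a boto3 copy + delete,
    since S3 has no rename operation.
    """
    part_files = glob.glob(os.path.join(output_dir, "part-*"))
    if len(part_files) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(part_files)}")
    target_path = os.path.join(output_dir, target_name)
    shutil.move(part_files[0], target_path)
    return target_path

# Simulate a Spark output folder containing one part file.
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "part-00000-abc123.csv"), "w") as f:
    f.write("id,value\n1,foo\n")

renamed = rename_part_file(out_dir, "report.csv")
print(os.path.basename(renamed))  # → report.csv
```

Run this from the driver after the write completes; if you skipped coalesce(1) and multiple part files exist, the helper deliberately fails rather than guessing which file to keep.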
