AWS Glue ETL: Filter Extract Input Based on Job Parameter

Patrick Bray

I'm new to AWS Glue ETL processing and trying to implement a job to extract data from an RDS MySQL database for a specific customer, perform some transformations, and write the results to S3.

What is the best approach to filter the data input selected from the source table? Can this be done as part of the source extract, or does it need to be a separate Filter transformation based on a specific key?

If implementing this as a Filter transformation, is there a way to make it dynamic based on job input parameters? Ideally this job will be triggered by an event as part of a user-initiated workflow.


Any help would be much appreciated. TIA

Robert Kossendey

> What is the best approach to filter the data input selected from the source table? Can this be done as part of the source extract, or does it need to be a separate Filter transformation based on a specific key?

Glue is basically managed Spark. Spark has an optimisation technique called predicate pushdown that applies to filter operations. It is very likely that Spark will push your filter directly into the read operation by modifying the read statement, so a separate Filter transformation is usually not a performance concern.

You can check whether that is happening in your case by converting the Glue DynamicFrame into a native Spark DataFrame with the .toDF() method and then calling the explain operation on the resulting DataFrame.

> If implementing this as a Filter transformation, is there a way to make it dynamic based on job input parameters? Ideally this job will be triggered by an event as part of a user-initiated workflow.

Yes, you can, but not through the visual UI of Glue Studio; you would need to modify the generated ETL script manually.
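Glue passes job parameters to the script as command-line arguments (e.g. starting the job with {"--CUSTOMER_ID": "42"} makes "--CUSTOMER_ID 42" appear in argv), and scripts normally read them with awsglue.utils.getResolvedOptions. A minimal sketch of that flow, using a hypothetical stand-in parser so it runs outside the Glue runtime; the CUSTOMER_ID parameter name is an assumption for illustration:

```python
import sys


def get_resolved_options(argv, options):
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions:
    resolves each requested option from "--KEY value" pairs in argv."""
    args = {}
    for opt in options:
        idx = argv.index("--" + opt)
        args[opt] = argv[idx + 1]
    return args


# Simulate the arguments Glue would pass when the job is started
# with {"--CUSTOMER_ID": "42"}.
sys.argv.extend(["--CUSTOMER_ID", "42"])
args = get_resolved_options(sys.argv, ["CUSTOMER_ID"])

# The resolved value can then drive the filter in the script, e.g.
# (Glue pseudocode, not executed here):
#   filtered = Filter.apply(
#       frame=datasource,
#       f=lambda row: row["customer_id"] == int(args["CUSTOMER_ID"]))
print(args["CUSTOMER_ID"])  # → 42
```

An event-driven trigger (e.g. a Lambda calling the StartJobRun API) can supply these arguments per run, which is what makes the filter dynamic for a user-initiated workflow.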

