AWS Glue Job writes Null to Redshift

Princewill Ezeidei

I have multiple JSON files in an s3 bucket folder each of the files has the same pattern as the below sample an array/list of JSON objects.

file1

[{"coinRank":1,"coinId":"bitcoin","coinName":"Bitcoin","coinSymbol":"BTC","coinLoc":"bitcoin","coinPrice":53501.08,"coin1hrChange":-0.6,"coin24hrChange":-6.0,"coin7dChange":-9.2,"coin24hrVol":38266934579,"coinMarketCap":1012650219321,"fetchTime":"2021-12-03 23:55:42.654921","rankDate":"2021-12-03","rate":409.98,"coinPriceNaira":21934372.7784000002},{"coinRank":2,"coinId":"ethereum","coinName":"Ethereum","coinSymbol":"ETH","coinLoc":"ethereum","coinPrice":4225.28,"coin1hrChange":-0.3,"coin24hrChange":-7.2,"coin7dChange":-6.4,"coin24hrVol":27395766224,"coinMarketCap":502376237337,"fetchTime":"2021-12-03 23:55:42.655698","rankDate":"2021-12-03","rate":409.98,"coinPriceNaira":1732280.2944},{"coinRank":3,"coinId":"binancecoin","coinName":"Binance Coin","coinSymbol":"BNB","coinLoc":"binance-coin","coinPrice":593.95,"coin1hrChange":-0.7,"coin24hrChange":-4.9,"coin7dChange":-6.9,"coin24hrVol":2379210538,"coinMarketCap":100022794436,"fetchTime":"2021-12-03 23:55:42.656393","rankDate":"2021-12-03","rate":409.98,"coinPriceNaira":243507.621}]

file2

[{"coinRank":1,"coinId":"bitcoin","coinName":"Bitcoin","coinSymbol":"BTC","coinLoc":"bitcoin","coinPrice":52936.1,"coin1hrChange":-1.5,"coin24hrChange":-6.5,"coin7dChange":-1.7,"coin24hrVol":38241025550,"coinMarketCap":998999157967,"fetchTime":"2021-12-04 02:33:23.182164","rankDate":"2021-12-04","rate":409.98,"coinPriceNaira":21702742.2780000009},{"coinRank":2,"coinId":"ethereum","coinName":"Ethereum","coinSymbol":"ETH","coinLoc":"ethereum","coinPrice":4159.85,"coin1hrChange":-1.4,"coin24hrChange":-8.1,"coin7dChange":2.8,"coin24hrVol":28661534477,"coinMarketCap":493429600914,"fetchTime":"2021-12-04 02:33:23.182785","rankDate":"2021-12-04","rate":409.98,"coinPriceNaira":1705455.3030000003},{"coinRank":3,"coinId":"binancecoin","coinName":"Binance Coin","coinSymbol":"BNB","coinLoc":"binance-coin","coinPrice":582.32,"coin1hrChange":-1.9,"coin24hrChange":-5.4,"coin7dChange":-0.6,"coin24hrVol":1059743631,"coinMarketCap":97824378011,"fetchTime":"2021-12-04 02:33:23.183415","rankDate":"2021-12-04","rate":409.98,"coinPriceNaira":238739.5536}]

file3

[{"coinRank":1,"coinId":"bitcoin","coinName":"Bitcoin","coinSymbol":"BTC","coinLoc":"bitcoin","coinPrice":49375.27,"coin1hrChange":-0.7,"coin24hrChange":4.3,"coin7dChange":-9.5,"coin24hrVol":35860857801.0,"coinMarketCap":932932346783,"fetchTime":"2021-12-05 14:34:49.339803","rankDate":"2021-12-05","rate":410.764648,"coinPriceNaira":20281615.4014549591},{"coinRank":2,"coinId":"ethereum","coinName":"Ethereum","coinSymbol":"ETH","coinLoc":"ethereum","coinPrice":4218.99,"coin1hrChange":-0.7,"coin24hrChange":7.1,"coin7dChange":3.3,"coin24hrVol":27778808883.0,"coinMarketCap":500688046117,"fetchTime":"2021-12-05 14:34:49.340495","rankDate":"2021-12-05","rate":410.764648,"coinPriceNaira":1733011.9422655201},{"coinRank":3,"coinId":"binancecoin","coinName":"Binance Coin","coinSymbol":"BNB","coinLoc":"binance-coin","coinPrice":574.23,"coin1hrChange":-0.5,"coin24hrChange":5.2,"coin7dChange":-4.0,"coin24hrVol":2265817636.0,"coinMarketCap":96576091895,"fetchTime":"2021-12-05 14:34:49.341177","rankDate":"2021-12-05","rate":410.764648,"coinPriceNaira":235873.38382104}]

Using AWS Glue Crawler and classifier for separating JSON Objects $[*] I have split the records, and I can confirm the number of records in the Data Catalog matches the number of records in the files. However, when I push the data to redshift, I have some columns showing up as null. I can also share my glue script if necessary.

Princewill Ezeidei

I figured out what the problem was with the Dataset, The DataFrame had inferred different datatypes int64 and float64 on the columns, and when Glue created the table in Redshift, it created the number columns as double precision (float64) hence, the records that were integers were not cast properly on Redshift.

I manually specified the column types in Pandas DataFrame using the .astype() function
I dropped the table in redshift, deleted the table also in the data catalog database
Re-crawled the database and re-ran the job.

Now every data point shows up well on redshift.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2021-12-9

Comments

0 comments

TOP Ranking

Article

AWS Glue Job writes Null to Redshift

AWS Glue Job writes Null to Redshift

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

pump.io port in URL

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

How to import an asset in swift using Bundle.main.path() in a react-native native module

How to use HttpClient with ANY ssl cert, no matter how "bad" it is

Modbus Python Schneider PM5300

What is the exact difference between “ use_all_dns_ips” and "resolve_canonical_bootstrap_servers_only” in client.dns.lookup options?

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

BigQuery - concatenate ignoring NULL

split column by delimiter and deleting expanded column

Unable to use switch toggle for dark mode in material-ui

Soundcloud API Authentication | NodeWebkit, redirect uri and local file system

Apache rewrite or susbstitute rule for bugzilla HTTP 301 redirect

Is there an option for a Simulink Scope to display the layout in single column?

UWP access denied

Center buttons and brand in Bootstrap

express js can't redirect user

Make a B+ Tree concurrent thread safe

Printing Int array and String array in one

Google Chrome Translate Page Does Not Work

Elasticsearch - How to match number range in string