Automate FTP of CSV files to an Amazon S3 bucket

priya

I want to automate data uploading/ingestion into an Amazon S3 bucket. I don't want to use software like FileZilla to transfer files from FTP to S3.

The files will be available on an FTP server on a daily basis. I want to pick those files up from the FTP server and store them in Amazon S3 each day. Can I set up cron jobs or scripts to run in AWS in a cost-effective manner? Which AWS instances can help me achieve this?

The files are approximately 1 GB in size.

John Rotenstein

Amazon S3 is an object storage service. It cannot "pull" data from an external location.

Therefore, you will need a script or program that will:

  • Retrieve the data from the FTP server, and
  • Upload the data to Amazon S3

It would be best to run such a script on the FTP server itself, so that the data can be sent to S3 without first having to be downloaded from the FTP server. If this is not possible, you could run the script on any computer on the Internet, such as your own computer or an Amazon EC2 instance.

The simplest way to upload to Amazon S3 is to use the AWS Command-Line Interface (CLI). It has an aws s3 cp command to copy files, or, depending upon what needs to be copied, it might be easier to use the aws s3 sync command, which automatically copies new or modified files.
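For illustration, a minimal transfer script might look like the sketch below. The FTP host, credentials, file-naming scheme, paths and bucket name are all placeholders; curl is used for the FTP download here, but lftp or wget would work just as well:

    #!/bin/bash
    # Minimal sketch: pull the latest daily CSV from the FTP server, then push it to S3.
    set -euo pipefail

    FILE="data-$(date +%Y-%m-%d).csv"          # assumed daily file-naming scheme

    # Download the file from the FTP server (placeholder host and credentials)
    curl --fail -o "/tmp/${FILE}" "ftp://ftpuser:[email protected]/outgoing/${FILE}"

    # Upload it to the S3 bucket (placeholder bucket name)
    aws s3 cp "/tmp/${FILE}" "s3://my-bucket/incoming/${FILE}"

    # Alternatively, mirror a whole local directory and let the CLI decide what is new:
    # aws s3 sync /data/outgoing/ s3://my-bucket/incoming/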

The script could be triggered via a schedule (cron on Linux or a Scheduled Task on Windows).
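On Linux, the schedule is a single crontab entry. The time, script path and log location below are illustrative and assume the sketch above was saved as /home/ec2-user/ftp-to-s3.sh:

    # Run the transfer every day at 02:00 (edit with: crontab -e)
    0 2 * * * /home/ec2-user/ftp-to-s3.sh >> /home/ec2-user/ftp-to-s3.log 2>&1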

If you are using an Amazon EC2 instance, you could save money by turning off the instance when it is not required. The flow could be:

  • Create an Amazon CloudWatch Event rule that triggers an AWS Lambda function
  • The AWS Lambda function can call StartInstances() to start a stopped EC2 instance
  • The Amazon EC2 instance can use a startup script (see details below) that will run your process
  • At the end of the process, tell the operating system to shutdown (sudo shutdown now -h)

This might seem like a lot of steps, but the CloudWatch Event and Lambda function are trivial to configure.
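For a sense of what is involved, these are roughly the equivalent calls from the AWS CLI; the rule name and instance ID are placeholders, and the rule would still need the Lambda function attached as its target:

    # Scheduled rule that fires daily at 02:00 UTC (would trigger the Lambda function)
    aws events put-rule --name daily-ftp-transfer --schedule-expression "cron(0 2 * * ? *)"

    # The same StartInstances API call the Lambda function would make
    aws ec2 start-instances --instance-ids i-0123456789abcdef0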

To execute a script every time a Linux instance starts, put it in: /var/lib/cloud/scripts/per-boot/
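A per-boot script for this flow could be as small as the sketch below. It assumes the transfer script from earlier was saved as /home/ec2-user/ftp-to-s3.sh; cloud-init runs per-boot scripts as root, so no sudo is needed for the shutdown:

    #!/bin/bash
    # Saved as /var/lib/cloud/scripts/per-boot/transfer.sh and made executable (chmod +x)
    /home/ec2-user/ftp-to-s3.sh

    # Stop the instance once the transfer is done so it is not billed while idle
    shutdown now -h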

See also: Auto-Stop EC2 instances when they finish a task - DEV Community

