Copy files in AWS S3 to an S3 bucket in another AWS account with Node.js
There are a lot of tools available for working with S3 buckets and their contents. Just a few popular ones are ...
- Amazon S3 Tools
- AWS Management Console
- Panic's Transmit App for Mac OS X
- AWS CLI for S3
- S3Browser
- Bucket Explorer
But if you ever run into a situation where none of the above is a particularly good fit for what you need to accomplish, for example copying files across buckets in different AWS accounts, then you might take a look at this quick and dirty Node.js code.
Github Gist -> https://gist.github.com/raffi-minassian/a1caf2c224dec9318d30 .
What it does.
It lists the contents of a bucket, 100 keys at a time. For each key in the current batch, it downloads the file to a local directory named downloads and immediately adds it to an upload queue. The upload queue uploads each file it is given and then deletes the local file when done.
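A minimal sketch of that listing step, assuming the callback-style AWS SDK v2 for Node.js; the credentials and bucket name here are placeholders, not values from the gist:

```javascript
// Page through a source bucket 100 keys at a time with listObjects.
var AWS = require('aws-sdk');

// Placeholder credentials and bucket name -- replace with your own.
var sourceS3 = new AWS.S3({ accessKeyId: 'SOURCE_KEY', secretAccessKey: 'SOURCE_SECRET' });
var SOURCE_BUCKET = 'my-source-bucket';

function listBatch(marker, done) {
  var params = { Bucket: SOURCE_BUCKET, MaxKeys: 100 };
  if (marker) params.Marker = marker;

  sourceS3.listObjects(params, function (err, data) {
    if (err) return done(err);

    // data.Contents is the current batch of up to 100 keys.
    // Hand the batch off for processing here, then page again if truncated.
    var lastKey = data.Contents.length ? data.Contents[data.Contents.length - 1].Key : null;
    if (data.IsTruncated && lastKey) {
      listBatch(lastKey, done);
    } else {
      done(null);
    }
  });
}

listBatch(null, function (err) {
  if (err) console.error('Listing failed:', err);
});
```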
How it does this.
It uses Caolan McMahon's awesome Async library and the AWS JavaScript SDK for Node.js. The AWS SDK provides ways to list bucket contents and to download and upload files. The Async library provides flow control: the script uses Async's eachLimit to process the array of items returned by the AWS SDK, so you can control how many items are processed in parallel, and Async's queue to manage uploads, so you can control the upload concurrency.
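To make the pattern concrete, here is a rough, runnable sketch of eachLimit driving downloads and a queue handling uploads; the concurrency numbers and the logging stubs are illustrative stand-ins for the gist's real download and upload code:

```javascript
var async = require('async');

// Upload queue: at most 2 uploads run at a time. The worker here is a stub
// that just logs; in the real script it would upload the file and then
// delete the local copy.
var uploadQueue = async.queue(function (task, callback) {
  console.log('uploading', task.key);
  setImmediate(callback);
}, 2);

// Process one listed batch, downloading at most 5 objects in parallel and
// handing each finished download straight to the upload queue.
function processBatch(contents, done) {
  async.eachLimit(contents, 5, function (item, callback) {
    console.log('downloading', item.Key);
    uploadQueue.push({ key: item.Key });
    setImmediate(callback);
  }, done);
}

// Example usage with fake listing data:
processBatch([{ Key: 'a.txt' }, { Key: 'b.txt' }], function (err) {
  if (err) console.error(err);
});
```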
Why it does this.
The script tries to keep the program flow simple. The basic flow is...
Repeat the following steps for every entry in the source bucket[s]...
- Get the key and other info about a bucket entry
- Download to a local tmp file
- Apply logic or processing on the local tmp file if needed
- Upload the file to the destination bucket
- Delete the local tmp file.
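A minimal sketch of that per-object flow, assuming the callback-style AWS SDK v2 and two separate S3 clients for the two accounts; the credentials, bucket names, and the copyObject helper are placeholders of mine, not code lifted from the gist:

```javascript
var AWS = require('aws-sdk');
var fs = require('fs');
var path = require('path');
var uuid = require('node-uuid');

// One client per account -- placeholder credentials.
var sourceS3 = new AWS.S3({ accessKeyId: 'SOURCE_KEY', secretAccessKey: 'SOURCE_SECRET' });
var destS3 = new AWS.S3({ accessKeyId: 'DEST_KEY', secretAccessKey: 'DEST_SECRET' });

function copyObject(key, callback) {
  // 1. Download the object from the source bucket.
  sourceS3.getObject({ Bucket: 'my-source-bucket', Key: key }, function (err, data) {
    if (err) return callback(err);

    // 2. Write it to a uniquely named tmp file in ./downloads.
    var localPath = path.join('downloads', uuid.v4());
    fs.writeFile(localPath, data.Body, function (err) {
      if (err) return callback(err);

      // 3. Apply any custom logic or processing to the local file here.

      // 4. Upload the file to the destination bucket in the other account.
      fs.readFile(localPath, function (err, body) {
        if (err) return callback(err);
        destS3.putObject({ Bucket: 'my-destination-bucket', Key: key, Body: body }, function (err) {
          if (err) return callback(err);

          // 5. Delete the local tmp file.
          fs.unlink(localPath, callback);
        });
      });
    });
  });
}
```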
This makes it easy to edit any custom logic you want into the script. For example, adding a few lines of code to copy only files of a certain type and size is really quick and easy.
With a little more code you can, for example, move items from one bucket into separate destination buckets by file type.
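As an illustration of both points, a filter and a per-type destination chooser might look something like this; the extensions, size limit, and bucket names are made up for the example:

```javascript
var path = require('path');

// Only copy .jpg files under 5 MB (item comes from listObjects' Contents).
function shouldCopy(item) {
  var ext = path.extname(item.Key).toLowerCase();
  return ext === '.jpg' && item.Size < 5 * 1024 * 1024;
}

// Route each key to a destination bucket based on its extension.
function destinationBucketFor(key) {
  var ext = path.extname(key).toLowerCase();
  if (ext === '.jpg' || ext === '.png') return 'my-image-bucket';
  if (ext === '.mp4') return 'my-video-bucket';
  return 'my-misc-bucket';
}
```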
Perhaps you have a bucket of images which you want to copy to another bucket after applying a resize operation to each file?
Perhaps your buckets are spread across different AWS accounts?
This skeleton script will save you some time if you need to whip up some Node.js code for handling these types of situations.
Things to keep in mind
1. If your available download bandwidth is much higher than your upload bandwidth, the backlog of local tmp files will grow quickly, so make sure you have enough disk space.
2. You can adjust the concurrency on the upload and download sides of the flow.
3. The script expects a folder named downloads in the directory you are going to run the script from. It doesn't create it for you, so if you don't create it first you will get an error complaining about its absence (a short snippet after this list shows one way to create it up front).
4. This method does involve downloading bucket content and then re-uploading it. If a tool can do what you need entirely on the S3 side, you should probably prefer it. Unfortunately, that is not always possible. Also, be sure to review the AWS charges that may apply for transfer in and out of buckets.
5. Don't forget to replace keys and bucket names in the script.
6. Don't forget to install the script's dependencies in the directory you will run from, which are ...
* npm install aws-sdk
* npm install async
* npm install node-uuid
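Regarding item 3 above, a tiny addition of my own (not part of the gist) that creates the downloads folder up front would look like this:

```javascript
var fs = require('fs');

// Create the local downloads folder if it doesn't exist yet.
if (!fs.existsSync('downloads')) {
  fs.mkdirSync('downloads');
}
```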
Best way to use
You will probably find that the best way to use the script is to spin up an EC2 instance, install Node.js, create a directory to run from, install the dependencies, and create the downloads folder in that directory as explained above. Then copy the script there and run it with node ./name-of-script-file.js.
Hopefully you will find this useful as a starting point for working with files in S3 with Node.js.