This is a tutorial on AWS S3 Multipart Uploads with JavaScript. Multipart Upload is a nifty feature introduced by AWS S3. It lets us upload a large file to S3 in smaller, more manageable chunks. S3 then stitches the individual pieces together after all parts have been uploaded. The individual part uploads can even be done in parallel, and if a single part upload fails, only that part needs to be retried, which saves bandwidth.
We’re going to cover uploading a large file using the AWS SDK for JavaScript (v2).
All multipart uploads use three core APIs:
- createMultipartUpload – This starts the upload process by generating a unique UploadId.
- uploadPart – This uploads the individual parts of the file.
- completeMultipartUpload – This signals to S3 that all parts have been uploaded and it can combine the parts into one file.
Let’s set up a basic Node.js project and use these functions to upload a large file to S3. The goal is that the full file never has to be uploaded in a single request, which adds fault tolerance and a potential speedup from uploading multiple parts in parallel. It also means the full file never has to sit in the Node.js process's memory at once, reducing our memory footprint and avoiding out-of-memory errors when dealing with very large files.
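The snippets that follow share a bit of setup. Here is a minimal sketch, assuming the AWS SDK for JavaScript v2 (the aws-sdk npm package) with credentials and region picked up from the environment, and a bucket name in a BUCKET environment variable; filePath is a hypothetical local file:
// Minimal setup sketch: an SDK v2 client plus the constants used in the snippets below.
const AWS = require('aws-sdk');
const fs = require('fs');

const S3 = new AWS.S3();              // reads credentials and region from the environment
const CHUNK_SIZE = 10 * 1024 * 1024;  // 10 MB parts (S3 requires at least 5 MB per part, except the last one)
const filePath = './movie.mp4';       // hypothetical path to the large file we want to upload
The await calls in the snippets below are assumed to run inside an async function.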
First, we signal to S3 that we are beginning a new multipart upload by calling createMultipartUpload. This returns a unique UploadId that we must reference when uploading the individual parts in the next steps.
let multipartCreateResult = await S3.createMultipartUpload({
  Bucket: process.env.BUCKET,
  Key: "movie.mp4",
  ACL: "public-read",
  ContentType: "video/mp4",
  StorageClass: 'STANDARD'
}).promise()
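The response is where we pick up the UploadId that ties the remaining calls together; a quick sanity check might look like this:
// Every uploadPart and completeMultipartUpload call below must reference this UploadId.
console.log("Started multipart upload:", multipartCreateResult.UploadId);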
We will then read our file one part at a time, in chunks of 10 MB, and upload each chunk with uploadPart.
let chunkCount = 1;
let uploadPartResults = []
let buffer = Buffer.alloc(CHUNK_SIZE); // reusable 10 MB read buffer

fs.open(filePath, 'r', function (err, fd) {
  if (err) throw err;

  function readNextChunk() {
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, async function (err, nread) {
      if (err) throw err;

      if (nread === 0) {
        // Done reading the file; close it. Only now is it safe to call
        // completeMultipartUpload (next step).
        fs.close(fd, function (err) {
          if (err) throw err;
        });
        return;
      }

      // The last chunk is usually smaller than CHUNK_SIZE, so only send the bytes we actually read.
      let data;
      if (nread < CHUNK_SIZE) {
        data = buffer.slice(0, nread);
      } else {
        data = buffer;
      }

      // Upload this chunk as one part, tagged with its PartNumber and the UploadId from step 1.
      let uploadPromiseResult = await S3.uploadPart({
        Body: data,
        Bucket: process.env.BUCKET,
        Key: "movie.mp4",
        PartNumber: chunkCount,
        UploadId: multipartCreateResult.UploadId,
      }).promise()

      // Keep the ETag of every part; completeMultipartUpload needs the full list.
      uploadPartResults.push({
        PartNumber: chunkCount,
        ETag: uploadPromiseResult.ETag
      })

      chunkCount++;
      readNextChunk()
    });
  }

  readNextChunk();
});
Finally, once all the individual parts have been uploaded, we call completeMultipartUpload. This call must include the PartNumber and ETag recorded for each uploadPart call.
let completeUploadResponse = await S3.completeMultipartUpload({
  Bucket: process.env.BUCKET,
  Key: "movie.mp4",
  MultipartUpload: {
    Parts: uploadPartResults
  },
  UploadId: multipartCreateResult.UploadId
}).promise()
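If the call succeeds, the response describes the assembled object; for example, logging its Location confirms where the file ended up:
// Location is the URL of the final, stitched-together object.
console.log("Upload complete:", completeUploadResponse.Location);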
Now, you should be able to see a single file in your S3 bucket.
In this Amazon S3 Multipart Upload example, we read the file in chunks and uploaded each chunk one after another. But the chunks can also be uploaded in parallel with something like Promise.all(), or with a small pool that limits concurrency, as sketched below. If a single part upload fails due to a bad connection, it can be retried individually (just that 10 MB chunk, not the full file).
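Here is a minimal sketch of that parallel approach, assuming the same S3 client, CHUNK_SIZE, and multipartCreateResult from the snippets above; uploadPartsInParallel and uploadSinglePart are hypothetical helper names, and each part gets its own buffer so concurrent reads don't clash:
const { promises: fsp } = require('fs');

async function uploadPartsInParallel(filePath, uploadId) {
  const fd = await fsp.open(filePath, 'r');
  const { size } = await fd.stat();
  const partCount = Math.ceil(size / CHUNK_SIZE);

  // Read and upload one part; its byte range is derived from the part number.
  const uploadSinglePart = async (partNumber) => {
    const partBuffer = Buffer.alloc(CHUNK_SIZE);   // own buffer per part
    const position = (partNumber - 1) * CHUNK_SIZE;
    const { bytesRead } = await fd.read(partBuffer, 0, CHUNK_SIZE, position);
    const result = await S3.uploadPart({
      Body: partBuffer.slice(0, bytesRead),
      Bucket: process.env.BUCKET,
      Key: "movie.mp4",
      PartNumber: partNumber,
      UploadId: uploadId,
    }).promise();
    return { PartNumber: partNumber, ETag: result.ETag };
  };

  // Fire all parts at once; Promise.all keeps results in part-number order.
  const partNumbers = Array.from({ length: partCount }, (_, i) => i + 1);
  const parts = await Promise.all(partNumbers.map(uploadSinglePart));
  await fd.close();
  return parts; // pass as MultipartUpload.Parts to completeMultipartUpload
}

// Usage: upload every part concurrently, then complete the upload as before.
const parallelParts = await uploadPartsInParallel(filePath, multipartCreateResult.UploadId);
In practice you would probably cap concurrency with a small pool (say, four to eight parts in flight) rather than firing every part at once.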
More functions like abortMultipartUpload, listMultipartUploads, and listParts are available to make uploads resumable while staying stateless. Essentially, you can ask S3 which parts have been uploaded successfully and which ones are still missing, then upload only the missing parts or abort the upload entirely.
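As a rough sketch (resumeOrAbort is a hypothetical helper, reusing the Bucket and Key from above and an UploadId recovered from listMultipartUploads):
async function resumeOrAbort(uploadId) {
  // Ask S3 which parts of this upload already exist.
  const listed = await S3.listParts({
    Bucket: process.env.BUCKET,
    Key: "movie.mp4",
    UploadId: uploadId,
  }).promise();
  const uploadedPartNumbers = new Set((listed.Parts || []).map(p => p.PartNumber));
  console.log("Parts already on S3:", [...uploadedPartNumbers]);

  // Re-upload only the missing part numbers, then call completeMultipartUpload as before.
  // Or, to give up and free the storage held by the uploaded parts:
  // await S3.abortMultipartUpload({
  //   Bucket: process.env.BUCKET,
  //   Key: "movie.mp4",
  //   UploadId: uploadId,
  // }).promise();
}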