How to archive video files to external storage like Google Drive?

alefty · June 2, 2022, 5:30pm

Hi! Is it possible to use APIs to save a copy (a duplicate) of a video file to a separate long-term storage provider like Google Drive, pCloud, or even an AWS bucket?

I know that Frame has its own Archival storage, but I’d like to save copies of my finished video files to another storage provider as well for long-term backup purposes.

So far it seems like the answer is no, but I’d be willing to hear possible alternatives. For example, could I write a script to automatically download files from Frame and then upload them to another service?

Thanks!

jhodges · June 2, 2022, 5:47pm

Heya!

Check out this article one of our Developer Relations team members wrote back in 2020.

alefty · June 2, 2022, 7:21pm

Hey! Thanks for the quick response. It’s unclear to me whether this will work for storage solutions other than S3. Any idea if a similar set of Lambda functions could be used to connect to Google Drive? @jhodges

jhodges · June 2, 2022, 7:26pm

Yup! It’s all gonna be custom code, though. For Google Drive, keep in mind that there is a 750 GB upload limit per day.

alefty · June 2, 2022, 7:40pm

Gotcha! Is there a way to do it without Lambda functions? I’ve never dabbled with them, so I was hoping there was a way to do it directly using the Frame & Google Drive APIs.

metadaddy · March 15, 2023, 6:01am

Hi @jhodges - is that article available somewhere other than Medium? It wants me to upgrade to a paid membership.

jhodges · March 17, 2023, 6:11pm

Sure thing, here it is as a PDF.

metadaddy · March 28, 2023, 3:08am

Thanks, @jhodges!

I inherited some code that looks like it was inspired by that article [link removed, since “the community feels it is an advertisement”]. I realized that there’s a bottleneck in the s3Uploader function - ManagedUpload does a great job of uploading the data in multiple, parallel queues, but the data is being read by fetch() as a single stream. I’ve written a new uploader that reads ranges of data from the Frame.io URL, writing each range to a part of the S3 object, in parallel. I’ll let you know when I update the repo so you can take a look, if you’re interested.

jhodges · May 1, 2023, 8:01pm

Nice work on the improvement! I’d love to check out the code whenever you put it up.

metadaddy · May 2, 2023, 12:27am

Sure - here you go (I tried simply posting a link to the code at GitHub, but Discourse wouldn’t let me):

/*
MIT License

Copyright (c) 2022 Backblaze

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
 */

import {GetObjectCommand, S3} from "@aws-sdk/client-s3";
import {getSignedUrl} from "@aws-sdk/s3-request-presigner";
import stream from "stream";
import fetch from "node-fetch";

import {formatBytes} from "./utils.js";

class Uploader {
    // Defaults same as AWS SDK
    static defaultQueueSize = 4;
    static minPartSize = 1024 * 1024 * 5;
    // From S3/B2 specification
    static maxTotalParts = 10000;

    client;
    url;
    bucket;
    key;
    metadata;
    queueSize = Uploader.defaultQueueSize;
    partSize = Uploader.minPartSize;
    totalUploadedBytes = 0;
    totalBytes;

    constructor(options) {
        Object.assign(this, options)
        this.validatePartSize();
        this.adjustPartSize();
    }

    adjustPartSize() {
        const newPartSize = Math.ceil(this.totalBytes / Uploader.maxTotalParts);
        if (newPartSize > this.partSize) {
            this.partSize = newPartSize;
        }
        this.totalParts = Math.ceil(this.totalBytes / this.partSize);
    }

    validatePartSize() {
        if (this.partSize < Uploader.minPartSize) {
            throw new Error('partSize must be at least ' + Uploader.minPartSize);
        }
    }

    async send() {
        console.log(`Creating multipart upload for ${this.bucket}/${this.key}`);
        console.log(`Reading ${this.totalBytes} bytes from ${this.url}`);
        const multipart = await this.client.createMultipartUpload({
            Bucket: this.bucket,
            Key: this.key,
            Metadata: this.metadata,
        });

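        // The final part carries whatever is left over. Computing it by subtraction (rather
        // than totalBytes % partSize) avoids a zero-length last part when the size divides evenly
        const lastPartSize = this.totalBytes - (this.totalParts - 1) * this.partSize;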
        console.log(`Uploading ${this.totalParts - 1} parts of ${this.partSize} bytes plus 1 part of ${lastPartSize} bytes`)

        const promises = new Map();
        const completedParts = [];
        for (let partCount = 0; partCount < this.totalParts; partCount++) {
            const contentLength = (partCount < (this.totalParts - 1))
                ? this.partSize
                : lastPartSize;
            const partNumber = partCount + 1;

            const writeStream = new stream.PassThrough();
            const promise = this.client.uploadPart({
                Bucket: this.bucket,
                Key: this.key,
                Body: writeStream,
                PartNumber: partNumber,
                ContentLength: contentLength,
                UploadId: multipart['UploadId']
            }).then(response => {
                return {
                    ...response,
                    partNumber,
                    contentLength,
                };
            }, reason => {
                console.log("Uploader.send() - uploadPart() failed: ", reason)
                throw reason;
            });
            promises.set(partNumber, promise);

            const start = partCount * this.partSize;
            const end = (start + contentLength) - 1;
            fetch(this.url,{
                headers: {
                    'range': `bytes=${start}-${end}`
                }
            }).then(response => {
                if (response.status !== 206) {
                    const message = `Server for URL ${this.url} does not support range requests`;
                    console.log("Uploader.send() - " + message);
                    throw new Error(message)
                }
                response.body.pipe(writeStream);
            }, reason => {
                console.log("Uploader.send() - fetch() failed: ", reason)
                throw reason;
            });

            if (promises.size >= this.queueSize) {
                // Promise.race() returns the first *settled* promise, so if it is rejected,
                // the error is thrown from here by await. If we used Promise.any(), the error
                // would only be thrown if *all* the promises were rejected
                const part = await Promise.race(Array.from(promises.values()));
                this.completePart(completedParts, part);
                promises.delete(part.partNumber);
            }
        }

        // Wait for remaining parts to complete uploading
        const remainingParts = await Promise.all(Array.from(promises.values()));
        for (const part of remainingParts) {
            this.completePart(completedParts, part);
        }

        if (this.totalUploadedBytes !== this.totalBytes) {
            throw new Error(`Data missing - uploaded ${this.totalUploadedBytes} of ${this.totalBytes} bytes`);
        }

        console.log(`Completing multipart upload for ${this.bucket}/${this.key}`);
        return this.client.completeMultipartUpload({
            Bucket: this.bucket,
            Key: this.key,
            UploadId: multipart['UploadId'],
            MultipartUpload : {
                Parts: completedParts
            }
        }).then(_ => {
            console.log(`Completed multipart upload of ${this.totalUploadedBytes} bytes to ${this.bucket}/${this.key}`);
        }, reason => {
            console.log("Uploader.send() - completeMultipartUpload() failed: ", reason)
            throw reason;
        });
    }

    completePart(completedParts, part) {
        this.totalUploadedBytes += part.contentLength;
        console.log(`${this.key}: uploaded part ${part.partNumber}/${this.totalParts} ${formatBytes(this.totalUploadedBytes)}/${formatBytes(this.totalBytes)}`);
        completedParts[part.partNumber - 1] = {
            PartNumber: part.partNumber,
            ETag: part.ETag
        };
    }
}

export function uploadUrlToB2(options) {
    const uploader = new Uploader(options);
    return uploader.send();
}

export function getB2Connection(options) {
    return new S3({
        customUserAgent: 'b2-node-docker-0.2',
        region: options.endpoint.replaceAll(/https:\/\/s3\.(.*?)\.backblazeb2\.com/g, '$1'),
        ...options,
    });
}

export async function createB2SignedUrl(client, bucket, key, expiresIn) {
    const command = new GetObjectCommand({
        Bucket: bucket,
        Key: key,
    });
    return await getSignedUrl(client, command, { expiresIn });
}

export async function getB2ObjectSize(client, bucket, key) {
    return new Promise((resolve, reject) =>
        client.headObject({
            Bucket: bucket,
            Key: key
        }, (err, response) => {
            if (err) {
                reject(err);
            } else {
                resolve(response['ContentLength']);
            }
        })
    );
}

Although it says ‘B2’, it’s using Backblaze B2’s S3-compatible API via the AWS SDK for JavaScript v3, and there’s only one line that is actually B2-specific - parsing the region out of the endpoint URL in the getB2Connection() function. The rest is all equally applicable to Amazon S3 and any other S3-compatible object storage platform.
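
As a rough illustration of that one B2-specific step, here is a minimal sketch of the region parsing on its own. The helper name regionFromB2Endpoint and the example endpoint are assumptions for illustration. Unlike the replaceAll() call in getB2Connection(), which returns a non-matching endpoint unchanged, this version returns null when the endpoint is not a B2 S3 endpoint.

```javascript
// Hypothetical helper mirroring the one B2-specific line in getB2Connection()
function regionFromB2Endpoint(endpoint) {
    const match = /^https:\/\/s3\.(.*?)\.backblazeb2\.com/.exec(endpoint);
    // Return null rather than guessing when the endpoint is not a B2 S3 endpoint
    return match ? match[1] : null;
}

// Example endpoint value; substitute the endpoint of your own bucket
console.log(regionFromB2Endpoint("https://s3.us-west-004.backblazeb2.com"));
```

Because getB2Connection() spreads options after the parsed region, an explicit region passed in options would take precedence over the parsed value.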

The core of the code, in Uploader.send(), is quite straightforward - it simply loops through the file parts, fetching each part from the source URL via a range request and firing an asynchronous upload to B2, storing the resulting promises in a map of part numbers to Promise objects.
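
To make the range arithmetic concrete, here is a minimal sketch of how the byte range for each part can be derived. planParts is a hypothetical helper, not part of the uploader above, and the sizes are purely illustrative; the Range header format is the same one that send() builds.

```javascript
// Hypothetical helper (not in the uploader above): list the byte range for each part
function planParts(totalBytes, partSize) {
    const totalParts = Math.ceil(totalBytes / partSize);
    const parts = [];
    for (let partCount = 0; partCount < totalParts; partCount++) {
        const start = partCount * partSize;
        // Range ends are inclusive, and the final part stops at the last byte of the file
        const end = Math.min(start + partSize, totalBytes) - 1;
        parts.push({
            partNumber: partCount + 1, // S3 part numbers start at 1
            contentLength: end - start + 1,
            rangeHeader: `bytes=${start}-${end}`,
        });
    }
    return parts;
}

// Illustrative sizes only: a 25-byte "file" split into 10-byte parts
console.log(planParts(25, 10).map(part => part.rangeHeader));
```

For the 25-byte example, the ranges are bytes=0-9, bytes=10-19 and bytes=20-24, so the last part is 5 bytes long.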

There’s a little logic to wait for a promise to settle whenever the number of parts in flight reaches the desired queue size, and then to wait for the remaining promises to complete once all the parts are in flight, and that’s about it.
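
Stripped of the S3 details, that queueing logic looks roughly like the sketch below. runWithLimit is a hypothetical, generic helper written for illustration. It assumes each task is a function returning a promise, and it follows the same Promise.race() then Promise.all() pattern as send().

```javascript
// Hypothetical helper: run promise-returning task functions with at most `limit` in flight
async function runWithLimit(taskFactories, limit) {
    const inFlight = new Map(); // task id -> promise resolving to { id, value }
    const results = [];
    for (let id = 0; id < taskFactories.length; id++) {
        const promise = taskFactories[id]().then(value => ({ id, value }));
        inFlight.set(id, promise);
        if (inFlight.size >= limit) {
            // Promise.race() settles with the first task to finish; a rejection is thrown here
            const done = await Promise.race(inFlight.values());
            results[done.id] = done.value;
            inFlight.delete(done.id);
        }
    }
    // Every task has been started, so wait for whatever is still in flight
    for (const done of await Promise.all(inFlight.values())) {
        results[done.id] = done.value;
    }
    return results;
}
```

Keying the map by task id is what lets the loop remove exactly the task that finished, in the same way that send() deletes by part number.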

In my informal benchmarking, with a 16 GB test file, I was getting about 11 MB/s with the original code. Running the new code on an 8 GB RAM, 4 dedicated vCPU Vultr VM, with 16 queues and a part size of 384 MB, gave me about 240 MB/s. Going up a step, to 16 GB RAM and 8 vCPUs, didn’t show much of a difference.
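
For anyone tuning these settings, here is a small sketch of how the part size and part count work out. effectivePartSize is a hypothetical helper that reproduces the arithmetic in adjustPartSize(), and it assumes that "16 GB" and "384 MB" mean binary units (GiB and MiB).

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // same value as Uploader.minPartSize
const MAX_TOTAL_PARTS = 10000;         // same value as Uploader.maxTotalParts

// Hypothetical helper reproducing the arithmetic in adjustPartSize()
function effectivePartSize(totalBytes, requestedPartSize = MIN_PART_SIZE) {
    // Parts must be large enough that the file fits within the maximum part count
    const smallestAllowed = Math.ceil(totalBytes / MAX_TOTAL_PARTS);
    const partSize = Math.max(requestedPartSize, smallestAllowed);
    return { partSize, totalParts: Math.ceil(totalBytes / partSize) };
}

const MiB = 1024 * 1024;
const GiB = 1024 * MiB;
// The benchmark settings: a 16 GiB file with 384 MiB parts
console.log(effectivePartSize(16 * GiB, 384 * MiB));
// A larger file with the default part size, which has to be raised to fit in 10,000 parts
console.log(effectivePartSize(100 * GiB));
```

Under those assumptions, the 16 GB test file is split into 43 parts, so with 16 queues the whole upload takes only about three rounds of parallel parts.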
