How to Import a Large Amount of Data into Cloud Firestore

Firestore has been out of beta for some time now, but it still doesn’t natively support import or export functionality through its SDKs.

This is a big downer for someone wanting to use the native functionality of the Firebase platform.

Keep in mind that when I talk about enabling exports/imports, I’m talking about the firebase-admin SDK. Export/import is available via the gcloud command-line utility, but that is a manual process and won’t scale very well as your data size grows.

In this article, I’ll demonstrate how you can use the Firebase Node.js Admin SDK to import a large amount of data into your Firestore database.

Considerations

  • Firestore (loosely) limits sustained writes to roughly 1 per second per document.
  • A Firestore batch allows at most 500 operations, so a large dataset has to be split across multiple batches (see the sketch after this list).
  • Reading a large CSV file into memory all at once can cause JavaScript heap out of memory errors.
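
Before diving into the full Cloud Function, here is a minimal sketch of how the batch limit plays out in practice. It assumes an already-initialized firestore instance and an in-memory rows array (both placeholders for illustration), splits the writes into chunks of at most 500, and pauses briefly between commits:

// Minimal sketch only: `firestore` is assumed to be an initialized
// admin.firestore() instance and `rows` an array of plain objects.
const BATCH_LIMIT = 500;

async function writeInBatches(firestore, rows) {
  for (let i = 0; i < rows.length; i += BATCH_LIMIT) {
    const batch = firestore.batch();

    rows.slice(i, i + BATCH_LIMIT).forEach((row) => {
      batch.set(firestore.collection("my_data").doc(), row);
    });

    await batch.commit();
    // Small pause between commits so we stay well under the write limits.
    await new Promise((resolve) => setTimeout(resolve, 1200));
  }
}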

Instructions

Set up the Firebase project with Cloud Functions and install the CSV parser.

firebase init --only functions
npm install --save csv-parser

In the index.js file, add the following code.

const admin = require("firebase-admin");
const functions = require("firebase-functions");
const csvParser = require("csv-parser");
const fs = require("fs");
const serviceAccountKey = require("./path/to/key");
const csvFilePath = "./path/to/csv";

admin.initializeApp({
  credential: admin.credential.cert(serviceAccountKey),
});

const firestore = admin.firestore();

const handleError = (error) => {
  // Do something with the error...
};

const commitMultiple = (batchFactories) => {
  let result = Promise.resolve();
  /** Wait 1.2 seconds between batch commits */
  const TIMEOUT = 1200;

  batchFactories.forEach((promiseFactory, index) => {
    result = result
      .then(() => {
        return new Promise((resolve) => {
          setTimeout(resolve, TIMEOUT);
        });
      })
      .then(promiseFactory)
      .then(() =>
        console.log(`Committed ${index + 1} of ${batchFactories.length}`)
      );
  });

  return result;
};

const api = (req, res) => {
  let currentBatchIndex = 0;
  let batchDocsCount = 0;
  const batchesArray = [];
  const batchFactories = [];

  return Promise.resolve()
    .then(() => {
      // Wrap the stream in a promise so we only start committing
      // once the whole CSV has been read.
      return new Promise((resolve, reject) => {
        fs.createReadStream(csvFilePath)
          .pipe(csvParser())
          .on("data", (row) => {
            // Start a new batch for the first row, or whenever the
            // current batch reaches the 500-operation limit.
            if (batchesArray.length === 0 || batchDocsCount === 499) {
              if (batchDocsCount === 499) {
                // reset count and move on to the next batch
                batchDocsCount = 0;
                currentBatchIndex++;
              }

              const batchPart = firestore.batch();
              batchesArray.push(batchPart);
              batchFactories.push(() => batchPart.commit());
            }

            const batch = batchesArray[currentBatchIndex];
            batchDocsCount++;

            const ref = firestore.collection("my_data").doc();
            batch.set(ref, row);
          })
          .on("error", reject)
          .on("end", resolve);
      });
    })
    .then(() => commitMultiple(batchFactories))
    .then(() => res.json({ done: true }))
    .catch(handleError);
};

module.exports = {
  api: functions
    .runWith({
      timeoutSeconds: 60 * 9, // seconds
    })
    .https.onRequest(api),
};

That is the full code that will import your data into Firestore using the Admin SDK.

Now, let’s go through how this code works, step by step.

module.exports = {
  api: functions
    .runWith({
      timeoutSeconds: 60 * 9, // seconds,
    })
    .https.onRequest(api),
};

The module.exports block exposes an HTTPS Cloud Function named api.
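
Once deployed, you can kick off the import by hitting the function’s URL. Here is a hypothetical invocation from Node.js; the region and project ID are placeholders for your own, and it assumes the function allows unauthenticated calls:

// Hypothetical invocation; replace the region and project ID with your own.
// Requires Node 18+ for the built-in fetch.
fetch("https://us-central1-your-project-id.cloudfunctions.net/api")
  .then((response) => response.json())
  .then((body) => console.log(body)) // should log { done: true }
  .catch((error) => console.error(error));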

const admin = require("firebase-admin");
const functions = require("firebase-functions");
const csvParser = require("csv-parser");
const fs = require("fs");
const serviceAccountKey = require("./path/to/key");
const csvFilePath = "./path/to/csv";

admin.initializeApp({
  credential: admin.credential.cert(serviceAccountKey),
});

const firestore = admin.firestore();

This is the setup code. It imports the firebase-admin, firebase-functions, and csv-parser libraries that our Cloud Function uses, and initializes the Admin SDK with your service account credentials.

The serviceAccountKey is a JSON file containing the credentials and project information for your Firebase project. You can download it from the Firebase console under Project settings → Service accounts → Generate new private key.

return new Promise((resolve, reject) => {
  fs.createReadStream(csvFilePath)
    .pipe(csvParser())
    .on("data", (row) => {
      // Start a new batch for the first row, or whenever the
      // current batch reaches the 500-operation limit.
      if (batchesArray.length === 0 || batchDocsCount === 499) {
        if (batchDocsCount === 499) {
          // reset count and move on to the next batch
          batchDocsCount = 0;
          currentBatchIndex++;
        }

        const batchPart = firestore.batch();
        batchesArray.push(batchPart);
        batchFactories.push(() => batchPart.commit());
      }

      const batch = batchesArray[currentBatchIndex];
      batchDocsCount++;

      const ref = firestore.collection("my_data").doc();
      batch.set(ref, row);
    })
    .on("error", reject)
    .on("end", resolve);
});

One thing you should note is that Cloud Functions are not meant for long-running operations. The maximum timeout allowed for an HTTPS function instance is 540 seconds (9 minutes). If your instance reaches this limit, the function will end abruptly, leaving your database in an inconsistent, partially imported state.

If you are sure that the import will definitely take more than 9 minutes to complete, a better approach is to run the same code on a custom server.
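
As a rough sketch of that alternative (Express is my assumption here, and api is the same (req, res) handler defined above), the handler can be mounted on a plain server that has no such timeout:

// Rough sketch of the custom-server alternative; Express is an assumption,
// and `api` is the same (req, res) handler defined earlier.
const express = require("express");

const app = express();
app.get("/api", api);

app.listen(3000, () => console.log("Import server listening on port 3000"));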

The default timeout for HTTPS functions is 60 seconds, and importing your data will likely take longer than that.

You can adjust this by passing the timeoutSeconds option to the runWith method of functions.

module.exports = {
  api: functions
    /** Maximum allowed value is 540 seconds (9 minutes) */
    .runWith({ timeoutSeconds: 60 * 9 })
    .https.onRequest(api),
};

Now, the stream-handling part shown earlier is the meat of the whole thing. What we are trying to do is read the CSV file bit by bit, not the whole thing at once.

We could have used fs.readFile() or fs.readFileSync() to read the file, but a major downside is that with big files they can cause out-of-memory errors in the Node.js runtime.

fs.createReadStream reads the file in chunks of a size you can specify (the default is 64 KB). This allows us to commit data to Firestore little by little.
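
If you want to tune the chunk size, createReadStream accepts a highWaterMark option; the value below is just an example:

// Example only: read the CSV in 256 KB chunks instead of the 64 KB default.
const stream = fs.createReadStream(csvFilePath, { highWaterMark: 256 * 1024 });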

Note that csv-parser already parses each row of the CSV into a plain JavaScript object, so the row can be passed to batch.set directly. If any of your columns contain serialized JSON, parse those individual values before writing them.
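
For example, assuming a hypothetical tags column that stores a JSON array as a string, you could parse just that field inside the data handler:

// Hypothetical example inside the "data" handler: the "tags" column is
// assumed to hold a JSON array serialized as a string.
const doc = { ...row, tags: JSON.parse(row.tags) };
batch.set(ref, doc);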

This script may not fit your particular needs exactly, but that’s just how real life is. 🙂
