Merging Multiple CSV Files Using Node.js and TypeScript

Often, we need to merge multiple CSV files, especially when dealing with large datasets from various sources.

This blog post will guide you through creating a robust Node.js function using TypeScript to merge multiple CSV files. We'll even handle cases where the CSV files have different columns!

Understanding the Challenge

When merging CSV files, we often encounter two main scenarios:

  1. CSV files with identical columns
  2. CSV files with different columns

Our goal is to create a function that can handle both cases seamlessly, producing a merged CSV file that includes all unique columns from the input files.
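
For example, suppose we have two small input files (the names and values here are just illustrative):

users.csv:
name,age
Alice,30

locations.csv:
name,city
Bob,Berlin

The merged result should contain every column, with blanks where a file has no value for that column:

name,age,city
Alice,30,
Bob,,Berlin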

Setting Up the Project

Before we dive into the code, let's set up our Node.js project with TypeScript:

  1. Create a new directory for your project:
mkdir csv-merger
cd csv-merger
  2. Initialize a new Node.js project:
npm init -y
  3. Install the necessary dependencies (csv-parse and csv-stringify do the actual CSV work and ship their own type definitions; the TypeScript tooling goes in devDependencies):
npm install csv-parse csv-stringify
npm install --save-dev typescript ts-node @types/node
  4. Create a tsconfig.json file:
{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "outDir": "./dist",
    "strict": true,
    "esModuleInterop": true
  }
}

Creating the CSV Merger Function

Now, let's create our CSV merger function. We'll break it down into smaller, manageable parts:

1. Reading CSV Files

First, we need a function to read CSV files:

import { createReadStream } from 'fs'
import { parse } from 'csv-parse'

async function readCSV(filePath: string): Promise<string[][]> {
  return new Promise((resolve, reject) => {
    const rows: string[][] = []
    // Stream the file through csv-parse; each 'data' event delivers one record as an array of strings
    createReadStream(filePath)
      .pipe(parse())
      .on('data', (row: string[]) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject)
  })
}

This function reads a CSV file and returns its contents as a 2D array.
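
As a quick illustration (the file name and contents are hypothetical), a two-row CSV comes back as an array of string arrays, with the header row as the first element:

// users.csv contains:
// name,age
// Alice,30
const rows = await readCSV('users.csv')
// rows === [['name', 'age'], ['Alice', '30']]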

2. Merging CSV Data

Next, we'll create the core function to merge multiple CSV files:

function mergeCSVs(csvDataArray: string[][][]): string[][] {
  if (csvDataArray.length === 0) return []

  // Collect all unique headers
  const allHeaders = new Set<string>()
  csvDataArray.forEach((csv) => {
    csv[0].forEach((header) => allHeaders.add(header))
  })
  const mergedHeaders = Array.from(allHeaders)

  // Create a map of header to index for quick lookup
  const headerIndex = new Map(mergedHeaders.map((header, index) => [header, index] as [string, number]))

  const mergedData: string[][] = [mergedHeaders]

  csvDataArray.forEach((csv) => {
    const csvHeaderIndex = new Map(csv[0].map((header, index) => [header, index] as [string, number]))

    // Start from 1 to skip the header row
    for (let i = 1; i < csv.length; i++) {
      const row = csv[i]
      const mergedRow = new Array(mergedHeaders.length).fill('')

      // Fill in the data for this row
      csv[0].forEach((header, index) => {
        const mergedIndex = headerIndex.get(header)
        if (mergedIndex !== undefined) {
          mergedRow[mergedIndex] = row[index]
        }
      })

      mergedData.push(mergedRow)
    }
  })

  return mergedData
}

This function handles CSVs with different columns by:

  • Collecting all unique headers from all CSV files
  • Creating a merged header row with all unique headers
  • Mapping each CSV's data to the merged header structure, filling in empty strings for missing columns
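
To see this mapping in action, here is a small worked example using the same kind of data expressed as the 2D arrays that readCSV produces (values are illustrative):

const users = [
  ['name', 'age'],
  ['Alice', '30'],
]
const locations = [
  ['name', 'city'],
  ['Bob', 'Berlin'],
]

mergeCSVs([users, locations])
// => [
//      ['name', 'age', 'city'],
//      ['Alice', '30', ''],
//      ['Bob', '', 'Berlin'],
//    ]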

3. Writing the Merged CSV

Finally, we need a function to write the merged data to a new CSV file:

import { createWriteStream } from 'fs'
import { stringify } from 'csv-stringify'

async function writeCSV(data: string[][], outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    // Convert the 2D array back into CSV text
    stringify(data, (err, output) => {
      if (err) {
        reject(err)
        return
      }
      // Write the whole output and close the file, surfacing any write errors
      const stream = createWriteStream(outputPath)
      stream.on('error', reject)
      stream.end(output ?? '', () => resolve())
    })
  })
}

This function takes the merged data and writes it to a new CSV file.
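
As a quick usage sketch (the output file name is arbitrary), it's worth noting that csv-stringify takes care of quoting values that contain commas or quotes:

await writeCSV(
  [
    ['name', 'note'],
    ['Alice', 'likes cats, dogs'],
  ],
  'out.csv'
)
// out.csv:
// name,note
// Alice,"likes cats, dogs"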

Putting It All Together

Now, let's create a main function to orchestrate the CSV merging process:

async function main() {
  const filesToMerge = ['file1.csv', 'file2.csv', 'file3.csv']
  const outputFile = 'merged.csv'

  try {
    const csvDataArray = await Promise.all(filesToMerge.map((file) => readCSV(file)))
    const mergedData = mergeCSVs(csvDataArray)
    await writeCSV(mergedData, outputFile)
    console.log(`Merged CSV saved to ${outputFile}`)
  } catch (error) {
    console.error('Error merging CSV files:', error)
  }
}

main()

This main function:

  1. Reads all input CSV files
  2. Merges the CSV data
  3. Writes the merged data to a new CSV file
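
Assuming everything above lives in a single file, say merge.ts (the name is just an example), you can run it directly with ts-node:

npx ts-node merge.ts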

In the merged output, columns appear in the order their headers are first encountered across the input files, and rows from files that lack a given column simply get an empty string there.

Key takeaways:

  • The function collects all unique headers from all input files.
  • It creates a merged structure that includes all columns.
  • Empty cells are added for missing data in columns that don't exist in all files.

Remember to handle large files carefully, as reading entire CSV files into memory may not be suitable for extremely large datasets. For such cases, consider implementing a streaming approach or processing the files in chunks.
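
If memory does become a concern, here is one possible direction as a rough sketch (not the approach used above): read only the header row of each file first, then stream rows through csv-stringify one at a time. It assumes csv-parse's from_line/to_line options, and details such as backpressure and waiting for the output stream to flush are left out for brevity:

import { createReadStream, createWriteStream } from 'fs'
import { parse } from 'csv-parse'
import { stringify } from 'csv-stringify'

// Read just the header row of a file (to_line: 1 stops parsing after the first line)
async function readHeader(filePath: string): Promise<string[]> {
  return new Promise((resolve, reject) => {
    createReadStream(filePath)
      .pipe(parse({ to_line: 1 }))
      .on('data', (row: string[]) => resolve(row))
      .on('end', () => resolve([]))
      .on('error', reject)
  })
}

async function mergeCSVsStreaming(files: string[], outputPath: string): Promise<void> {
  // First pass: collect all unique headers without loading full files into memory
  const headers = await Promise.all(files.map(readHeader))
  const allHeaders = new Set<string>()
  headers.forEach((h) => h.forEach((name) => allHeaders.add(name)))
  const mergedHeaders = Array.from(allHeaders)

  const stringifier = stringify()
  stringifier.pipe(createWriteStream(outputPath))
  stringifier.write(mergedHeaders)

  // Second pass: stream each file row by row, remapping columns into the merged order
  for (let f = 0; f < files.length; f++) {
    const index = new Map(headers[f].map((h, i) => [h, i] as [string, number]))
    await new Promise<void>((resolve, reject) => {
      createReadStream(files[f])
        .pipe(parse({ from_line: 2 })) // skip this file's header row
        .on('data', (row: string[]) => {
          stringifier.write(
            mergedHeaders.map((h) => {
              const i = index.get(h)
              return i === undefined ? '' : row[i] ?? ''
            })
          )
        })
        .on('end', () => resolve())
        .on('error', reject)
    })
  }
  stringifier.end()
}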