Merging Multiple CSV Files Using Node.js and TypeScript
Often, we need to merge multiple CSV files, especially when dealing with large datasets from various sources.
This blog post will guide you through creating a robust Node.js function using TypeScript to merge multiple CSV files. We'll even handle cases where the CSV files have different columns!
Understanding the Challenge
When merging CSV files, we often encounter two main scenarios:
- CSV files with identical columns
- CSV files with different columns
Our goal is to create a function that can handle both cases seamlessly, producing a merged CSV file that includes all unique columns from the input files.
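For example, suppose (with made-up column names) that file1.csv has the columns name and email while file2.csv has name and phone. The merged output should then have the columns name, email, and phone, with empty cells wherever a source file did not supply a value.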
Setting Up the Project
Before we dive into the code, let's set up our Node.js project with TypeScript:
- Create a new directory for your project:
mkdir csv-merger
cd csv-merger
- Initialize a new Node.js project:
npm init -y
- Install the necessary dependencies:
npm install csv-parse csv-stringify
npm install --save-dev typescript ts-node @types/node
Both csv-parse and csv-stringify ship their own TypeScript type definitions, so separate @types packages are not needed.
- Create a tsconfig.json file:
{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "outDir": "./dist",
    "strict": true,
    "esModuleInterop": true
  }
}
Creating the CSV Merger Function
Now, let's create our CSV merger function. We'll break it down into smaller, manageable parts:
1. Reading CSV Files
First, we need a function to read CSV files:
import { createReadStream } from 'fs'
import { parse } from 'csv-parse'

async function readCSV(filePath: string): Promise<string[][]> {
  return new Promise((resolve, reject) => {
    const rows: string[][] = []
    createReadStream(filePath)
      .on('error', reject) // surface file-system errors (e.g. a missing file)
      .pipe(parse())
      .on('data', (row: string[]) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject)
  })
}
This function reads a CSV file and returns its contents as a 2D array.
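For instance, called from inside an async function (users.csv is just a placeholder name), it might be used like this:

const rows = await readCSV('users.csv')
// rows[0] is the header row, e.g. ['name', 'email']
// rows.slice(1) contains the data rows, each as an array of strings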
2. Merging CSV Data
Next, we'll create the core function to merge multiple CSV files:
function mergeCSVs(csvDataArray: string[][][]): string[][] {
  if (csvDataArray.length === 0) return []

  // Collect all unique headers across every input CSV
  const allHeaders = new Set<string>()
  csvDataArray.forEach((csv) => {
    csv[0].forEach((header) => allHeaders.add(header))
  })
  const mergedHeaders = Array.from(allHeaders)

  // Create a map of header to index for quick lookup
  const headerIndex = new Map<string, number>(mergedHeaders.map((header, index) => [header, index]))

  const mergedData: string[][] = [mergedHeaders]

  csvDataArray.forEach((csv) => {
    // Start from 1 to skip the header row
    for (let i = 1; i < csv.length; i++) {
      const row = csv[i]
      const mergedRow: string[] = new Array(mergedHeaders.length).fill('')

      // Fill in the data for this row, leaving '' for columns this file doesn't have
      csv[0].forEach((header, index) => {
        const mergedIndex = headerIndex.get(header)
        if (mergedIndex !== undefined) {
          mergedRow[mergedIndex] = row[index]
        }
      })

      mergedData.push(mergedRow)
    }
  })

  return mergedData
}
This function handles CSVs with different columns by:
- Collecting all unique headers from all CSV files
- Creating a merged header row with all unique headers
- Mapping each CSV's data to the merged header structure, filling in empty strings for missing columns
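To make the behaviour concrete, here is a small, made-up example of what the function produces for two parsed CSVs that share only the name column:

const merged = mergeCSVs([
  [
    ['name', 'email'],
    ['Ada', 'ada@example.com'],
  ],
  [
    ['name', 'phone'],
    ['Grace', '555-0100'],
  ],
])
// merged is:
// [
//   ['name', 'email', 'phone'],
//   ['Ada', 'ada@example.com', ''],
//   ['Grace', '', '555-0100'],
// ]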
3. Writing the Merged CSV
Finally, we need a function to write the merged data to a new CSV file:
import { writeFile } from 'fs'
import { stringify } from 'csv-stringify'

async function writeCSV(data: string[][], outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    stringify(data, (err, output) => {
      if (err) {
        reject(err)
        return
      }
      // Write the generated CSV text to disk, rejecting on any file-system error
      writeFile(outputPath, output, (writeErr) => {
        if (writeErr) {
          reject(writeErr)
        } else {
          resolve()
        }
      })
    })
  })
}
This function takes the merged data and writes it to a new CSV file.
Putting It All Together
Now, let's create a main function to orchestrate the CSV merging process:
async function main() {
  const filesToMerge = ['file1.csv', 'file2.csv', 'file3.csv']
  const outputFile = 'merged.csv'

  try {
    const csvDataArray = await Promise.all(filesToMerge.map((file) => readCSV(file)))
    const mergedData = mergeCSVs(csvDataArray)
    await writeCSV(mergedData, outputFile)
    console.log(`Merged CSV saved to ${outputFile}`)
  } catch (error) {
    console.error('Error merging CSV files:', error)
  }
}

main()
This main function:
- Reads all input CSV files
- Merges the CSV data
- Writes the merged data to a new CSV file
This solution keeps column order based on the order in which headers first appear across the input files, and it handles missing data gracefully.
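If you put all of the snippets above into a single file, say merge.ts (any file name works), you can run it with the ts-node dependency installed earlier:

npx ts-node merge.ts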
Key takeaways:
- The function collects all unique headers from all input files.
- It creates a merged structure that includes all columns.
- Empty cells are added for missing data in columns that don't exist in all files.
Remember to handle large files carefully, as reading entire CSV files into memory may not be suitable for extremely large datasets. For such cases, consider implementing a streaming approach or processing the files in chunks.
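One possible streaming variant is a two-pass approach: a first pass reads only each file's header row, and a second pass streams the data rows directly into a csv-stringify stream. The sketch below is illustrative rather than production-ready; it assumes the from_line/to_line options of csv-parse and does not wait for the output stream to flush before returning.

import { createReadStream, createWriteStream } from 'fs'
import { parse } from 'csv-parse'
import { stringify } from 'csv-stringify'

// Read only the header row of a file, so pass 1 never loads data rows
async function readHeader(filePath: string): Promise<string[]> {
  return new Promise((resolve, reject) => {
    createReadStream(filePath)
      .on('error', reject)
      .pipe(parse({ to_line: 1 }))
      .on('data', (header: string[]) => resolve(header))
      .on('error', reject)
  })
}

async function mergeCSVsStreaming(files: string[], outputPath: string): Promise<void> {
  // Pass 1: collect all unique headers
  const allHeaders = new Set<string>()
  for (const file of files) {
    const headers = await readHeader(file)
    headers.forEach((h) => allHeaders.add(h))
  }
  const mergedHeaders = Array.from(allHeaders)
  const headerIndex = new Map<string, number>(mergedHeaders.map((h, i) => [h, i]))

  // Pass 2: stream each file's data rows straight into the output stringifier
  const stringifier = stringify()
  stringifier.pipe(createWriteStream(outputPath))
  stringifier.write(mergedHeaders)

  for (const file of files) {
    const fileHeaders = await readHeader(file)
    await new Promise<void>((resolve, reject) => {
      createReadStream(file)
        .on('error', reject)
        .pipe(parse({ from_line: 2 })) // skip the header row
        .on('data', (row: string[]) => {
          const mergedRow: string[] = new Array(mergedHeaders.length).fill('')
          fileHeaders.forEach((h, i) => {
            const idx = headerIndex.get(h)
            if (idx !== undefined) mergedRow[idx] = row[i]
          })
          stringifier.write(mergedRow)
        })
        .on('end', () => resolve())
        .on('error', reject)
    })
  }

  stringifier.end()
}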