Importing CSV Data to DynamoDB with Lambda and TypeScript

Introduction

Importing data from CSV files to DynamoDB is a common task for developers working with AWS services. This process can be streamlined using AWS Lambda functions written in TypeScript, providing a serverless solution for efficient data migration.

Let's explore how to create a Lambda function that reads a CSV file from an S3 bucket and imports the data into a DynamoDB table. This approach offers scalability and cost-effectiveness for handling large datasets.

Table of Contents

  • Setting Up the Environment
  • Implementing the Lambda Function
  • Preparing the CSV File
  • Deploying the Lambda Function
  • Testing the Solution
  • Conclusion

Setting Up the Environment

Prerequisites

Before we begin, ensure you have the following:

  • AWS account with appropriate permissions
  • Node.js and npm installed
  • AWS CLI configured
  • TypeScript knowledge

Creating the Project Structure

  1. Create a new directory for your project:
mkdir csv-to-dynamodb-lambda
cd csv-to-dynamodb-lambda
  2. Initialize a new Node.js project and install the dependencies (TypeScript and the type definitions are development dependencies):
npm init -y
npm install aws-sdk csv-parse
npm install --save-dev typescript @types/node
  3. Create a tsconfig.json file:
{
  "compilerOptions": {
    "target": "es2018",
    "module": "commonjs",
    "strict": true,
    "esModuleInterop": true,
    "outDir": "./dist"
  }
}
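If you'd like a build shortcut, a minimal scripts section in package.json (an optional convenience, not required by Lambda) might look like this:

```json
{
  "scripts": {
    "build": "tsc"
  }
}
```

With this in place, `npm run build` compiles the project into the dist directory configured above.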

Implementing the Lambda Function

Writing the TypeScript Code

Create a file named index.ts with the following code:

import { S3, DynamoDB } from 'aws-sdk'
import { parse } from 'csv-parse'

const s3 = new S3()
const dynamodb = new DynamoDB.DocumentClient()

export const handler = async (event: any) => {
  // When triggered by S3, take the bucket and key from the event record;
  // fall back to fixed values for manual invocation
  const s3Record = event?.Records?.[0]?.s3
  const bucketName = s3Record?.bucket?.name ?? 'user-data-csv-bucket'
  const fileName = s3Record
    ? decodeURIComponent(s3Record.object.key.replace(/\+/g, ' '))
    : 'users.csv'

  try {
    // Download the CSV file from S3
    const s3Object = await s3.getObject({ Bucket: bucketName, Key: fileName }).promise()
    const csvData = s3Object.Body?.toString('utf-8')

    if (!csvData) {
      throw new Error('CSV data is empty')
    }

    // Parse the CSV into an array of objects keyed by the header row
    const records = await new Promise<any[]>((resolve, reject) => {
      parse(csvData, { columns: true }, (err, data) => {
        if (err) reject(err)
        else resolve(data)
      })
    })

    // Insert each record into the table one at a time
    for (const record of records) {
      await dynamodb
        .put({
          TableName: 'Users',
          Item: record,
        })
        .promise()
    }

    return { statusCode: 200, body: JSON.stringify({ message: 'Data imported successfully' }) }
  } catch (error) {
    console.error('Error:', error)
    return { statusCode: 500, body: JSON.stringify({ message: 'Error importing data' }) }
  }
}

Explanation of the Code

This Lambda function performs the following steps:

  1. Retrieves the CSV file from the S3 bucket.
  2. Parses the CSV data using the csv-parse library.
  3. Iterates through each record and inserts it into the DynamoDB table.
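Writing one item per request is simple but slow for large files. One way to speed this up is DynamoDB's batchWrite, which accepts up to 25 put requests per call. The sketch below assumes the same Users table as the handler; `chunk` and `batchInsert` are names introduced here for illustration, not part of the AWS SDK:

```typescript
// Split an array into groups of at most `size` items
// (DynamoDB's batchWrite accepts at most 25 put requests per call)
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size))
  }
  return groups
}

// Sketch of a batched insert that could replace the per-record put loop;
// docClient would be the DynamoDB.DocumentClient from the handler above
async function batchInsert(docClient: any, records: any[]): Promise<void> {
  for (const batch of chunk(records, 25)) {
    await docClient
      .batchWrite({
        RequestItems: {
          Users: batch.map((item: any) => ({ PutRequest: { Item: item } })),
        },
      })
      .promise()
  }
}
```

Note that in production batchWrite can return UnprocessedItems, which should be retried rather than ignored.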

Preparing the CSV File

Create a sample CSV file named users.csv with the following content:

id,name,email,age
1,John Doe,[email protected],32
2,Jane Smith,[email protected],28
3,Michael Johnson,[email protected],45
4,Emily Brown,[email protected],39
5,David Lee,[email protected],22
6,Sarah Wilson,[email protected],31
7,Robert Taylor,[email protected],56
8,Lisa Anderson,[email protected],41
9,Thomas Martinez,[email protected],37
10,Jennifer Garcia,[email protected],29
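One caveat: csv-parse returns every field as a string, so with the code above age would be stored in DynamoDB as a string. If numeric storage matters, each record can be coerced before the put call. The `CsvRow` interface and `toUser` helper below are names introduced here for illustration, not part of csv-parse or the AWS SDK:

```typescript
// Shape of one parsed row from users.csv when using { columns: true }
interface CsvRow {
  id: string
  name: string
  email: string
  age: string
}

// Convert string fields into the types we want stored in DynamoDB
function toUser(row: CsvRow) {
  return {
    id: row.id,
    name: row.name,
    email: row.email,
    age: Number(row.age), // store age as a DynamoDB number
  }
}
```

Each parsed record would then be passed through toUser before being handed to dynamodb.put.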

Deploying the Lambda Function

To deploy the Lambda function:

  1. Compile the TypeScript code:
npx tsc
  2. Copy the runtime dependencies into the output directory and create a ZIP file (csv-parse is not included in the Lambda runtime, so node_modules must be packaged):
cp -r node_modules dist/
cd dist
zip -r ../function.zip .
cd ..
  3. Use the AWS CLI to create and deploy the Lambda function:
aws lambda create-function --function-name ImportCsvToDynamoDB \
  --runtime nodejs14.x --handler index.handler \
  --role arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_LAMBDA_ROLE \
  --zip-file fileb://function.zip

Testing the Solution

  1. Upload the users.csv file to the S3 bucket.
  2. Invoke the Lambda function manually or set up an S3 trigger.
  3. Check the DynamoDB table to verify the imported data.

Conclusion

By leveraging AWS Lambda and TypeScript, we've created a serverless solution for importing CSV data into DynamoDB. This approach offers flexibility and scalability for handling various data import scenarios. Remember to optimize your Lambda function for larger datasets and consider implementing error handling and logging for production use.