AWS CDK v2.123.0 adds support for importing data from an S3 bucket into a DynamoDB table

#Amazon DynamoDB

#AWS CDK

若槻龍太

2024.01.25

Hello, this is Wakatsuki (若槻) from the Manufacturing Business Technology Department of the CX Business Division.

AWS CDK v2.123.0 has been released, and it adds support for importing data from an S3 bucket into a DynamoDB table.

dynamodb: import data from the bucket (#28610) (45b8398), closes #21825

The README for the implementation is here.

import from S3 Bucket - aws-cdk/packages/aws-cdk-lib/aws-dynamodb/README.md at main · aws/aws-cdk

The data import feature itself was released for DynamoDB in 2022; with this release it is now supported in the CDK as well.

The AWS documentation is here.

Trying it out

Upgrading the CDK library

Upgrade the AWS CDK modules to v2.123.0 or later.

npm i aws-cdk@latest aws-cdk-lib@latest

Type definitions

The type definition of the importSource option of aws_dynamodb.Table is as follows.

/**
 *  Properties for importing data from the S3.
 */
export interface ImportSourceSpecification {
    /**
     * The compression type of the imported data.
     *
     * @default InputCompressionType.NONE
     */
    readonly compressionType?: InputCompressionType;
    /**
     * The format of the imported data.
     */
    readonly inputFormat: InputFormat;
    /**
     * The S3 bucket that is being imported from.
     */
    readonly bucket: s3.IBucket;
    /**
     * The account number of the S3 bucket that is being imported from.
     *
     * @default - no value
     */
    readonly bucketOwner?: string;
    /**
     * The key prefix shared by all S3 Objects that are being imported.
     *
     * @default - no value
     */
    readonly keyPrefix?: string;
}

Three data formats can be imported: DynamoDB JSON, Amazon Ion, and CSV. The format is specified through the inputFormat property, using the following class.

/**
 * The format of the source data.
 */
export declare abstract class InputFormat {
    /**
     * DynamoDB JSON format.
     */
    static dynamoDBJson(): InputFormat;
    /**
     * Amazon Ion format.
     */
    static ion(): InputFormat;
    /**
     * CSV format.
     */
    static csv(options?: CsvOptions): InputFormat;
    /**
     * Valid CSV delimiters.
     *
     * @see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-dynamodb-table-csv.html#cfn-dynamodb-table-csv-delimiter
     */
    private static validCsvDelimiters;
    private static readableValidCsvDelimiters;
    /**
     * Render the input format and options.
     *
     * @internal
     */
    abstract _render(): Pick<CfnTable.ImportSourceSpecificationProperty, 'inputFormat' | 'inputFormatOptions'>;
}
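To make the role of _render() more concrete, here is a minimal sketch of how the three formats could map onto the inputFormat and inputFormatOptions fields of CfnTable.ImportSourceSpecificationProperty. This is a hand-written approximation for illustration and not the CDK's actual implementation; SketchFormat, RenderedFormat, and renderInputFormat are hypothetical names, and the nested csv shape is assumed from the CloudFormation property of the same name.

```typescript
// Hypothetical stand-ins for the real CDK types, for illustration only.
type SketchFormat =
  | { kind: 'dynamoDBJson' }
  | { kind: 'ion' }
  | { kind: 'csv'; delimiter?: string; headerList?: string[] };

interface RenderedFormat {
  inputFormat: 'DYNAMODB_JSON' | 'ION' | 'CSV';
  inputFormatOptions?: { csv: { delimiter?: string; headerList?: string[] } };
}

// Delimiters that CsvOptions documents as valid: comma, tab, colon, semicolon, pipe, space.
const SKETCH_VALID_DELIMITERS: string[] = [',', '\t', ':', ';', '|', ' '];

function renderInputFormat(format: SketchFormat): RenderedFormat {
  switch (format.kind) {
    case 'dynamoDBJson':
      return { inputFormat: 'DYNAMODB_JSON' };
    case 'ion':
      return { inputFormat: 'ION' };
    case 'csv':
      // Reject delimiters outside the documented list before rendering.
      if (
        format.delimiter !== undefined &&
        SKETCH_VALID_DELIMITERS.indexOf(format.delimiter) === -1
      ) {
        throw new Error('Invalid CSV delimiter: ' + JSON.stringify(format.delimiter));
      }
      return {
        inputFormat: 'CSV',
        inputFormatOptions: {
          csv: { delimiter: format.delimiter, headerList: format.headerList },
        },
      };
  }
}

// Only the CSV format carries extra options; the other two render to a bare format name.
const renderedCsvSketch = renderInputFormat({
  kind: 'csv',
  delimiter: ',',
  headerList: ['id', 'value'],
});
```

Only the CSV variant carries options in this sketch, which mirrors the fact that csv() is the only factory method above that takes an argument.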

To import in CSV format, pass CsvOptions to aws_dynamodb.InputFormat.csv.

/**
 * The options for imported source files in CSV format.
 */
export interface CsvOptions {
    /**
     * The delimiter used for separating items in the CSV file being imported.
     *
     * Valid delimiters are as follows:
     * - comma (`,`)
     * - tab (`\t`)
     * - colon (`:`)
     * - semicolon (`;`)
     * - pipe (`|`)
     * - space (` `)
     *
     * @default - use comma as a delimiter.
     */
    readonly delimiter?: string;
    /**
     * List of the headers used to specify a common header for all source CSV files being imported.
     *
     * **NOTE**: If this field is specified then the first line of each CSV file is treated as data instead of the header.
     * If this field is not specified the the first line of each CSV file is treated as the header.
     *
     * @default - the first line of the CSV file is treated as the header
     */
    readonly headerList?: string[];
}
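The NOTE on headerList is easy to misread, so the following sketch models the documented rule in plain TypeScript: with headerList, every line is data; without it, the first line is the header. It is only an illustrative model, not DynamoDB's importer. SketchCsvOptions and csvLinesToItems are hypothetical names, and the handling of rows with fewer fields than headers is an assumption of this sketch.

```typescript
// Hypothetical helper modelling the documented header rule; not DynamoDB's importer.
interface SketchCsvOptions {
  delimiter?: string;
  headerList?: string[];
}

function csvLinesToItems(
  lines: string[],
  options: SketchCsvOptions = {},
): Record<string, string>[] {
  const delimiter = options.delimiter ?? ',';
  const hasHeaderList = options.headerList !== undefined;

  // With headerList, every line is data; without it, the first line is the header.
  const header = options.headerList ?? (lines[0] ?? '').split(delimiter);
  const dataLines = hasHeaderList ? lines : lines.slice(1);

  return dataLines.map((line) => {
    const fields = line.split(delimiter);
    const item: Record<string, string> = {};
    header.forEach((attributeName, i) => {
      const field = fields[i];
      // Assumption of this sketch: an attribute with no matching field is omitted.
      if (field !== undefined) {
        item[attributeName] = field;
      }
    });
    return item;
  });
}

// headerList given: the first line 'd001,1' is treated as data.
const itemsWithHeaderList = csvLinesToItems(['d001,1', 'd002,5'], {
  headerList: ['id', 'value'],
});
// itemsWithHeaderList: [{ id: 'd001', value: '1' }, { id: 'd002', value: '5' }]

// headerList omitted: the first line supplies the attribute names.
const itemsWithHeaderRow = csvLinesToItems(['id,value', 'd001,1']);
// itemsWithHeaderRow: [{ id: 'd001', value: '1' }]
```

The practical consequence is that a file which still contains its header row must not be combined with headerList, because under the documented rule that row would then be treated as data.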

CDK code

The CDK code for a DynamoDB table with a data import looks like this. It also creates the source S3 bucket and uploads the file to that bucket.

import {
  aws_dynamodb,
  aws_s3,
  aws_s3_deployment,
  Stack,
  RemovalPolicy,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class CdkSampleStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // S3 bucket
    const bucket = new aws_s3.Bucket(this, 'Bucket', {
      removalPolicy: RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    // Upload the sample data
    new aws_s3_deployment.BucketDeployment(this, 'DeploySampleTableData', {
      sources: [aws_s3_deployment.Source.asset('./src/tableData')],
      destinationBucket: bucket,
      destinationKeyPrefix: 'sampleTable/',
    });

    // DynamoDB table
    new aws_dynamodb.Table(this, 'SampleTable', {
      partitionKey: {
        name: 'id',
        type: aws_dynamodb.AttributeType.STRING,
      },
      removalPolicy: RemovalPolicy.DESTROY,
      importSource: {
        inputFormat: aws_dynamodb.InputFormat.csv({
          delimiter: ',',
          headerList: ['id', 'value'], // when specified, the import data needs no header row
        }),
        bucket,
        keyPrefix: 'sampleTable',
      },
    });
  }
}
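In the code above the upload uses destinationKeyPrefix 'sampleTable/' while the import uses keyPrefix 'sampleTable' without the trailing slash. The sketch below shows why both still line up, assuming the import selects objects by plain string prefix the way S3 listing does. keysMatchingPrefix and the example keys are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper: treats the key prefix as a plain string prefix, not a directory path.
function keysMatchingPrefix(objectKeys: string[], keyPrefix: string): string[] {
  return objectKeys.filter((objectKey) => objectKey.indexOf(keyPrefix) === 0);
}

// Example keys; only the first one corresponds to the file uploaded in this article.
const exampleObjectKeys = [
  'sampleTable/table1.csv',
  'sampleTable2/other.csv',
  'logs/app.log',
];

// Without the trailing slash, a sibling prefix such as 'sampleTable2/' also matches.
const looseMatches = keysMatchingPrefix(exampleObjectKeys, 'sampleTable');
// looseMatches: ['sampleTable/table1.csv', 'sampleTable2/other.csv']

// With the trailing slash, only keys under 'sampleTable/' match.
const strictMatches = keysMatchingPrefix(exampleObjectKeys, 'sampleTable/');
// strictMatches: ['sampleTable/table1.csv']
```

Under that assumption, ending keyPrefix with a slash is the narrower choice when the bucket may hold other prefixes that start with the same characters.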

Preparing the sample data

Create the sample data file to import. Because the CDK code specifies headerList, no header row is needed.

mkdir src
mkdir src/tableData
echo 'd001,1' > src/tableData/table1.csv
echo 'd002,5' >> src/tableData/table1.csv
echo 'd003' >> src/tableData/table1.csv

Deploying

Start the CDK deployment.

$ npm run deploy

> [email protected] deploy
> cdk deploy --require-approval never --method=direct

In the management console, [Imports from S3] shows that the import into the table has started.

By the time the CDK deployment completes, the import has completed as well.

Checking the table's data confirms that the imported items have been created.

Data is not imported on deployments other than the one that creates the table

By design, importing data from an S3 bucket is available only when the table is created. Data cannot be imported into a table after it has been created.

Let's run the CDK deployment again against the existing table.

$ npm run deploy

> [email protected] deploy
> cdk deploy --require-approval never --method=direct

Even after the deployment completes, no import is added under [Imports from S3], and no data is created in the table.

Removing the importSource property from an existing table re-creates the table

Next, after the table has been created with its data imported, try removing the importSource property from it, as follows.

    // DynamoDB table
    new aws_dynamodb.Table(this, 'SampleTable', {
      partitionKey: {
        name: 'id',
        type: aws_dynamodb.AttributeType.STRING,
      },
      removalPolicy: RemovalPolicy.DESTROY,
      // importSource: {
      //   inputFormat: aws_dynamodb.InputFormat.csv({
      //     delimiter: ',',
      //     headerList: ['id', 'value'], // when specified, the import data needs no header row
      //   }),
      //   bucket,
      //   keyPrefix: 'sampleTable',
      // },
    });

As a result, the table that had been created with the data import was deleted.