
Your First ETL on AWS [Glue Data Catalog]

Overview

This step defines, in the Glue Data Catalog, the column schemas of the CSV files that will be delivered.

Glue

Select Glue.

Data Catalog

Under Data Catalog, select Databases.

Click Add database.

Create a database

Name: db-[your name]-[number]

Click Create database to create it.

Done

The database has been created.
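For readers who prefer scripting, the console step above can probably be reproduced with boto3. This is a minimal sketch, not part of the original walkthrough: the helper names `database_name` and `create_database` and the sample values `"taro"` and `"01"` are hypothetical, and `create_database` assumes boto3 is installed and AWS credentials are configured.

```python
def database_name(owner: str, number: str) -> str:
    """Build the database name following the db-[your name]-[number] pattern."""
    return f"db-{owner}-{number}"


def create_database(owner: str, number: str) -> str:
    """Create the Glue database and return its name (needs AWS credentials)."""
    import boto3  # imported lazily so the naming helper works without boto3

    name = database_name(owner, number)
    glue = boto3.client("glue")
    # DatabaseInput requires only Name; other fields are optional.
    glue.create_database(DatabaseInput={"Name": name})
    return name


# "taro" and "01" are placeholder values for illustration only.
print(database_name("taro", "01"))
```

Calling `create_database("taro", "01")` would then do the same thing as clicking Create database in the console, under the assumptions stated above.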

Tables

Go to Databases > Tables.

Click Add table.

Table definition

  • Name: users
  • Database: db-[your name]-[number]
  • Data store:
    • Include path: s3://s3-[your name]-[number]-datalake/users/
  • Data format: csv

Schema definition

Select Define or upload schema.

Click Edit schema as JSON.

Enter the following JSON.

json
[
  {
    "Name": "user_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "name",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "email",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "password_hash",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "age",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "gender",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "created_at",
    "Type": "date",
    "Comment": ""
  },
  {
    "Name": "updated_at",
    "Type": "date",
    "Comment": ""
  }
]

Create
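The table definition and schema entered above map fairly directly onto a single `create_table` call in boto3. The sketch below is an assumption-laden illustration: `build_table_input` and `create_table` are hypothetical helper names, the bucket `s3-taro-01-datalake` and database `db-taro-01` are placeholder values, and the SerDe and input/output format classes are my guess at a typical comma-delimited CSV table, not something the console steps confirm.

```python
import json

# Same JSON shape that "Edit schema as JSON" accepts in the console.
USERS_SCHEMA_JSON = """
[
  {"Name": "user_id", "Type": "int", "Comment": ""},
  {"Name": "name", "Type": "string", "Comment": ""},
  {"Name": "email", "Type": "string", "Comment": ""},
  {"Name": "password_hash", "Type": "string", "Comment": ""},
  {"Name": "age", "Type": "int", "Comment": ""},
  {"Name": "gender", "Type": "string", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""},
  {"Name": "updated_at", "Type": "date", "Comment": ""}
]
"""


def build_table_input(table: str, bucket: str, schema_json: str) -> dict:
    """Turn a console-style schema JSON into a Glue TableInput dict."""
    columns = json.loads(schema_json)
    return {
        "Name": table,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "csv"},
        "StorageDescriptor": {
            "Columns": columns,
            "Location": f"s3://{bucket}/{table}/",
            # Assumed classes for a plain comma-delimited text table.
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def create_table(database: str, table_input: dict) -> None:
    """Register the table in the Data Catalog (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)


# Placeholder bucket name following the s3-[your name]-[number]-datalake pattern.
users_input = build_table_input("users", "s3-taro-01-datalake", USERS_SCHEMA_JSON)
print(users_input["StorageDescriptor"]["Location"])
```

With credentials in place, `create_table("db-taro-01", users_input)` would register the table. Keeping the dict builder separate from the API call makes it possible to inspect the request before sending anything to AWS.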

Creating the other schemas

Repeat the steps from "Tables" through "Create" to create the remaining schemas in the same way.
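Since every table in this walkthrough uses the same database and the same include-path pattern, the repetition can be sketched as a small loop. The helper names `include_path` and `table_locations` and the sample values `"taro"` and `"01"` are hypothetical; only the table names and the path pattern come from this page.

```python
# Table names used in this walkthrough.
TABLES = ["users", "products", "orders", "order_items", "weather"]


def include_path(owner: str, number: str, table: str) -> str:
    """Build the include path s3://s3-[your name]-[number]-datalake/[table]/."""
    return f"s3://s3-{owner}-{number}-datalake/{table}/"


def table_locations(owner: str, number: str) -> dict:
    """Map each table name to its S3 include path."""
    return {table: include_path(owner, number, table) for table in TABLES}


# "taro" and "01" are placeholder values for illustration only.
for table, path in table_locations("taro", "01").items():
    print(f"{table}: {path}")
```

Printing the paths before creating anything is a cheap way to catch a typo in the bucket name, which is otherwise easy to miss when entering five tables by hand.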

products

Table details

  • Name: products
  • Database: db-[your name]-[number]
  • Data store:
    • Include path: s3://s3-[your name]-[number]-datalake/products/
  • Data format: csv

Schema

json
[
  {
    "Name": "product_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "name",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "description",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "price",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "stock",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "created_at",
    "Type": "date",
    "Comment": ""
  },
  {
    "Name": "updated_at",
    "Type": "date",
    "Comment": ""
  }
]

orders

Table details

  • Name: orders
  • Database: db-[your name]-[number]
  • Data store:
    • Include path: s3://s3-[your name]-[number]-datalake/orders/
  • Data format: csv

Schema

json
[
  {
    "Name": "order_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "user_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "total_price",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "order_status",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "created_at",
    "Type": "date",
    "Comment": ""
  },
  {
    "Name": "updated_at",
    "Type": "date",
    "Comment": ""
  }
]

order_items

  • Name: order_items
  • Database: db-[your name]-[number]
  • Data store:
    • Include path: s3://s3-[your name]-[number]-datalake/order_items/
  • Data format: csv

Schema

json
[
  {
    "Name": "order_item_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "order_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "product_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "quantity",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "price",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "created_at",
    "Type": "date",
    "Comment": ""
  }
]

weather

  • Name: weather
  • Database: db-[your name]-[number]
  • Data store:
    • Include path: s3://s3-[your name]-[number]-datalake/weather/
  • Data format: csv

Schema

json
[
  {
    "Name": "weather_id",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "date_time",
    "Type": "date",
    "Comment": ""
  },
  {
    "Name": "temperature",
    "Type": "int",
    "Comment": ""
  },
  {
    "Name": "weather_condition",
    "Type": "string",
    "Comment": ""
  },
  {
    "Name": "created_at",
    "Type": "date",
    "Comment": ""
  },
  {
    "Name": "updated_at",
    "Type": "date",
    "Comment": ""
  }
]

Done

All five tables have been created.
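To double-check the result without clicking through the console, one option is to list the tables in the database and compare them with the five names used here. This is only a sketch: `missing_tables` and `fetch_table_list` are hypothetical helper names, `fetch_table_list` assumes boto3 and AWS credentials, and the `sample` list is a hand-written stand-in for an API response, not real output.

```python
EXPECTED_TABLES = {"users", "products", "orders", "order_items", "weather"}


def missing_tables(expected: set, table_list: list) -> set:
    """Return expected table names that do not appear in a Glue TableList."""
    found = {table["Name"] for table in table_list}
    return expected - found


def fetch_table_list(database: str) -> list:
    """Collect all tables of a database from Glue (needs AWS credentials)."""
    import boto3  # imported lazily so the comparison works without boto3

    glue = boto3.client("glue")
    tables = []
    # get_tables is paginated, so walk every page.
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    return tables


# Offline check against a hand-written stand-in for the API response.
sample = [{"Name": "users"}, {"Name": "products"}]
print(sorted(missing_tables(EXPECTED_TABLES, sample)))
```

Against a real account, `missing_tables(EXPECTED_TABLES, fetch_table_list("db-taro-01"))` should return an empty set once every table exists, where `db-taro-01` is again a placeholder name.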
