Your First ETL on AWS: Glue Data Catalog
Overview
Define the column layout of the CSV files delivered to the S3 data lake in the Glue Data Catalog.
Glue
Open the AWS Glue console.

Data Catalog
In the Data Catalog section, select Databases.
Click Add database.

Create a database
Database name: db-[your name]-[number]
Click Create database to create it.

Done
The database has been created.
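The same database can also be created from code. The following is a minimal sketch, assuming boto3 is installed and AWS credentials and a default region are configured. `owner` and `number` stand for [your name] and [number], and the values `taro` / `01` are hypothetical.

```python
def database_name(owner: str, number: str) -> str:
    """Return the database name in the db-[your name]-[number] pattern."""
    return f"db-{owner}-{number}"


def database_input(owner: str, number: str) -> dict:
    """Build the DatabaseInput payload for Glue's CreateDatabase API."""
    return {"Name": database_name(owner, number)}


def create_database(owner: str, number: str) -> None:
    """Create the database in the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    glue = boto3.client("glue")
    glue.create_database(DatabaseInput=database_input(owner, number))


# Hypothetical values; prints {'Name': 'db-taro-01'}
print(database_input("taro", "01"))
```

Calling `create_database("taro", "01")` performs the actual API call. If a database with that name already exists, Glue rejects the request with an AlreadyExistsException.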
Tables
Under Databases > Tables, click Add table.

Table definition
- Name: users
- Database: db-[your name]-[number]
- Data store:
  - Include path: s3://s3-[your name]-[number]-datalake/users/
- Data format: csv
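Behind these settings, the catalog stores an external table whose storage descriptor points at the S3 include path and reads the files as delimited text. The sketch below builds a comparable TableInput for boto3's `create_table`. The InputFormat, OutputFormat and SerDe values the console actually picks may differ, so treat them as assumptions. The optional header-skipping property is also an assumption: it only applies if your CSV files start with a header row.

```python
def csv_table_input(table: str, owner: str, number: str,
                    columns: list, skip_header: bool = False) -> dict:
    """Build a Glue TableInput for a CSV table under the data lake bucket."""
    # Bucket layout from this article: s3://s3-[name]-[number]-datalake/<table>/
    location = f"s3://s3-{owner}-{number}-datalake/{table}/"
    parameters = {"classification": "csv"}
    if skip_header:
        # Assumption: only needed when each CSV file begins with a header row.
        parameters["skip.header.line.count"] = "1"
    return {
        "Name": table,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": parameters,
        "StorageDescriptor": {
            "Columns": columns,
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def create_table(database: str, table_input: dict) -> None:
    """Register the table in the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # imported lazily so csv_table_input works without boto3

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)


# Hypothetical values for [your name] and [number]
users = csv_table_input("users", "taro", "01", columns=[])
print(users["StorageDescriptor"]["Location"])  # s3://s3-taro-01-datalake/users/
```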

Schema definition
Select Define or upload schema.
Click Edit schema as JSON.

Enter the following JSON:
```json
[
  {"Name": "user_id", "Type": "int", "Comment": ""},
  {"Name": "name", "Type": "string", "Comment": ""},
  {"Name": "email", "Type": "string", "Comment": ""},
  {"Name": "password_hash", "Type": "string", "Comment": ""},
  {"Name": "age", "Type": "int", "Comment": ""},
  {"Name": "gender", "Type": "string", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""},
  {"Name": "updated_at", "Type": "date", "Comment": ""}
]
```
Click Create.
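Each entry in the schema JSON uses the Name / Type / Comment keys that also appear in Glue's Column structure. This means the JSON can be checked before you paste it, or reused as the Columns list of a TableInput. The following is a minimal sketch. The set of allowed types covers only the types used in this article; that restriction is an assumption, so extend the set if you need other types.

```python
import json

# Types used in this article's schemas; extend as needed (assumption).
ALLOWED_TYPES = {"int", "string", "date"}


def load_columns(schema_json: str) -> list:
    """Parse the schema JSON and return Glue-style column dicts.

    Raises ValueError on unknown keys, unsupported types, or duplicate names.
    """
    columns = []
    seen = set()
    for entry in json.loads(schema_json):
        extra = set(entry) - {"Name", "Type", "Comment"}
        if extra:
            raise ValueError(f"unexpected keys {sorted(extra)} in {entry}")
        name, col_type = entry["Name"], entry["Type"]
        if col_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {col_type!r} for column {name!r}")
        if name in seen:
            raise ValueError(f"duplicate column name {name!r}")
        seen.add(name)
        columns.append({"Name": name, "Type": col_type, "Comment": entry.get("Comment", "")})
    return columns


sample = '[{"Name": "user_id", "Type": "int", "Comment": ""}, {"Name": "name", "Type": "string", "Comment": ""}]'
print([c["Name"] for c in load_columns(sample)])  # ['user_id', 'name']
```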

Creating the other tables
Repeat the steps from "Tables" through "Create" to create the remaining tables in the same way.
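The remaining tables differ only in name, include path and columns. Because of this, the JSON for Edit schema as JSON can also be generated from a compact list. This is a sketch: `SCHEMAS` restates the column lists given in this article, and `schema_json` is a hypothetical helper that is not part of any AWS tooling.

```python
import json

# Column lists restated from the schemas in this article: (name, type).
SCHEMAS = {
    "users": [("user_id", "int"), ("name", "string"), ("email", "string"),
              ("password_hash", "string"), ("age", "int"), ("gender", "string"),
              ("created_at", "date"), ("updated_at", "date")],
    "products": [("product_id", "int"), ("name", "string"), ("description", "string"),
                 ("price", "int"), ("stock", "int"),
                 ("created_at", "date"), ("updated_at", "date")],
    "orders": [("order_id", "int"), ("user_id", "int"), ("total_price", "int"),
               ("order_status", "string"), ("created_at", "date"), ("updated_at", "date")],
    "order_items": [("order_item_id", "int"), ("order_id", "int"), ("product_id", "int"),
                    ("quantity", "int"), ("price", "int"), ("created_at", "date")],
    "weather": [("weather_id", "int"), ("date_time", "date"), ("temperature", "int"),
                ("weather_condition", "string"), ("created_at", "date"), ("updated_at", "date")],
}


def schema_json(table: str) -> str:
    """Return the JSON to paste into "Edit schema as JSON" for one table."""
    return json.dumps(
        [{"Name": name, "Type": col_type, "Comment": ""} for name, col_type in SCHEMAS[table]],
        indent=2,
    )


# Print the schema JSON for each table in turn.
for table in SCHEMAS:
    print(f"--- {table} ---")
    print(schema_json(table))
```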
products
Table details
- Name: products
- Database: db-[your name]-[number]
- Data store:
  - Include path: s3://s3-[your name]-[number]-datalake/products/
- Data format: csv

Schema
```json
[
  {"Name": "product_id", "Type": "int", "Comment": ""},
  {"Name": "name", "Type": "string", "Comment": ""},
  {"Name": "description", "Type": "string", "Comment": ""},
  {"Name": "price", "Type": "int", "Comment": ""},
  {"Name": "stock", "Type": "int", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""},
  {"Name": "updated_at", "Type": "date", "Comment": ""}
]
```

orders
Table details
- Name: orders
- Database: db-[your name]-[number]
- Data store:
  - Include path: s3://s3-[your name]-[number]-datalake/orders/
- Data format: csv

Schema
```json
[
  {"Name": "order_id", "Type": "int", "Comment": ""},
  {"Name": "user_id", "Type": "int", "Comment": ""},
  {"Name": "total_price", "Type": "int", "Comment": ""},
  {"Name": "order_status", "Type": "string", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""},
  {"Name": "updated_at", "Type": "date", "Comment": ""}
]
```

order_items
Table details
- Name: order_items
- Database: db-[your name]-[number]
- Data store:
  - Include path: s3://s3-[your name]-[number]-datalake/order_items/
- Data format: csv

Schema
```json
[
  {"Name": "order_item_id", "Type": "int", "Comment": ""},
  {"Name": "order_id", "Type": "int", "Comment": ""},
  {"Name": "product_id", "Type": "int", "Comment": ""},
  {"Name": "quantity", "Type": "int", "Comment": ""},
  {"Name": "price", "Type": "int", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""}
]
```

weather
Table details
- Name: weather
- Database: db-[your name]-[number]
- Data store:
  - Include path: s3://s3-[your name]-[number]-datalake/weather/
- Data format: csv

Schema
```json
[
  {"Name": "weather_id", "Type": "int", "Comment": ""},
  {"Name": "date_time", "Type": "date", "Comment": ""},
  {"Name": "temperature", "Type": "int", "Comment": ""},
  {"Name": "weather_condition", "Type": "string", "Comment": ""},
  {"Name": "created_at", "Type": "date", "Comment": ""},
  {"Name": "updated_at", "Type": "date", "Comment": ""}
]
```

Done
All five tables (users, products, orders, order_items, weather) have been created.
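To confirm that every table was registered, you can list the tables in the database. The following is a minimal sketch that assumes a boto3 Glue client, whose `get_tables` operation supports a paginator. `list_table_names` and `missing_tables` are hypothetical helpers.

```python
# Tables created in this article.
EXPECTED_TABLES = {"users", "products", "orders", "order_items", "weather"}


def list_table_names(glue_client, database: str) -> list:
    """Return the sorted names of all tables in a Glue database."""
    names = []
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        names.extend(table["Name"] for table in page["TableList"])
    return sorted(names)


def missing_tables(glue_client, database: str) -> set:
    """Return the expected tables that are not (yet) in the database."""
    return EXPECTED_TABLES - set(list_table_names(glue_client, database))
```

For example, `missing_tables(boto3.client("glue"), "db-taro-01")` (with the hypothetical database name `db-taro-01`) should return an empty set once all five tables exist.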

