Your First ETL on AWS [Glue Data Catalog]
Overview
Define the columns of the incoming CSV files in the AWS Glue Data Catalog.
Glue
Select Glue.
Data Catalog
Under Data Catalog, select Databases.
Click Add Database.
Create a database
Database name: db-[your-name]-[number]
Click Create database to create it.
Complete
Done!
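The console steps above can also be scripted. The following is a minimal sketch using boto3's Glue client, assuming boto3 is installed and AWS credentials and a region are already configured. The helper names are hypothetical, and the sample values "taro" and "01" simply stand in for [your-name] and [number].

```python
def database_name(your_name: str, number: str) -> str:
    """Build a name that follows the db-[your-name]-[number] convention."""
    return f"db-{your_name}-{number}"


def create_database(name: str) -> None:
    """Create a Glue Data Catalog database, like Add Database -> Create database.

    Assumes boto3 is installed and AWS credentials/region are configured.
    """
    import boto3  # imported here so database_name() works without boto3

    glue = boto3.client("glue")
    glue.create_database(DatabaseInput={"Name": name})
```

For example, `create_database(database_name("taro", "01"))` would create a database named `db-taro-01`.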
Tables
Under Databases > Tables, click Add table.
Table definition
- Name: users
- Database: db-[your-name]-[number]
- Data store:
  - Include path: s3://s3-[your-name]-[number]-datalake/users/
- Data format: csv
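The fields above correspond to the TableInput structure that the Glue CreateTable API takes. Below is a minimal sketch of that mapping with boto3, assuming boto3 is installed and credentials are configured. The helper names are hypothetical, and the InputFormat, OutputFormat, and SerDe class names are the ones commonly seen on comma-delimited CSV tables in Glue, so treat them as an assumption and compare them with a table created in the console before relying on them. The columns are the same Name/Type/Comment objects entered in the schema step.

```python
def datalake_path(your_name: str, number: str, table: str) -> str:
    """Include path following s3://s3-[your-name]-[number]-datalake/<table>/."""
    return f"s3://s3-{your_name}-{number}-datalake/{table}/"


def csv_table_input(table: str, location: str, columns: list) -> dict:
    """Build a TableInput for glue.create_table describing a comma-delimited table.

    Format and SerDe class names are an assumption (typical CSV table values).
    """
    return {
        "Name": table,                            # Name
        "TableType": "EXTERNAL_TABLE",            # data stays in S3
        "Parameters": {"classification": "csv"},  # Data format: csv
        "StorageDescriptor": {
            "Columns": columns,                   # Name/Type/Comment objects
            "Location": location,                 # Include path
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
    }


def create_csv_table(database: str, table_input: dict) -> None:
    """Register the table in the given database (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("glue").create_table(DatabaseName=database, TableInput=table_input)
```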
Schema definition
Select Define or upload schema.
Click Edit schema as JSON.
Enter the following JSON:
json
[
{
"Name": "user_id",
"Type": "int",
"Comment": ""
},
{
"Name": "name",
"Type": "string",
"Comment": ""
},
{
"Name": "email",
"Type": "string",
"Comment": ""
},
{
"Name": "password_hash",
"Type": "string",
"Comment": ""
},
{
"Name": "age",
"Type": "int",
"Comment": ""
},
{
"Name": "gender",
"Type": "string",
"Comment": ""
},
{
"Name": "created_at",
"Type": "date",
"Comment": ""
},
{
"Name": "updated_at",
"Type": "date",
"Comment": ""
}
]
Click Create.
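A trailing comma or a mistyped type name in the pasted JSON is easy to miss. The following is a minimal sketch of a local check you could run before pasting. The helper name is hypothetical, and the set of accepted type names is an assumption covering only a few common Hive primitive types, not the full list Glue accepts.

```python
import json

# Assumption: a small subset of Hive primitive type names, not Glue's full list.
KNOWN_TYPES = {"int", "bigint", "double", "boolean", "string", "date", "timestamp"}


def check_schema(schema_json: str) -> list:
    """Return human-readable problems found in a Name/Type/Comment schema JSON."""
    try:
        columns = json.loads(schema_json)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(columns, list):
        return ["top level must be a JSON array"]
    problems = []
    seen = set()
    for i, col in enumerate(columns):
        if not isinstance(col, dict):
            problems.append(f"column {i}: not an object")
            continue
        name = col.get("Name", "")
        col_type = col.get("Type", "")
        if not name:
            problems.append(f"column {i}: missing Name")
        elif name in seen:
            problems.append(f"column {i}: duplicate Name {name!r}")
        seen.add(name)
        if col_type not in KNOWN_TYPES:
            problems.append(f"column {i}: unexpected Type {col_type!r}")
    return problems
```

An empty list means no problems were found; anything else is worth fixing before clicking Create.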
Creating the other tables
Repeat the steps from "Tables" through "Create" to create each of the remaining tables below in the same way.
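Since the tables differ only in their names and columns, one option is to keep all column lists in one place and generate the JSON for each table instead of writing it by hand. A minimal sketch follows; the "name:type" shorthand and the helper name are hypothetical, and the column lists are copied from the schemas in this article, in the same order.

```python
import json

# Column lists from this article's schemas, in CSV column order.
TABLES = {
    "users": ["user_id:int", "name:string", "email:string", "password_hash:string",
              "age:int", "gender:string", "created_at:date", "updated_at:date"],
    "products": ["product_id:int", "name:string", "description:string", "price:int",
                 "stock:int", "created_at:date", "updated_at:date"],
    "orders": ["order_id:int", "user_id:int", "total_price:int", "order_status:string",
               "created_at:date", "updated_at:date"],
    "order_items": ["order_item_id:int", "order_id:int", "product_id:int",
                    "quantity:int", "price:int", "created_at:date"],
    "weather": ["weather_id:int", "date_time:date", "temperature:int",
                "weather_condition:string", "created_at:date", "updated_at:date"],
}


def schema_json(columns: list) -> str:
    """Expand "name:type" strings into the JSON used by Edit schema as JSON."""
    entries = []
    for spec in columns:
        name, col_type = spec.split(":", 1)
        entries.append({"Name": name, "Type": col_type, "Comment": ""})
    return json.dumps(entries, indent=2)
```

For example, `print(schema_json(TABLES["products"]))` prints JSON that can be pasted into the schema editor for products.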
products
Table details
- Name: products
- Database: db-[your-name]-[number]
- Data store:
  - Include path: s3://s3-[your-name]-[number]-datalake/products/
- Data format: csv
Schema
json
[
{
"Name": "product_id",
"Type": "int",
"Comment": ""
},
{
"Name": "name",
"Type": "string",
"Comment": ""
},
{
"Name": "description",
"Type": "string",
"Comment": ""
},
{
"Name": "price",
"Type": "int",
"Comment": ""
},
{
"Name": "stock",
"Type": "int",
"Comment": ""
},
{
"Name": "created_at",
"Type": "date",
"Comment": ""
},
{
"Name": "updated_at",
"Type": "date",
"Comment": ""
}
]
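With Hive-style CSV readers such as the one Athena uses for these tables, the schema columns are matched to the fields of each line by position, not by header name, so the order in the JSON has to match the column order of the file. The sketch below illustrates that positional mapping for products in plain Python. The helper name and sample line are made up, and it assumes dates are written as yyyy-MM-dd and that no value contains a comma.

```python
from datetime import date

# products columns in schema order, each paired with a converter for its type.
PRODUCT_COLUMNS = [
    ("product_id", int),
    ("name", str),
    ("description", str),
    ("price", int),
    ("stock", int),
    ("created_at", date.fromisoformat),  # assumes yyyy-MM-dd
    ("updated_at", date.fromisoformat),
]


def parse_line(line: str, columns: list) -> dict:
    """Map one comma-delimited line onto the columns by position.

    Splits on every comma, so values must not contain commas.
    Raises ValueError when the field count does not match the schema.
    """
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(columns, fields)}
```

Swapping, say, price and stock in the JSON would silently swap their values in every row, which is why the column order matters.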
orders
Table details
- Name: orders
- Database: db-[your-name]-[number]
- Data store:
  - Include path: s3://s3-[your-name]-[number]-datalake/orders/
- Data format: csv
Schema
json
[
{
"Name": "order_id",
"Type": "int",
"Comment": ""
},
{
"Name": "user_id",
"Type": "int",
"Comment": ""
},
{
"Name": "total_price",
"Type": "int",
"Comment": ""
},
{
"Name": "order_status",
"Type": "string",
"Comment": ""
},
{
"Name": "created_at",
"Type": "date",
"Comment": ""
},
{
"Name": "updated_at",
"Type": "date",
"Comment": ""
}
]
order_items
Table details
- Name: order_items
- Database: db-[your-name]-[number]
- Data store:
  - Include path: s3://s3-[your-name]-[number]-datalake/order_items/
- Data format: csv
Schema
json
[
{
"Name": "order_item_id",
"Type": "int",
"Comment": ""
},
{
"Name": "order_id",
"Type": "int",
"Comment": ""
},
{
"Name": "product_id",
"Type": "int",
"Comment": ""
},
{
"Name": "quantity",
"Type": "int",
"Comment": ""
},
{
"Name": "price",
"Type": "int",
"Comment": ""
},
{
"Name": "created_at",
"Type": "date",
"Comment": ""
}
]
weather
Table details
- Name: weather
- Database: db-[your-name]-[number]
- Data store:
  - Include path: s3://s3-[your-name]-[number]-datalake/weather/
- Data format: csv
Schema
json
[
{
"Name": "weather_id",
"Type": "int",
"Comment": ""
},
{
"Name": "date_time",
"Type": "date",
"Comment": ""
},
{
"Name": "temperature",
"Type": "int",
"Comment": ""
},
{
"Name": "weather_condition",
"Type": "string",
"Comment": ""
},
{
"Name": "created_at",
"Type": "date",
"Comment": ""
},
{
"Name": "updated_at",
"Type": "date",
"Comment": ""
}
]
Complete
All five tables are done!
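To confirm that all five tables were registered, you can list the tables in the database. A minimal sketch, assuming boto3 is installed and credentials are configured; the helper names are hypothetical, and it uses the Glue client's get_tables paginator so that results spanning several pages are all collected.

```python
EXPECTED = {"users", "products", "orders", "order_items", "weather"}


def missing_tables(expected: set, actual: set) -> set:
    """Tables that were expected but not found in the catalog."""
    return expected - actual


def list_table_names(database: str) -> set:
    """Collect table names in a Glue database (requires boto3 and AWS credentials)."""
    import boto3

    paginator = boto3.client("glue").get_paginator("get_tables")
    names = set()
    for page in paginator.paginate(DatabaseName=database):
        names.update(table["Name"] for table in page["TableList"])
    return names
```

For example, `missing_tables(EXPECTED, list_table_names("db-taro-01"))` should return an empty set once every table has been created.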