AWS Athena Import CSV file
I have this data stored in S3 in .csv format (it could be any other file format, whichever best suits the table I need):
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
"41.9100687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417","41.9810128,-87.8785121","41.9200687,-87.8805614","41.9802511,-87.8803253","41.9806802,-87.8792417",
I want one coordinate per column, like this:
Coordinates:
1. 41.9100687,-87.8805614
2. 41.9802511,-87.8803253
3. 41.9806802,-87.8792417
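After importing to S3, I chose CSV as the data type and then added string columns.
However, I get some strange table output. I also tried importing the data as a plain .txt file with a comma delimiter, and got the same strange output.
What am I doing wrong?
Edit
The test column in the screenshot comes from a query against another, identical example. It should be gps_coordinates.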
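To reproduce your situation, I did the following:
- Created a text file (gps.txt) from your sample data
- Uploaded it to an Amazon S3 bucket in its own folder (with no other files in that folder)
- Created a table in Amazon Athena
- Specified the location as the folder name (s3://my-bucket/gps/)
- Specified 7 columns (because each line of the sample file contains 7 string values)
However, because each pair of numbers contains a comma, I changed the SerDe to the one described in OpenCSVSerDe for Processing CSV - Amazon Athena: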
CREATE EXTERNAL TABLE IF NOT EXISTS default.gps (
`c1` string,
`c2` string,
`c3` string,
`c4` string,
`c5` string,
`c6` string,
`c7` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\\")
LOCATION 's3://my-bucket/gps/'
TBLPROPERTIES ('has_encrypted_data'='false');
I was then able to query the table successfully. A sample column value: 41.9100687,-87.8805614
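To see why the quote-aware SerDe matters here, the following is a minimal Python sketch of the two parsing behaviours applied to one line of the sample data. It uses Python's standard csv module purely as an illustration; it is not the OpenCSVSerDe itself, and the variable names are made up for this example.

```python
import csv
import io

# One line of the sample data from the question (note the trailing comma).
line = (
    '"41.9100687,-87.8805614","41.9802511,-87.8803253",'
    '"41.9806802,-87.8792417","41.9810128,-87.8785121",'
    '"41.9200687,-87.8805614","41.9802511,-87.8803253",'
    '"41.9806802,-87.8792417",'
)

# Naive split: every comma counts as a delimiter, so each
# latitude,longitude pair is cut in half.
naive_fields = line.split(",")

# Quote-aware parse: commas inside double quotes stay part of the field.
quoted_fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# The trailing comma yields one extra empty field; drop it.
coordinates = [field for field in quoted_fields if field]

print(len(naive_fields))   # pieces produced by the naive split
print(len(coordinates))    # coordinate pairs kept intact
print(coordinates[0])
```

The naive split breaks every pair into separate latitude and longitude fragments, which is consistent with the strange output described in the question, while the quote-aware parse keeps the 7 pairs intact, matching the 7 string columns in the table definition.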