使用 JSON SERDE 读取 json 文件时,我只得到一行
I am only getting one row when using JSON SERDE to read a json file
JSON数据:
[{
"liked": "true",
"user_id": "101",
"video_end_type": "3",
"minutes_played": "3",
"video_id": "101",
"geo_cd": "AP",
"channel_id": "11",
"creator_id": "101",
"timestamp": "07/05/2019 01:36:35",
"disliked": "true"
},
{
"liked": true,
"user_id": 102,
"video_end_type": null,
"minutes_played": 4,
"video_id": 102,
"geo_cd": "AP",
"channel_id": 12,
"creator_id": 102,
"timestamp": "15/04/2019 17:04:00",
"disliked": true
}
]
输出:
hive> select * from stream;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
true 101 3 3 101 AP 11 101 NULL true
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
我得到的输出只是第一行。
代码:
add jar json-serde-1.3.8-jar-with-dependencies.jar;
create external table stream(
liked string, user_id int, video_end_type int, minutes_played int, video_id int, geo_cd string, channel_id int, creator_id int, time timestamp, disliked
string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/location';
注意:我无法删除括号,因为那样的话数据的 JSON 格式不正确。
类似的东西对你有用:
JSON 数据:
{"liked": "true", "user_id": "101", "video_end_type": "3", "minutes_played": "3", "video_id": "101", "geo_cd": "AP", "channel_id": "11", "creator_id": "101", "timestamp": "07/05/2019 01:36:35", "disliked": "true"}
{"liked": true, "user_id": 102, "video_end_type": null, "minutes_played": 4, "video_id": 102, "geo_cd": "AP", "channel_id": 12, "creator_id": 102, "timestamp": "15/04/2019 17:04:00", "disliked": true}
您的创建查询应如下所示:
create external table stream(
liked string,
user_id string,
video_end_type string,
minutes_played string,
video_id string,
geo_cd string,
channel_id string,
creator_id string,
`timestamp` string,
disliked string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/location/';
注意:如果你的JSON数据是一致的,你可以使用适当的数据类型而不是字符串。
JSON数据:
[{
"liked": "true",
"user_id": "101",
"video_end_type": "3",
"minutes_played": "3",
"video_id": "101",
"geo_cd": "AP",
"channel_id": "11",
"creator_id": "101",
"timestamp": "07/05/2019 01:36:35",
"disliked": "true"
},
{
"liked": true,
"user_id": 102,
"video_end_type": null,
"minutes_played": 4,
"video_id": 102,
"geo_cd": "AP",
"channel_id": 12,
"creator_id": 102,
"timestamp": "15/04/2019 17:04:00",
"disliked": true
}
]
输出:
hive> select * from stream;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
true 101 3 3 101 AP 11 101 NULL true
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
我得到的输出只是第一行。
代码:
add jar json-serde-1.3.8-jar-with-dependencies.jar;
create external table stream(
liked string, user_id int, video_end_type int, minutes_played int, video_id int, geo_cd string, channel_id int, creator_id int, time timestamp, disliked
string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/location';
注意:我无法删除括号,因为那样的话数据的 JSON 格式不正确。
类似的东西对你有用:
JSON 数据:
{"liked": "true", "user_id": "101", "video_end_type": "3", "minutes_played": "3", "video_id": "101", "geo_cd": "AP", "channel_id": "11", "creator_id": "101", "timestamp": "07/05/2019 01:36:35", "disliked": "true"}
{"liked": true, "user_id": 102, "video_end_type": null, "minutes_played": 4, "video_id": 102, "geo_cd": "AP", "channel_id": 12, "creator_id": 102, "timestamp": "15/04/2019 17:04:00", "disliked": true}
您的创建查询应如下所示:
create external table stream(
liked string,
user_id string,
video_end_type string,
minutes_played string,
video_id string,
geo_cd string,
channel_id string,
creator_id string,
`timestamp` string,
disliked string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/location/';
注意:如果你的JSON数据是一致的,你可以使用适当的数据类型而不是字符串。