Hive CLI not populating table data (from Create Table as Select Query) while Hive Beeswax works fine
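When I run a "Create Table as Select" query in the Hive CLI, the table is created but no data is populated. However, when I run the same query in Hive Beeswax, the target table is created and populated with data.
Here is the query: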
hive -e '
create table table_validation as
select listing_id, city, area, expected_amount_inr, property_id, house_type, case when area_builtup_sqft
is NULL or
area_builtup_sqft = 0 or area_builtup_sqft = " " then plot_area else area_builtup_sqft end as area_sqft,
case when area_builtup_sqft is NULL or area_builtup_sqft = 0 or area_builtup_sqft = " "
then expected_amount_inr/plot_area else expected_amount_inr/area_builtup_sqft end as
price_sqft,listing_state,
case when house_type like "apartment" then "apartment" when house_type like "plot" then "plot" else
"others" end as property_type, case when house_type like "plot" then "NA" when num_bedrooms between 1 and 1.9 then 1 when num_bedrooms between
2 and 2.9 then 2 when num_bedrooms between 3 and 3.9 then 3 when num_bedrooms >= 4 then 4 else num_bedrooms end as number_bedrooms
from realestate_listing_main
where listing_type LIKE "rent"
and added_on between '2015-02-01' and '2015-03-31'
' --database default;
When I run this query, I get the following output:
running hive query
0 2015-03-31 18:40:41,025 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2015-03-31 18:40:41,030 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2015-03-31 18:40:41,030 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2015-03-31 18:40:41,030 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2015-03-31 18:40:41,031 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2015-03-31 18:40:41,031 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2015-03-31 18:40:41,031 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1011)) - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2015-03-31 18:40:41,336 WARN [main] conf.HiveConf (HiveConf.java:initialize(1155)) - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.12.0-cdh5.1.2.jar!/hive-log4j.properties
OK
Time taken: 0.621 seconds
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1427789583342_0014, Tracking URL = http://ip-10-172-133-249.ap-southeast-1.compute.internal:8088/proxy/application_1427789583342_0014/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1427789583342_0014
Hadoop job information for Stage-1: number of mappers: 10; number of reducers: 0
2015-03-31 18:40:59,849 Stage-1 map = 0%, reduce = 0%
2015-03-31 18:41:10,188 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:11,219 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:12,252 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:13,289 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:14,321 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:15,357 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 5.86 sec
2015-03-31 18:41:16,393 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 39.78 sec
2015-03-31 18:41:17,428 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 41.17 sec
2015-03-31 18:41:18,460 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 43.26 sec
2015-03-31 18:41:19,499 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 49.68 sec
2015-03-31 18:41:20,536 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 50.49 sec
2015-03-31 18:41:21,569 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:22,598 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:23,627 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:24,655 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:25,684 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:26,714 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:27,743 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:28,773 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 56.28 sec
2015-03-31 18:41:29,803 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 61.88 sec
2015-03-31 18:41:30,840 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 63.8 sec
2015-03-31 18:41:31,872 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 63.8 sec
2015-03-31 18:41:32,905 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 69.86 sec
2015-03-31 18:41:33,935 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 71.58 sec
2015-03-31 18:41:34,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 71.58 sec
MapReduce Total cumulative CPU time: 1 minutes 11 seconds 580 msec
Ended Job = job_1427789583342_0014
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ip-10-172-133-249.ap-southeast-1.compute.internal:8020/tmp/hive-root/hive_2015-03-31_18-40-42_689_38529489390850959-1/-ext-10001
Moving data to: hdfs://ip-10-172-133-249.ap-southeast-1.compute.internal:8020/user/hive/warehouse/default.db/table_validation
Table default.table_validation stats: [num_partitions: 0, num_files: 10, num_rows: 0, total_size: 0, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 10 Cumulative CPU: 71.58 sec HDFS Read: 2635527679 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 11 seconds 580 msec
OK
Time taken: 52.896 seconds
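It does not execute the second and third jobs. But when I run the query in Hive Beeswax, all the jobs execute and the table is created with data.
Please let me know what I am missing. I have been stuck on this for the past 3 days.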
Got the answer. The SerDe jar (serde.jar) needs to be added before running the query, because without this jar Hive cannot recognize the data.
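A minimal sketch of that fix, assuming the SerDe jar lives at the hypothetical path /path/to/serde.jar (the post does not say where it is or which SerDe the source table uses). `ADD JAR` is a standard Hive CLI command; the `SERDE_JAR` variable and the `build_query` helper are illustrative names only. The idea is to register the jar in the same CLI session that runs the query:

```shell
#!/bin/sh
# Hypothetical location: replace with the real SerDe jar for the source table.
SERDE_JAR="${SERDE_JAR:-/path/to/serde.jar}"

# Prepend ADD JAR so the jar is registered in the same session as the query.
build_query() {
  printf 'ADD JAR %s;\n%s\n' "$SERDE_JAR" "$1"
}

# A simple read test; if this returns rows, the CTAS query should see them too.
QUERY=$(build_query 'SELECT COUNT(*) FROM realestate_listing_main;')

# Run through the Hive CLI when it is installed, otherwise just show the text.
if command -v hive >/dev/null 2>&1; then
  hive --database default -e "$QUERY"
else
  printf '%s\n' "$QUERY"
fi
```

The same `ADD JAR` line can be placed at the top of the `hive -e` string from the question. Setting `hive.aux.jars.path` is another way to make the jar available, but whether that suits this cluster is an assumption.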