How to create a multi-level partition in Hive using Sqoop
I need to create a Hive table with three partition levels, year/month/day, using Sqoop. I looked at --hive-partition-key and --hive-partition-value in Sqoop; with those options I was able to create a single-level partition such as year, e.g. --hive-partition-key year --hive-partition-value '2016'.
My question is: how do I pass multiple partition keys and values so that a partition hierarchy like year/month/day is created?
sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--fields-terminated-by '[=10=]1' \
--lines-terminated-by '2' \
--hcatalog-database test \
--hcatalog-table user1 \
--hcatalog-partition-keys year,month,day \
--hcatalog-partition-values '2016,08,15' \
--verbose
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34980)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34948)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:34879)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1214)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1200)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1201)
at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
... 14 more
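The NoSuchObjectException above means the import fails before any data moves: the HCatalog table test.user1 does not exist yet. With --hcatalog-table the target table must either already exist or be created in the same run via --create-hcatalog-table. A minimal sketch of pre-creating it from the Hive CLI follows; the data columns (id, name) are assumptions, since the real schema of user1 is not shown here:

# Pre-create the partitioned table so the HCatalog import can find it.
# id/name are placeholder columns; use the actual schema of user1.
hive -e "
CREATE TABLE test.user1 (
  id   INT,
  name STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING);"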
Updated command:
sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--create-hcatalog-table \
--hcatalog-table user1 \
--hcatalog-partition-keys year,month,day \
--hcatalog-partition-values '2016,08,15' \
--verbose
Error after running the updated command:
16/08/17 05:53:20 INFO hcat.SqoopHCatUtilities: Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1471413200625
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: MismatchedTokenException(10!=288)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.primitiveType(HiveParser.java:39530)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38772)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:38522)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38222)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36445)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4864)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/08/17 05:53:24 INFO hcat.SqoopHCatUtilities: FAILED: ParseException line 3:15 mismatched input ',' expecting ( near 'varchar' in primitive type specificat
16/08/17 05:53:25 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@326de728
16/08/17 05:53:25 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: HCat exited with status 64
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.executeExternalHCatProgram(SqoopHCatUtilities.java:1129)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.launchHCatCli(SqoopHCatUtilities.java:1078)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.createHCatTable(SqoopHCatUtilities.java:625)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:340)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
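The ParseException ("mismatched input ',' expecting ( near 'varchar'") comes from the CREATE TABLE statement that --create-hcatalog-table generates: one of the source columns gets mapped to Hive varchar without a length, which Hive's parser rejects. One possible workaround, sketched below under the assumption that a column named name is the one mapped to varchar (the real schema of user1 is not shown), is to override that mapping with --map-column-hive so the generated DDL uses a plain string type:

# Same import, plus an explicit Hive type for the column that maps to varchar.
# "name" is a placeholder; substitute the offending column of user1.
sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--create-hcatalog-table \
--hcatalog-database test \
--hcatalog-table user1 \
--hcatalog-partition-keys year,month,day \
--hcatalog-partition-values '2016,08,15' \
--map-column-hive name=string \
--verbose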
To import data with Sqoop into a Hive table partitioned on multiple keys, you can use the hcatalog-table feature.
For example, in your case you could use something like this:
(...) --hcatalog-table <your_table_name> --hcatalog-partition-keys year,month,day
--hcatalog-partition-values 2016,07,01
According to the documentation:
These two options are used to specify multiple static partition key/value pairs. In the prior releases, --hive-partition-key and --hive-partition-value options were used to specify the static partition key/value pair, but only one level of static partition keys could be provided. The options --hcatalog-partition-keys and --hcatalog-partition-values allow multiple keys and values to be provided as static partitioning keys. Multiple option values are to be separated by , (comma). For example, if the hive partition keys for the table to export/import from are defined with partition key names year, month and date and a specific partition with year=1999, month=12, day=31 is the desired partition, then the values for the two options will be as follows:
--hcatalog-partition-keys year,month,day
--hcatalog-partition-values 1999,12,31
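Note that these are static partition values: every row imported by such a command lands in the single partition named on the command line, which in the warehouse becomes one nested directory per key. As a rough check (the warehouse path below is the common default and only an assumption for your setup), the resulting layout would look like this:

# Assumed default warehouse location; adjust to your hive.metastore.warehouse.dir.
hdfs dfs -ls -R /user/hive/warehouse/test.db/user1/year=2016/month=08/day=15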
We need to do this in steps. In the updated command above you are trying to do two things at once: create the new table and sqoop the data into it. In my observation this does not work in a single pass when multi-level partitions are involved, so first create the DDL through HCatalog so that it supports multi-level partitioning, then load the data.
Step 1:
sqoop import \
--connect jdbc:oracle:thin \
--username xxxx \
--password yyyy \
--query 'select EMPNO,ENAME,MGR,HIREDATE,SAL,COMM from t_test_emp where $CONDITIONS AND 1=2' \
--create-hcatalog-table \
--hcatalog-database db1 \
--hcatalog-table test_part1 \
--hcatalog-partition-keys DEPTNO,JOB \
--hcatalog-partition-values 1,1 \
-m 1
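The AND 1=2 predicate makes the free-form query return no rows, so this first pass only makes Sqoop generate and run the CREATE TABLE DDL with DEPTNO and JOB as partition columns; no data is imported yet. Before loading data you can sanity-check the result, for example from the Hive CLI (assuming it is available on the same node):

# Verify that the table was created with DEPTNO and JOB as partition columns.
hive -e "DESCRIBE FORMATTED db1.test_part1;"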
Step 2:
Now load the data:
sqoop import \
--connect jdbc:oracle:thin: \
--username xxxx \
--password yyyy \
--table t_test_emp \
--columns EMPNO,DEPTNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM \
--hcatalog-database db1 \
--hcatalog-table test_part1 \
-m 1
It will work.
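Since the second pass gives no static --hcatalog-partition-keys/--hcatalog-partition-values, the partition for each row is taken from the DEPTNO and JOB values in the --columns list (HCatalog dynamic partitioning). To confirm which partitions were created, something like this can be used:

# List the partitions populated by the second import.
hive -e "SHOW PARTITIONS db1.test_part1;"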