Sqoop action with HCatalog in an Oozie workflow fails
I have a problem using a Sqoop action with HCatalog in Ambari Views when exporting data from Hive to Microsoft SQL Server with the sqoop export command.
The following command runs correctly in the shell and works well:
sqoop export --connect 'jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB' --connection-manager org.apache.sqoop.manager.SQLServerManager --username 'DataApp' --password 'D@t@User' --table tr1 --hcatalog-database temporary --catalog-table 'daily_tr'
But when I create a Sqoop action with this command in an Oozie workflow, I get the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org/apache/hive/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatOutputFormat
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:171)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:153)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:75)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.mapreduce.HCatOutputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 27 more
To work around this error, I did the following:
- Under the folder containing workflow.xml, I created a lib folder and copied into it all the Hive jar files from the sharedlibDir (/user/oozie/share/lib/lib_201806281525405/hive).
My goal was to make the action pick up the HCatalog jar files on its classpath, so I'm not sure about this approach; maybe I shouldn't do it and should solve the error differently.
In any case, the error changed to the following:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org.apache.hadoop.hive.shims.HadoopShims.getUGIForConf(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
java.lang.NoSuchMethodError: org.apache.hadoop.hive.shims.HadoopShims.getUGIForConf(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
at org.apache.hive.hcatalog.common.HiveClientCache$HiveClientCacheKey.<init>(HiveClientCache.java:201)
at org.apache.hive.hcatalog.common.HiveClientCache$HiveClientCacheKey.fromHiveConf(HiveClientCache.java:207)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:138)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:564)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:104)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:85)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:63)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:349)
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:433)
at org.apache.sqoop.manager.SQLServerManager.exportTable(SQLServerManager.java:192)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:171)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:153)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:75)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
Versions:
HDP 2.6.5.0
YARN 2.7.3
Hive 1.2.1000
sqoop 1.4.6
oozie 4.2.0
Please help me resolve this error, and explain why the sqoop command works fine in the shell but fails in the Oozie workflow.
I don't know if this is the culprit. A year ago I ran into this problem on HDP with Sqoop 1.4.x, where the job was reported killed for some unrelated failure reason.
When you run the sqoop command below from the command line, it runs successfully:
sqoop export --connect 'jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB' --connection-manager org.apache.sqoop.manager.SQLServerManager --username 'DataApp' --password 'D@t@User' --table tr1 --hcatalog-database temporary --catalog-table 'daily_tr'
But when you run the same command through an Oozie Sqoop action, it should not use the single quotes ('), like this:
<command>export --connect jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB --connection-manager org.apache.sqoop.manager.SQLServerManager --username DataApp --password D@t@User --table tr1 --hcatalog-database temporary --catalog-table daily_tr</command>
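A quick way to see why the quotes matter: a shell strips quote characters while parsing, but Oozie's <command> element is simply split on whitespace, so any quotes are passed to Sqoop as literal characters. A small illustrative sketch of the difference (using Python's shlex to stand in for shell word splitting; the password value is the one from the post):

```python
import shlex

fragment = "--password 'D@t@User' --table tr1"

# A shell removes the quotes during word splitting...
shell_args = shlex.split(fragment)

# ...but an Oozie <command> is split on whitespace only,
# so the quotes survive as part of the argument value.
oozie_args = fragment.split()

print(shell_args)  # ['--password', 'D@t@User', '--table', 'tr1']
print(oozie_args)  # ['--password', "'D@t@User'", '--table', 'tr1']
```

With the whitespace-only split, Sqoop would try to authenticate with the literal password 'D@t@User' (quotes included), which is why the quotes must be dropped in the workflow.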
I solved my problem as follows:
1- Use ( --hcatalog-home /usr/hdp/current/hive-webhcat ) in the command tag of workflow.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="loadtosql">
<start to="sqoop_export"/>
<action name="sqoop_export">
<sqoop xmlns="uri:oozie:sqoop-action:0.4">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<command>export --connect jdbc:sqlserver://x.x.x.x:1433;useNTLMv2=true;databasename=BigDataDB --connection-manager org.apache.sqoop.manager.SQLServerManager --username DataApp --password D@t@User --table tr1 --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database temporary --hcatalog-table daily_tr </command>
<file>/user/ambari-qa/test/lib/hive-site.xml</file>
<file>/user/ambari-qa/test/lib/tez-site.xml</file>
</sqoop>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="end"/>
</workflow-app>
2- Create a lib folder next to workflow.xml on HDFS and put hive-site.xml and tez-site.xml into it (upload hive-site.xml from /etc/hive/2.6.5.0-292/0/ and tez-site.xml from /etc/tez/2.6.5.0-292/0/ to the lib folder on HDFS).
Define both files (hive-site.xml and tez-site.xml) in the workflow as shown above:
<file>/user/ambari-qa/test/lib/hive-site.xml</file>
<file>/user/ambari-qa/test/lib/tez-site.xml</file>
3- Define the following property in the job.properties file:
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
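For context, a minimal job.properties for this workflow might look like the following sketch (the host names and application path are placeholders, not values from the original post):

```properties
# Placeholders - substitute your cluster's hosts
nameNode=hdfs://<namenode-host>:8020
resourceManager=<resourcemanager-host>:8050
# HDFS directory containing workflow.xml and the lib folder
oozie.wf.application.path=${nameNode}/user/ambari-qa/test
oozie.use.system.libpath=true
# Pull the Sqoop, Hive, and HCatalog sharelibs onto the action classpath
oozie.action.sharelib.for.sqoop=sqoop,hive,hcatalog
```

The sharelib property is the key line: without the hive and hcatalog entries, the launcher only loads the sqoop sharelib, which is what produces the NoClassDefFoundError for HCatOutputFormat.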
4- Make sure oozie-site.xml under /etc/oozie/conf specifies the following property:
<property>
<name>oozie.credentials.credentialclasses</name>
<value>hcat=org.apache.oozie.action.hadoop.HCatCredentials</value>
</property>