Sqoop Hive import from Oozie

I am using the Cloudera QuickStart Docker image.

MySQL comes installed in the QuickStart image. When I import the categories table from the command line with the following sqoop command, it works and I can see that the categories table gets created:

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite

Then I logged into Hue as the cloudera user and created a new Oozie workflow with a single Sqoop action. When I run it, the Sqoop action is able to download the data to HDFS, but it fails when it tries to create the Hive table on top of that data.

This is what my workflow.xml looks like:

<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
    <start to="sqoop-4467"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-4467">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
</command>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
          <sla:info>
            <sla:nominal-time>${nominal_time}</sla:nominal-time>
            <sla:should-end>${30 * MINUTES}</sla:should-end>
          </sla:info>
    </action>
    <end name="End"/>
</workflow-app>

This is what my job.properties file looks like:

oozie.use.system.libpath=True
security_enabled=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
nominal_time=2016-12-20T20:53Z
jobTracker=quickstart.cloudera:8032

After the job fails, when I check the /user/home/cloudera folder I can see the categories folder with the data in it, but I do not see the Hive table being created. This is the error I see in the Job History Server for the failed job:

Sqoop command arguments :
             import
             --connect
             jdbc:mysql://localhost/retail_db
             --username
             root
             --password
             cloudera
             -m
             1
             --table
             categories
             --hive-import
             --hive-overwrite
Fetching child yarn jobs
tag id : oozie-3ff81b7743470e73dcb44de6729a66d9
Child yarn jobs are found - 
=================================================================

>>> Invoking Sqoop command line now >>>

6223 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6302 [uber-SubtaskRunner] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6-cdh5.7.0
6336 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
6336 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.BaseSqoopTool  - Using Hive-specific delimiters for output. You can override
6336 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.BaseSqoopTool  - delimiters with --fields-terminated-by, etc.
6367 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6654 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.MySQLManager  - Preparing to use a MySQL streaming resultset.
6666 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
7250 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7279 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7281 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
9303 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/4fd8773510dfe4082d136b2ab7d27eb3/categories.jar
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - It looks like you are importing from mysql.
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - This transfer can be faster! Use the --direct
9314 [uber-SubtaskRunner] WARN  org.apache.sqoop.manager.MySQLManager  - option to exercise a MySQL-specific fast path.
9314 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.MySQLManager  - Setting zero DATETIME behavior to convertToNull (mysql)
9318 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of categories
9388 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
10238 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
29055 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 1.0049 KB in 19.659 seconds (52.3425 bytes/sec)
29061 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 58 records.
29076 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
29097 [uber-SubtaskRunner] INFO  org.apache.sqoop.hive.HiveImport  - Loading uploaded data into Hive
Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000012-161221020706124-oozie-oozi-W/sqoop-4467--sqoop/action-data.seq

Oozie Launcher ends

 

Did you copy hive-site.xml to HDFS? That should do it. Alternatively, you can use --target-dir to import the table into an HDFS path and set the location of the Hive table to point to that path.
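
A minimal sketch of both suggestions, with assumed paths (hive-site.xml taken from /etc/hive/conf on the QuickStart image, and a hypothetical workflow directory /user/cloudera/my_workflow). For the first one, upload hive-site.xml to HDFS and reference it from the sqoop action with a <file> element, so the launcher can see the Hive metastore configuration when --hive-import runs:

# upload the Hive client configuration next to the workflow (both paths are assumptions)
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/cloudera/my_workflow/hive-site.xml

Then the sqoop action in workflow.xml would gain a <file> element:

        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite</command>
            <!-- ship hive-site.xml with the action so the Hive import step can reach the metastore -->
            <file>/user/cloudera/my_workflow/hive-site.xml#hive-site.xml</file>
        </sqoop>

The alternative skips --hive-import entirely: import into a fixed HDFS directory with --target-dir and point a Hive table at it. For example (the column names assume the sample retail_db schema, and Sqoop writes comma-delimited text by default):

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --target-dir /user/cloudera/categories_stage

-- in Hive, define the table over that directory
CREATE EXTERNAL TABLE categories (category_id INT, category_department_id INT, category_name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/categories_stage';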