Sqoop import as Avro error

Stack: HDP-2.3.2.0-2950 installed using Ambari 2.1

I am trying to import a SQL Server table into HDFS.

[sqoop@l1038lab root]$ sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile  --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose

The output contains an error:

Writing Avro schema file: /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
        at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

Contents of /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/:

[sqoop@l1038lab root]$ ls -lrt /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/
total 104
-rw-r--r--. 1 sqoop hadoop 61005 May  9 13:08 DimSampleDesc.java
-rw-r--r--. 1 sqoop hadoop 28540 May  9 13:08 DimSampleDesc.class
-rw-r--r--. 1 sqoop hadoop  9568 May  9 13:08 DimSampleDesc.jar
-rw-r--r--. 1 sqoop hadoop  3659 May  9 13:09 DimSampleDesc.avsc

Contents of the warehouse directory:

[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016
Found 1 items
drwxr-xr-x   - sqoop hdfs          0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
Found 7 items
-rw-r--r--   3 sqoop hdfs          0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-r--r--   3 sqoop hdfs       2660 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-r--r--   3 sqoop hdfs    5039870 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-r--r--   3 sqoop hdfs    1437143 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-r--r--   3 sqoop hdfs    1486327 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-r--r--   3 sqoop hdfs     595550 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-r--r--   3 sqoop hdfs       4792 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$

I then manually copied the avsc and the other generated files.

[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/d039c1b0b2a2b224d65943df1de34cdd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/

Now all the files are in one place:

[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/
Found 11 items
-rw-rw-rw-   3 sqoop hdfs       3659 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
-rw-rw-rw-   3 sqoop hdfs      28540 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class
-rw-rw-rw-   3 sqoop hdfs       9568 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar
-rw-rw-rw-   3 sqoop hdfs      61005 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java
-rw-rw-rw-   3 sqoop hdfs          0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-rw-rw-   3 sqoop hdfs       2660 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-rw-rw-   3 sqoop hdfs    5039870 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-rw-rw-   3 sqoop hdfs    1437143 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-rw-rw-   3 sqoop hdfs    1486327 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-rw-rw-   3 sqoop hdfs     595550 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-rw-rw-   3 sqoop hdfs       4792 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro

Next I created a Hive table and described it:

CREATE EXTERNAL TABLE DimSampleDesc  ROW FORMAT SERDE  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'  STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'  OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  TBLPROPERTIES (    'avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.166 seconds
hive>
hive>
    > describe formatted DimSampleDesc;
OK
# col_name              data_type               comment


smapiname_ver           string
smapicolname            string
charttype               int
x_indexet               int
y_indexet               int
x_tick                  string
y_tick                  string
x_tickrange             string
x_tickrangefrom         string
x_tickrangetom          string
y_tickrange             string
y_tickrangefrom         string
y_tickrangetom          string
indexcount              int
x_indexcount            int
y_indexcount            int
x_symbol                string
x_symbolname            string
x_symboldescr           string
y_symbol                string
y_symbolname            string
y_symboldescr           string
smapiname               string
incorrect_ver_fl        boolean


# Detailed Table Information
Database:               odp_dw_may2016
Owner:                  hive
CreateTime:             Mon May 09 14:46:40 CEST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type:             EXTERNAL_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   false
        EXTERNAL                TRUE
        avro.schema.url         hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
        numFiles                0
        numRows                 -1
        rawDataSize             -1
        totalSize               0
        transient_lastDdlTime   1462798000


# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.416 seconds, Fetched: 56 row(s)
hive>
    >

But no data is returned:

hive>
    >
    > select * from DimSampleDesc;
OK
Time taken: 0.098 seconds
hive>

The schema file:

[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
  "type" : "record",
  "name" : "DimSampleDesc",
  "doc" : "Sqoop import of DimSampleDesc",
  "fields" : [ {
    "name" : "SmapiName_ver",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiName_ver",
    "sqlType" : "12"
  }, {
    "name" : "SmapiColName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiColName",
    "sqlType" : "12"
  }, {
    "name" : "ChartType",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "ChartType",
    "sqlType" : "4"
  }, {
    "name" : "X_Indexet",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "X_Indexet",
    "sqlType" : "4"
  }, {
    "name" : "Y_Indexet",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "Y_Indexet",
    "sqlType" : "4"
  }, {
    "name" : "X_Tick",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_Tick",
    "sqlType" : "-9"
  }, {
    "name" : "Y_Tick",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_Tick",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRange",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRange",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRangeFrom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRangeFrom",
    "sqlType" : "-9"
  }, {
    "name" : "X_TickRangeTom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_TickRangeTom",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRange",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRange",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRangeFrom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRangeFrom",
    "sqlType" : "-9"
  }, {
    "name" : "Y_TickRangeTom",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_TickRangeTom",
    "sqlType" : "-9"
  }, {
    "name" : "IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "X_IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "X_IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "Y_IndexCount",
    "type" : [ "null", "int" ],
    "default" : null,
    "columnName" : "Y_IndexCount",
    "sqlType" : "4"
  }, {
    "name" : "X_Symbol",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_Symbol",
    "sqlType" : "-9"
  }, {
    "name" : "X_SymbolName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_SymbolName",
    "sqlType" : "-9"
  }, {
    "name" : "X_SymbolDescr",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "X_SymbolDescr",
    "sqlType" : "-9"
  }, {
    "name" : "Y_Symbol",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_Symbol",
    "sqlType" : "-9"
  }, {
    "name" : "Y_SymbolName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_SymbolName",
    "sqlType" : "-9"
  }, {
    "name" : "Y_SymbolDescr",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "Y_SymbolDescr",
    "sqlType" : "-9"
  }, {
    "name" : "SmapiName",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "SmapiName",
    "sqlType" : "12"
  }, {
    "name" : "Incorrect_Ver_FL",
    "type" : [ "null", "boolean" ],
    "default" : null,
    "columnName" : "Incorrect_Ver_FL",
    "sqlType" : "-7"
  } ],
  "tableName" : "DimSampleDesc"
}
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$

What is the root cause, and how should I proceed?

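Create the Hive table on top of the data using the same Avro schema file that Sqoop generated during the import. You can use avro-tools.jar for this, for example to read the schema back out of an imported .avro file.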
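As a rough sketch of the avro-tools route: the `getschema` subcommand prints the schema embedded in an Avro data file, so you can compare it with the .avsc that Sqoop wrote. The jar location (`AVRO_TOOLS_JAR`) and the helper name `extract_avro_schema` are assumptions for illustration, not something from the question.

```shell
# Hypothetical jar location; point this at wherever avro-tools lives on your host.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-./avro-tools.jar}"
# Import directory taken from the question.
DATA_DIR=/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc

# Copy one part file out of HDFS and write its embedded schema to a local file.
# $1 = part file name, $2 = local output path for the schema JSON.
extract_avro_schema() {
  local part="$1" out="$2" tmp rc
  tmp="$(mktemp -d)" || return 1
  hadoop fs -get "$DATA_DIR/$part" "$tmp/$part" &&
    java -jar "$AVRO_TOOLS_JAR" getschema "$tmp/$part" > "$out"
  rc=$?
  rm -rf "$tmp"   # always clean up the temporary copy
  return "$rc"
}

# Usage: extract_avro_schema part-m-00000.avro DimSampleDesc.from-data.avsc
```

If the extracted schema matches DimSampleDesc.avsc, the schema file is probably not the reason the query comes back empty.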

Check that the table in SQL Server actually contains the same data.
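One simple way to do that check is to compare row counts on both sides. This is only a sketch: the connection values are the placeholders from the question, the database name `odp_dw_may2016` is taken from the `describe formatted` output, and the two helper function names are made up here.

```shell
# Placeholder connection string from the question.
JDBC_URL='jdbc:sqlserver://dbserver;database=dbname'

# Row count on the SQL Server side, using Sqoop's eval tool.
count_source_rows() {
  sqoop eval --connect "$JDBC_URL" \
    --username someusername --password somepassword \
    --query "SELECT COUNT(*) FROM $1"
}

# Row count as seen through the Hive table.
count_hive_rows() {
  hive -e "SELECT COUNT(*) FROM odp_dw_may2016.$1"
}

# Usage:
#   count_source_rows DimSampleDesc
#   count_hive_rows DimSampleDesc
```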

In your sqoop import, change --warehouse-dir to --target-dir.
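Something along these lines, keeping every other option from the question as it was. The `--outdir` part is my own guess at the FileNotFoundException: the stack trace shows Sqoop trying to move the .avsc into its code output directory, which looks like it defaulted to `.` (that is, /root, judging by the prompt), where the sqoop user presumably cannot write. `CODEGEN_DIR` and the function name are hypothetical.

```shell
# Hypothetical writable local directory for Sqoop's generated code and the .avsc.
CODEGEN_DIR="${CODEGEN_DIR:-$HOME/sqoop-codegen}"
# With --target-dir, the data files land exactly here (no table-name subdirectory is appended).
TARGET_DIR=/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc

import_table_as_avro() {
  mkdir -p "$CODEGEN_DIR" || return 1
  sqoop import \
    --connect 'jdbc:sqlserver://dbserver;database=dbname' \
    --username someusername --password somepassword \
    --as-avrodatafile \
    --table DimSampleDesc \
    --target-dir "$TARGET_DIR" \
    --outdir "$CODEGEN_DIR" \
    --verbose
}

# Usage: import_table_as_avro
```

Note that the target directory must not already exist when the job starts, so move or remove the output of the earlier run first.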

You copied all the java, jar, and avsc files into the same folder as the Avro data. Hive does not like having different types of files in the same folder.
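A possible cleanup, sketched under a few assumptions. The `describe formatted` output in the question shows Location under /apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc with numFiles 0, so the table may not be reading the import directory at all; the DDL below therefore adds an explicit LOCATION. The `schemas` directory and both function names are hypothetical.

```shell
NAMENODE=hdfs://l1031lab.sss.se.com:8020
BASE=/dataload/tohdfs/reio/odpdw/may2016
DATA_DIR="$BASE/DimSampleDesc"
SCHEMA_DIR="$BASE/schemas"   # hypothetical directory, kept apart from the data

# Move the schema out of the data directory and delete the copied build artifacts.
separate_schema_from_data() {
  hadoop fs -mkdir -p "$SCHEMA_DIR" &&
  hadoop fs -mv "$DATA_DIR/DimSampleDesc.avsc" "$SCHEMA_DIR/" &&
  hadoop fs -rm "$DATA_DIR/DimSampleDesc.java" \
                "$DATA_DIR/DimSampleDesc.class" \
                "$DATA_DIR/DimSampleDesc.jar"
}

# Recreate the external table so it points at the imported Avro files.
recreate_hive_table() {
  hive -e "
USE odp_dw_may2016;
DROP TABLE IF EXISTS DimSampleDesc;
CREATE EXTERNAL TABLE DimSampleDesc
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '$NAMENODE$DATA_DIR'
TBLPROPERTIES ('avro.schema.url'='$NAMENODE$SCHEMA_DIR/DimSampleDesc.avsc');
"
}

# Usage: separate_schema_from_data && recreate_hive_table
```

Dropping an external table only removes the metastore entry, so the .avro files under the data directory stay in place.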