Sqoop import fails when importing AVRO data from SQL Server to HDFS

I am new to AVRO and I am trying to import AVRO-format data from SQL Server to HDFS.

Error: org.kitesdk.data.DatasetOperationException: Failed to append {"id": "D22C2475", "create_date": "2020-08-22 14:34:06.0", "modified_date": "2020-08-22 14:34:06.0"} to ParquetAppender{path=hdfs://nameservice1/tmp/schema/.temp/job_1597813536070/mr/attempt_1597813536070_m_000000_0/.d55262cf-e49b-4378-addc-0f85698efb47.parquet.tmp, schema={"type":"record","name":"AutoGeneratedSchema","doc":"Sqoop import of QueryResult","fields":[{"name":"id","type":["null","string"],"default":null,"columnName":"id","sqlType":"1"},{"name":"create_date","type":["null","long"],"default":null,"columnName":"create_date","sqlType":"93"},{"name":"modified_date","type":["null","long"],"default":null,"columnName":"modified_date","sqlType":"93"}],"tableName":"QueryResult"}, fileSystem=DFS[DFSClient[clientName=DFSClient_attempt_1597813536070_m_000000_0_960843231_1, ugi=username (auth:SIMPLE)]], avroParquetWriter=parquet.avro.AvroParquetWriter@7b122839}
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number

TABLE -
CREATE TABLE `ticket`(
  id string,
  create_date string,
  modified_date string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'avro.schema.url'='hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc',
  'kite.compression.type'='snappy');

AVRO file metadata - hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc
{
  "type" : "record",
  "name" : "AutoGeneratedSchema",
  "doc" : "Sqoop import of QueryResult",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "id",
    "sqlType" : "1"
  }, {
    "name" : "create_date",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "create_date",
    "sqlType" : "93"
  }, {
    "name" : "modified_date",
    "type" : [ "null", "string" ],
    "default" : null,
    "columnName" : "modified_date",
    "sqlType" : "93"
  } ],
  "tableName" : "QueryResult"
}

I resolved the issue. There were some problems with my AVRO metadata file. I recreated it and attached it to the Hive table using the command below.

ALTER TABLE table_name SET SERDEPROPERTIES ('avro.schema.url' = 'hdfs://user/hive/warehouse/schema.db/table_name/1.avsc');
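
For anyone hitting the same error: the schema Sqoop generated (shown in the error message) declares create_date and modified_date as long, while the table's .avsc file declares them as string. That may be the kind of metadata problem involved here, though the post does not confirm the exact cause. Below is a minimal Python sketch for spotting such differences between two Avro schemas. The function names field_types and schema_mismatches are hypothetical helpers written for this illustration, not part of Sqoop, Kite, or Avro, and the two schemas are abbreviated copies of the ones in the question.

```python
import json


def field_types(schema_json):
    """Map each field name to a tuple of its non-null Avro types."""
    schema = json.loads(schema_json)
    result = {}
    for field in schema["fields"]:
        declared = field["type"]
        # A union is a JSON list; wrap a single type so both cases look alike.
        types = declared if isinstance(declared, list) else [declared]
        result[field["name"]] = tuple(t for t in types if t != "null")
    return result


def schema_mismatches(writer_json, table_json):
    """Return {field: (writer_types, table_types)} for fields that differ."""
    writer = field_types(writer_json)
    table = field_types(table_json)
    return {
        name: (writer.get(name), table.get(name))
        for name in sorted(set(writer) | set(table))
        if writer.get(name) != table.get(name)
    }


# Abbreviated from the error message in the question (schema Sqoop generated).
SQOOP_SCHEMA = """
{"type": "record", "name": "AutoGeneratedSchema", "fields": [
  {"name": "id", "type": ["null", "string"], "default": null},
  {"name": "create_date", "type": ["null", "long"], "default": null},
  {"name": "modified_date", "type": ["null", "long"], "default": null}]}
"""

# Abbreviated from the 1.avsc file referenced by avro.schema.url.
TABLE_SCHEMA = """
{"type": "record", "name": "AutoGeneratedSchema", "fields": [
  {"name": "id", "type": ["null", "string"], "default": null},
  {"name": "create_date", "type": ["null", "string"], "default": null},
  {"name": "modified_date", "type": ["null", "string"], "default": null}]}
"""

if __name__ == "__main__":
    for name, (writer, table) in schema_mismatches(SQOOP_SCHEMA, TABLE_SCHEMA).items():
        print(f"{name}: import schema={writer}, table schema={table}")
```

Running a check like this against the real .avsc file (fetched from HDFS) before re-importing would show which fields need to be reconciled when the metadata file is recreated.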