Sqoop import failing while importing AVRO data from SQL Server to HDFS
I am new to AVRO, and I am trying to import data in AVRO format from SQL Server into HDFS.
Error: org.kitesdk.data.DatasetOperationException: Failed to append {"id": "D22C2475", "create_date": "2020-08-22 14:34:06.0", "modified_date": "2020-08-22 14:34:06.0"} to ParquetAppender{path=hdfs://nameservice1/tmp/schema/.temp/job_1597813536070/mr/attempt_1597813536070_m_000000_0/.d55262cf-e49b-4378-addc-0f85698efb47.parquet.tmp, schema={"type":"record","name":"AutoGeneratedSchema","doc":"Sqoop import of QueryResult","fields":[{"name":"id","type":["null","string"],"default":null,"columnName":"id","sqlType":"1"},{"name":"create_date","type":["null","long"],"default":null,"columnName":"create_date","sqlType":"93"},{"name":"modified_date","type":["null","long"],"default":null,"columnName":"modified_date","sqlType":"93"}],"tableName":"QueryResult"}, fileSystem=DFS[DFSClient[clientName=DFSClient_attempt_1597813536070_m_000000_0_960843231_1, ugi=username (auth:SIMPLE)]], avroParquetWriter=parquet.avro.AvroParquetWriter@7b122839}
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
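The Sqoop command itself is not shown in the post; the ParquetAppender and the auto-generated QueryResult class in the stack trace suggest a free-form query import writing Parquet files with an Avro schema. A hypothetical invocation along these lines could produce such a job (connection string, query, columns, and paths are placeholders, not taken from the original post):

# Hypothetical Sqoop import; all connection details, the query, and paths are placeholders.
sqoop import \
  --connect "jdbc:sqlserver://<sqlserver-host>:1433;databaseName=<db>" \
  --username <user> \
  --password-file /user/<user>/.sqlserver.password \
  --query 'SELECT id, create_date, modified_date FROM ticket WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /tmp/schema \
  --as-parquetfile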
TABLE -
CREATE TABLE `ticket`(
  `id` string,
  `create_date` string,
  `modified_date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'avro.schema.url'='hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc',
  'kite.compression.type'='snappy');
AVRO file metadata - hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc
{
  "type": "record",
  "name": "AutoGeneratedSchema",
  "doc": "Sqoop import of QueryResult",
  "fields": [{
    "name": "id",
    "type": ["null", "string"],
    "default": null,
    "columnName": "id",
    "sqlType": "1"
  }, {
    "name": "create_date",
    "type": ["null", "string"],
    "default": null,
    "columnName": "create_date",
    "sqlType": "93"
  }, {
    "name": "modified_date",
    "type": ["null", "string"],
    "default": null,
    "columnName": "modified_date",
    "sqlType": "93"
  }],
  "tableName": "QueryResult"
}
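Worth noting: the schema embedded in the stack trace declares create_date and modified_date as long, while the .avsc above declares them as string, which would explain a String-to-Number cast failure when string timestamp values hit the writer. The schema file the table actually references can be inspected directly, using the path from the TBLPROPERTIES above:

# Inspect the schema file referenced by avro.schema.url
hdfs dfs -cat hdfs://nameservice1/user/hive/warehouse/schema.db/ticket/.metadata/schemas/1.avsc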
I solved this issue. There was a problem with my AVRO metadata file. I recreated it and attached it to the Hive table with the command below.
alter table table_name set serdeproperties ('avro.schema.url' = 'hdfs://user/hive/warehouse/schema.db/table_name/1.avsc');
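A sketch of how the recreated schema file might be put in place and the change verified, assuming the .avsc is regenerated locally first (paths and the table name are placeholders, not the exact ones from the post):

# Hypothetical follow-up: upload the recreated schema, re-point the table at it,
# and confirm the serde property took effect. All paths and names are placeholders.
hdfs dfs -put -f 1.avsc /user/hive/warehouse/schema.db/table_name/1.avsc

hive -e "ALTER TABLE table_name SET SERDEPROPERTIES (
  'avro.schema.url'='hdfs://nameservice1/user/hive/warehouse/schema.db/table_name/1.avsc');"

# The updated avro.schema.url should now show up in the table's detailed description
hive -e "DESCRIBE FORMATTED table_name;" | grep avro.schema.url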