Unable to export hive table to mysql

I am trying to export a Hive table to a MySQL database. Its data is stored in HDFS as tab-separated values, but the job fails after the mapper stage every time.

I have gone through many links and resources and cross-checked my export command, including the export directory, the table name, and other factors. The schemas of the two tables are also the same, but I still don't know why the job fails every time.

Schema in Hive:


display_id int
employment_type string
edu_qualification string
marital_status string
job_type string
working_hours_per_week int
country string
salary string
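
For reference, the field delimiter Sqoop has to match is defined by the Hive table's row format. It can be confirmed with something like the command below (a sketch only; <<hive_table>> is a placeholder, since the Hive table name is not listed above):

hive -e "SHOW CREATE TABLE <<hive_table>>"

The output includes the table's row format, and in particular the field delimiter it was created with.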

Schema in MySQL:

display_id  int 
employment_type varchar(100)    
edu_qualification   varchar(100)    
marital_status  varchar(100)
job_type    varchar(100)    
working_hours_per_week  int 
country varchar(100)    
salary  varchar(100)

Command used to export the table:

sqoop export \
--connect <<jdbcURL>> \
--username root \
--password **** \
--table census_table \
--export-dir <<hdfs_dir>> \
--input-fields-terminated-by '\t' \
--columns "display_id,employment_type,edu_qualification,marital_status,job_type,working_hours_per_week,country,salary" \
--num-mappers 1

Sample data in the file:

39  StateGov    BachelorDegree  Unmarried   Clerical    45  US  <=55K
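
A quick way to double-check what the exported file actually contains (a sketch, assuming a Unix-like shell is available; cat -A renders tab characters as ^I) is:

hdfs dfs -cat <<hdfs_dir>>/* | head -n 1 | cat -A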

Log of the export operation:


Warning: HBASE_HOME and HBASE_VERSION not set.
Warning: HCAT_HOME not set
Warning: HCATALOG_HOME does not exist HCatalog imports will fail.
Please set HCATALOG_HOME to the root of your HCatalog installation.
Warning: ACCUMULO_HOME not set.
Warning: ZOOKEEPER_HOME not set.
Warning: HBASE_HOME does not exist HBase imports will fail.
Please set HBASE_HOME to the root of your HBase installation.
Warning: ACCUMULO_HOME does not exist Accumulo imports will fail.
Please set ACCUMULO_HOME to the root of your Accumulo installation.
Warning: ZOOKEEPER_HOME does not exist Accumulo imports will fail.
Please set ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/04/24 12:03:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/04/24 12:03:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/04/24 12:03:59 WARN sqoop.SqoopOptions: Character argument '\t' has multiple characters; only the first will be used.
20/04/24 12:04:00 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/04/24 12:04:00 INFO tool.CodeGenTool: Beginning code generation
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual l
20/04/24 12:04:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `census_table` AS t LIMIT 1
20/04/24 12:04:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `census_table` AS t LIMIT 1
20/04/24 12:04:01 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is C:\hadoop-2.10.0
Note: \tmp\sqoop-Anand\compile87413e3b916524daa02124c4829b87\census_table.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/24 12:04:08 INFO orm.CompilationManager: Writing jar file: \tmp\sqoop-Anand\compile87413e3b916524daa02124c4829b87\census_table.jar
20/04/24 12:04:08 INFO mapreduce.ExportJobBase: Beginning export of census_table
20/04/24 12:04:08 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/04/24 12:04:10 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
20/04/24 12:04:10 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
20/04/24 12:04:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/04/24 12:04:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/04/24 12:04:23 INFO input.FileInputFormat: Total input files to process : 1
20/04/24 12:04:23 INFO input.FileInputFormat: Total input files to process : 1
20/04/24 12:04:24 INFO mapreduce.JobSubmitter: number of splits:1
20/04/24 12:04:25 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/04/24 12:04:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1587709503815_0002
20/04/24 12:04:25 INFO conf.Configuration: resource-types.xml not found
20/04/24 12:04:25 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/04/24 12:04:25 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
20/04/24 12:04:25 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
20/04/24 12:04:26 INFO impl.YarnClientImpl: Submitted application application_1587709503815_0002
20/04/24 12:04:26 INFO mapreduce.Job: The url to track the job: http://Watson:8088/proxy/application_1587709503815_0002/
20/04/24 12:04:26 INFO mapreduce.Job: Running job: job_1587709503815_0002
20/04/24 12:04:38 INFO mapreduce.Job: Job job_1587709503815_0002 running in uber mode : false
20/04/24 12:04:38 INFO mapreduce.Job:  map 0% reduce 0%
20/04/24 12:04:47 INFO mapreduce.Job:  map 100% reduce 0%
20/04/24 12:04:48 INFO mapreduce.Job: Job job_1587709503815_0002 failed with state FAILED due to: Task failed task_1587709503815_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

20/04/24 12:04:48 INFO mapreduce.Job: Counters: 8
        Job Counters
                Failed map tasks=1
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=7232
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=7232
                Total vcore-milliseconds taken by all map tasks=7232
                Total megabyte-milliseconds taken by all map tasks=7405568
20/04/24 12:04:48 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
20/04/24 12:04:48 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 38.2364 seconds (0 bytes/sec)
20/04/24 12:04:48 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
20/04/24 12:04:48 INFO mapreduce.ExportJobBase: Exported 0 records.
20/04/24 12:04:48 ERROR mapreduce.ExportJobBase: Export job failed!
20/04/24 12:04:48 ERROR tool.ExportTool: Error during export:
Export job failed!
        at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
        at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
        at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
        at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)


Error log from the Hadoop application tracker:


2020-04-24 16:06:00,662 FATAL [IPC Server handler 11 on default port 51507] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1587719571504_0006_m_000000_0 - exited : java.io.IOException: Can't export data, please check failed map task logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:122)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.RuntimeException: Can't parse input data: '39  StateGov    BachelorDegree  Unmarried   Clerical    45  US  <=55K'
    at census_table.__loadFromFields(census_table.java:593)
    at census_table.parse(census_table.java:474)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
    ... 10 more
Caused by: java.lang.NumberFormatException: For input string: "39   StateGov    BachelorDegree  Unmarried   Clerical    45  US  <=55K"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)
    at census_table.__loadFromFields(census_table.java:527)
    ... 12 more

I have tried almost every possible combination of the export command, but nothing gets the job done, and at this point I don't know how to proceed or what I am doing wrong. Please help or suggest changes.

Note: I am able to import the table from MySQL, which shows that the username, password, and jdbcURL used for the --connect argument are verified and valid.

Thanks

There can be many reasons for the failure. Follow the tracking URL from this line of the log to see why the process failed:

20/04/24 12:04:26 INFO mapreduce.Job: The url to track the job: http://Watson:8088/proxy/application_1587709503815_0002/

There you will see the application page for the job. Click into the logs of the failed map task attempt, and you should be able to see the actual exception and explore why the process failed.
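
If clicking through the web UI is inconvenient, the same container logs can usually be pulled from the command line as well (assuming YARN log aggregation is enabled), for example:

yarn logs -applicationId application_1587709503815_0002

The failed map task's stack trace, such as the NumberFormatException shown above, appears in these logs.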