How to specify field delimiters when importing MySQL into Hive with Sqoop?

Question

I am trying to import a MySQL table into Hive using Sqoop v1.4:

sqoop import --connect jdbc:mysql://localhost:3306/mysqldb \
--username user --password pwd --table mysqltbl \
--hive-import --hive-overwrite \
--hive-table hivedb.hivetbl -m 1 \
--null-string '\N' \
--null-non-string '\N'

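mysqltbl has 100 rows, and one of its fields, text, contains \t and \n. This causes Sqoop to parse the data incorrectly: hivetbl ends up with more than 100 rows and the fields are misaligned.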

How can I specify the field and record delimiters in Sqoop, rather than escaping the special characters in MySQL?

Answer 1

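You are using --hive-import, which creates the Hive table for you if it does not already exist. The table is created with Hive's default delimiters: fields terminated by CTRL-A and lines terminated by \n.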
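To see why an embedded \n inflates the row count, here is a minimal sketch that does not involve Sqoop or Hive at all. The sample data is hypothetical: two logical records with CTRL-A (\001) between fields, where the second record's text field contains a newline. A line-oriented reader counts lines, not logical records.

```shell
#!/bin/sh
# Hypothetical sample: 2 logical records, fields separated by Ctrl-A (\001).
# The text field of the second record contains an embedded newline.
data=$(printf '1\001hello\n2\001multi\nline')

# A reader that treats \n as the record terminator sees one record per line.
line_count=$(printf '%s\n' "$data" | wc -l | tr -d ' ')

echo "logical records: 2, lines seen: $line_count"
```

The extra line also has the wrong number of fields, which is consistent with the misaligned columns described in the question.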

According to the Sqoop docs:

Even though Hive supports escaping characters, it does not handle escaping of new-line character.

Hive will have problems using Sqoop-imported data if your database’s rows contain string fields that have Hive’s default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data.

You can simply add --hive-drop-import-delims to your command; it drops \n (as well as \r and \01) from string fields.

sqoop import --connect jdbc:mysql://localhost:3306/mysqldb \
--username user --password pwd --table mysqltbl \
--hive-import --hive-overwrite \
--hive-table hivedb.hivetbl -m 1 \
--hive-drop-import-delims \
--null-string '\N' \
--null-non-string '\N'
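As a rough illustration of the effect on a single field value, the following sketch emulates dropping the delimiters with tr. This is an emulation only, not Sqoop's own code, and the field value is a made-up example.

```shell
#!/bin/sh
# Emulation only: this is NOT Sqoop's implementation, just the same effect
# on one hypothetical field value containing \n, \r and \001.
field=$(printf 'first\nsecond\r\001third')

# Delete the three characters the quoted docs list as Hive's default delimiters.
cleaned=$(printf '%s' "$field" | tr -d '\n\r\001')

echo "$cleaned"   # prints: firstsecondthird
```

Note that \t is not in the dropped set. With the default CTRL-A field delimiter, a tab inside a field should not split columns, so only the newline characters in the text field would be expected to cause the extra rows.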

If you would rather replace those characters with a string of your own (for example a space, " "), you can use --hive-delims-replacement:

sqoop import --connect jdbc:mysql://localhost:3306/mysqldb \
--username user --password pwd --table mysqltbl \
--hive-import --hive-overwrite \
--hive-table hivedb.hivetbl -m 1 \
--hive-delims-replacement " " \
--null-string '\N' \
--null-non-string '\N'
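The replacement variant can be sketched the same way. Again this is only an emulation with tr on a hypothetical field value, and it assumes each delimiter character is replaced individually; tr can only substitute single characters, so it models a one-character replacement such as a space.

```shell
#!/bin/sh
# Emulation only: NOT Sqoop's implementation. Assumes each delimiter
# character is replaced on its own by the replacement (here a single space).
field=$(printf 'first\nsecond\r\001third')

# Map \n, \r and \001 to a space each.
replaced=$(printf '%s' "$field" | tr '\n\r\001' '   ')

echo "$replaced"   # prints: first second  third
```

Under that assumption, adjacent delimiters such as \r followed by \001 each become a space, so the result can contain runs of spaces where the original had several special characters in a row.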
