hive-drop-import-delims not removing newline while using HCatalog in Sqoop

Question

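When Sqoop is used with an HCatalog import, it fails to remove newlines (\n) from column data, even with the --hive-drop-import-delims option on the command line. This is Apache Sqoop running against Oracle.

Sqoop command: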

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 --table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

The data in Oracle column col4 looks like this (it contains control characters such as ^M):

<li>Details:^M
    <ul>^M
        <li>
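For context, ^M is how many editors and terminals display a carriage return (\r, byte 0x0D), so the excerpt above most likely ends its lines with \r\n. The sketch below rebuilds such a value by hand and makes the hidden bytes visible; `col4_sample` and `count_cr` are hypothetical names used only for illustration, and the sample text is an assumption modelled on the excerpt.

```shell
# Sample value modelled on the col4 excerpt: each ^M is a carriage return
# (\r) sitting just before the line feed (\n).
col4_sample=$(printf '<li>Details:\r\n    <ul>\r\n        <li>')

# Hypothetical helper: counts the carriage returns in a value.
count_cr() {
  printf '%s' "$1" | tr -cd '\r' | wc -c | tr -d '[:space:]'
}

# Dump the bytes so that \r and \n show up explicitly.
printf '%s' "$col4_sample" | od -c

cr_count=$(count_cr "$col4_sample")
printf 'carriage returns: %s\n' "$cr_count"
```

In the `od -c` output, every `\r  \n` pair marks one of the line endings shown as ^M above.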

Are the control characters causing this problem?

Am I missing something? Is there a workaround or solution for this problem?

Answer 1

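Use the --map-column-java option to explicitly declare the column's type as String. With that in place, --hive-drop-import-delims works as expected (it removes \n from the data).

Changed Sqoop command: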

sqoop import --connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username user123 --password passwd123 --table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 --hive-drop-import-delims \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string "" --map-column-java col4=String
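To check the result after the import, it can help to know what a cleaned value should look like. The following is a minimal sketch that mimics the documented effect of --hive-drop-import-delims on a single value by deleting \n, \r and \01 with `tr`. It does not call Sqoop; `drop_hive_delims` is a hypothetical helper and the sample value is an assumption modelled on the col4 excerpt in the question.

```shell
# Hypothetical helper (not part of Sqoop): deletes the characters that
# --hive-drop-import-delims is documented to drop (\n, \r and \01).
drop_hive_delims() {
  printf '%s' "$1" | tr -d '\n\r\001'
}

# Sample value with CR LF line endings, as in the question.
sample=$(printf '<li>Details:\r\n    <ul>\r\n        <li>')

cleaned=$(drop_hive_delims "$sample")
printf '%s\n' "$cleaned"
```

The cleaned value is a single line with the three fragments joined, indentation included, which is what each col4 value in the Hive table should look like once the option takes effect.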

Answer 2

sqoop import \
--connect jdbc:oracle:thin:@ORA_IP:ORA_PORT:ORA_SID \
--username 123 \
--password 123 \
--table SCHEMA.TBL_2 \
--hcatalog-table tbl2 --hcatalog-database testdb --num-mappers 1 \
--split-by SOME_ID --columns col1,col2,col3,col4 \
--hive-delims-replacement "anything" \
--outdir /tmp/temp_table_loc --class-name "SqoopWithHCAT" \
--null-string ""

You can try --hive-delims-replacement "anything". It replaces every \n, \r, and \01 character with the string you provide (in this example, the string "anything").
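As a rough illustration of that substitution, the sketch below applies the same idea to a single value outside Sqoop. `replace_hive_delims` is a hypothetical helper, not a Sqoop command, and it assumes the replacement string contains no characters that are special to `sed` (such as `/` or `&`).

```shell
# Hypothetical helper (not part of Sqoop): replaces every \n, \r and \01
# in the first argument with the string given as the second argument.
replace_hive_delims() {
  soh=$(printf '\001')
  # Map \n and \r onto \01 first, then substitute each \01 in one pass.
  printf '%s' "$1" | tr '\n\r' '\001\001' | sed "s/${soh}/$2/g"
}

# Sample value with a CR LF line ending (an assumption for illustration).
value=$(printf 'Details:\r\n<ul>')

replaced=$(replace_hive_delims "$value" "anything")
printf '%s\n' "$replaced"
```

Each delimiter character is replaced separately, so a \r\n pair yields the replacement string twice. A single space is often a more practical replacement than a word.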

Answer 3

From the official documentation: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

Hive will have problems using Sqoop-imported data if your database’s rows contain string fields that have Hive’s default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive’s default delimiters and should not be used if different delimiters are specified.
