sqoop import fails with numeric overflow

Question

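My sqoop import job fails with: java.sql.SQLException: Numeric Overflow. I have to load an Oracle table whose column type in Oracle is NUMBER without scale, and which is converted to DOUBLE in Hive. These are the largest possible sizes for numeric values in both Oracle and Hive. The question is: how do I get past this error?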

Answer 1

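EDIT: this answer assumed that your Oracle data was good and that your Sqoop job needed a specific configuration to cope with NUMBER values. That was not the case; see the alternative answer.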

In theory, it can be solved.

From the Oracle documentation about "Copying Oracle tables to Hadoop" (in their Big Data Appliance), section "Creating a Hive table" > "About datatype conversion"...

NUMBER

INT when the scale is 0 and the precision is less than 10

BIGINT when the scale is 0 and the precision is less than 19

DECIMAL when the scale is greater than 0 or the precision is greater than 19

So you must find out the actual range of values in your Oracle table; then you will be able to declare the target Hive column as BIGINT, or DECIMAL(38,0), or DECIMAL(22,7), or whatever fits.
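A minimal Java sketch of that procedure, assuming you can pull a sample of the actual values as `BigDecimal`: it measures the largest integer part and fractional part, then applies the Oracle rules quoted above. `HiveTypeChooser` and its methods are hypothetical helpers, not part of Sqoop or Hive. Two choices go beyond the quoted rules and are assumptions: precision 19 with scale 0 (not covered by the quote) maps to DECIMAL, and anything wider than Hive's 38-digit DECIMAL limit falls back to STRING, in the spirit of the TO_CHAR workaround.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical helper: choose a Hive column type for an Oracle NUMBER column
// from the values actually present in the table.
class HiveTypeChooser {
    static final int HIVE_MAX_DECIMAL_PRECISION = 38;

    // INT / BIGINT / DECIMAL(p,s), following the quoted Oracle conversion rules.
    static String hiveTypeFor(int precision, int scale) {
        if (scale == 0 && precision < 10) return "INT";
        if (scale == 0 && precision < 19) return "BIGINT";
        if (precision > HIVE_MAX_DECIMAL_PRECISION) {
            // Assumption: too wide for Hive DECIMAL, so keep it as text.
            return "STRING";
        }
        // Assumption: precision 19 with scale 0 is not covered by the quoted rules;
        // DECIMAL is used because a 19-digit integer may exceed Long.MAX_VALUE.
        return "DECIMAL(" + precision + "," + scale + ")";
    }

    // Smallest (precision, scale) that holds every sampled value.
    static String hiveTypeForValues(List<BigDecimal> values) {
        int intDigits = 1;   // digits left of the decimal point
        int fracDigits = 0;  // digits right of the decimal point
        for (BigDecimal v : values) {
            // Drop trailing zeros so 12.500 counts as scale 1 (1000 becomes 1E+3).
            BigDecimal n = v.stripTrailingZeros();
            intDigits = Math.max(intDigits, n.precision() - n.scale());
            fracDigits = Math.max(fracDigits, Math.max(n.scale(), 0));
        }
        return hiveTypeFor(intDigits + fracDigits, fracDigits);
    }
}
```

For instance, sampled values 123.45 and 1234567.1234567 need 7 integer digits and 7 fractional digits, hence DECIMAL(14,7).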

Now, from the Sqoop documentation about "sqoop - import" > "Controlling type mapping"...

Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However the default mapping might not be suitable for everyone and might be overridden by --map-column-java (for changing mapping to Java) or --map-column-hive (for changing Hive mapping).

Sqoop is expecting comma separated list of mappings (...) for example
$ sqoop import ... --map-column-java id=String,value=Integer

Caveat #1: according to SQOOP-2103, you need Sqoop V1.4.7 or later to use that option with Decimal, and you need to "URL encode" the comma, e.g. for DECIMAL(22,7):
--map-column-hive "wtf=Decimal(22%2C7)"
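If several columns need overrides, the argument is easy to get wrong by hand. A small Java sketch that assembles the comma-separated `col=Type` list and writes commas inside a type as `%2C`, as in the example above; `HiveMappingArg` is a hypothetical helper, and only the comma is encoded (a general URL encoder would also escape the parentheses, which the example above does not do).

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the value for --map-column-hive.
class HiveMappingArg {
    // Entries are separated by plain commas; commas inside a type such as
    // Decimal(22,7) become %2C so they are not read as list separators.
    static String build(Map<String, String> columnTypes) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : columnTypes.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue().replace(",", "%2C"));
        }
        return joiner.toString();
    }
}
```

Pass a `LinkedHashMap` if you want the columns to appear in a predictable order.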

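Caveat #2: in your case, it is not clear whether the overflow occurs when reading the Oracle value into a Java variable, when writing the Java variable to the HDFS file, or even somewhere else. So --map-column-hive alone may not be enough.
And again, according to that post, which points to SQOOP-1493, --map-column-java does not support the Java type java.math.BigDecimal until at least Sqoop V1.4.7 (and it is not even clear whether it is supported by that specific option, nor whether it expects BigDecimal or java.math.BigDecimal).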

In practice, since Sqoop 1.4.7 is not available in all distributions, and since your problem is not well diagnosed, this approach may not be feasible.

So I would suggest hiding the issue by converting your rogue Oracle column to a string at read time.
Cf. the documentation about "sqoop - import" > "Free-form Query Imports"...

Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument (...) Your query must include the token $CONDITIONS (...) For example:
$ sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON a.id=b.id WHERE $CONDITIONS' ...

In your case, that would be SELECT x, y, TO_CHAR(z) AS z FROM wtf WHERE $CONDITIONS, with the appropriate format inside TO_CHAR so that you don't lose any information to rounding.
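Why a string column helps: text produced by TO_CHAR can be parsed back into a `BigDecimal` exactly, whereas a DOUBLE keeps only about 15 to 17 significant digits. A short Java sketch of that contrast; the sample values are made up for illustration, and `StringVsDouble` is a hypothetical class.

```java
import java.math.BigDecimal;

// Hypothetical illustration: exact string parsing versus a round trip through double.
class StringVsDouble {
    // True if the decimal text comes back unchanged after being stored as a double.
    static boolean survivesAsDouble(String decimalText) {
        double d = Double.parseDouble(decimalText);
        return BigDecimal.valueOf(d).compareTo(new BigDecimal(decimalText)) == 0;
    }

    // A string column can always be parsed back exactly.
    static BigDecimal parseExact(String decimalText) {
        return new BigDecimal(decimalText);
    }
}
```

Short values such as "123.45" survive either way, but a 20-digit integer like "12345678901234567890" does not survive the double, while `parseExact` keeps every digit.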

Answer 2

OK, my first answer assumed that your Oracle data was good, and that your Sqoop job needed a specific configuration to cope with NUMBER values.

But now I suspect that your Oracle data contains garbage, specifically NaN values resulting from calculation errors.
See for example that post.

Oracle even has different "Not-a-Number" categories to represent "infinity", which makes things even more complicated.

But on the Java side, BigDecimal does not support NaN; from the documentation, for all the conversion methods...

Throws:
NumberFormatException - if value is infinite or NaN.
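A quick Java check of that documented behavior, plus a null-returning guard that mirrors what NANVL(z, NULL) does on the Oracle side. This sketch only reproduces the Java half with `Double.NaN` and infinity; how the Oracle JDBC driver actually delivers such values is not modeled here, and `NaNToBigDecimal` is a hypothetical class.

```java
import java.math.BigDecimal;

// Hypothetical illustration of BigDecimal's refusal to represent NaN or infinity.
class NaNToBigDecimal {
    // True if new BigDecimal(double) rejects the value.
    static boolean throwsOnConversion(double d) {
        try {
            new BigDecimal(d);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Maps NaN and infinities to null instead of throwing, like NANVL(z, NULL).
    static BigDecimal toBigDecimalOrNull(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return null;
        }
        return new BigDecimal(d);
    }
}
```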

Note that the JDBC driver masks that exception and reports NumericOverflow instead, which makes debugging even more complicated...

So your issue looks like that one, but unfortunately SolR allows skipping errors while Sqoop does not, so you cannot use the same trick.

In the end, you will have to "mask" these NaN values with the Oracle function NANVL, using a free-form query in Sqoop:

$ sqoop import --query 'SELECT x, y, NANVL(z, Null) AS z FROM wtf WHERE $CONDITIONS'
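With many columns, writing that SELECT list by hand gets tedious. A Java sketch of a hypothetical `NanvlQueryBuilder` that wraps only the suspect columns in NANVL(col, NULL) and keeps the mandatory `$CONDITIONS` token; the table and column names are whatever you pass in, and which columns actually hold NaN is something you must determine yourself.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: builds a free-form query for sqoop import --query.
class NanvlQueryBuilder {
    static String build(String table, List<String> columns, Set<String> nanColumns) {
        String select = columns.stream()
                // Suspect columns become NANVL(col, NULL) AS col; others pass through.
                .map(c -> nanColumns.contains(c) ? "NANVL(" + c + ", NULL) AS " + c : c)
                .collect(Collectors.joining(", "));
        // Sqoop requires the literal $CONDITIONS token in the WHERE clause.
        return "SELECT " + select + " FROM " + table + " WHERE $CONDITIONS";
    }
}
```

Remember to wrap the result in single quotes on the command line so the shell does not expand `$CONDITIONS`.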
