How to get more error details in Apache Drill

Is there any way to get more details out of SQL errors?

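For data-related errors, Drill gives no clue about where the problem is or how to find it. SQL syntax and logic errors are understandable to some extent, but imagine a situation like this: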

Classic example

You have a ~10 GB CSV file full of numbers (sales.csv):

ArticleId,CategoryId,Price,SupplierPrice,VAT
1234,23,15.19,12.45,0
1235,23,16.19,13.45,0.15
...
[83541670] lines
...
475,34.0,55.0,50,0.15  # This stray row causes the error (CategoryId should be an INT in this example)
...
[34767806] lines
...
[EOF]

Then consider a query like this:

SELECT 
 CAST (ArticleId as INT) as ArticleId,
 CAST (CategoryId as INT) as CategoryId,
 CAST (Price as DOUBLE) as Price,
 CAST (SupplierPrice as double) as SupplierPrice,
 CAST (VAT as DOUBLE) as VAT
from (...)/sales.csv

You get this error:

SYSTEM ERROR: NumberFormatException: 
Fragment 0:0
Please, refer to logs for more information.

Well, referring to the logs:

[Error Id: 682cc450-61fb-4307-809a-fcb794fec692 on drill-staging-745f9968d4-m5pv7:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630) ~[drill-common-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:363) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329) [drill-java-exec-1.16.0.jar:1.16.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.16.0.jar:1.16.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111-internal]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111-internal]
Caused by: java.lang.NumberFormatException: 
    at org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.nfeI(StringFunctionHelpers.java:96) ~[drill-java-exec-1.16.0.jar:1.16.0]
[... continues with a hundred similar lines ...]

Question

It would be so nice if Drill gave us an error like this:

ERROR casting column "CategoryId" to Int - Invalid integer value "34.0"

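The cast is applied after the data has been read, so at that point Drill no longer has any information about where in the file the bad value came from. Schema provisioning support was introduced in Drill 1.16. A table schema is applied while the data is being read, so Drill can produce a better error message (DATA_READ_ERROR). For details, see https://drill.apache.org/docs/create-or-replace-schema/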
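As a rough sketch of what that could look like for the file in the question: the workspace and table name dfs.tmp.`sales` are hypothetical (the real path is elided in the question), and it assumes the text format is configured to extract the header row. The two session options are the ones I understand the 1.16 schema feature to depend on; verify the exact names and requirements against the linked page for your version.

```sql
-- Assumption: sales.csv sits in a directory exposed as dfs.tmp.`sales`
-- (hypothetical name) and the CSV format is set to read the header row.

-- Options that, as far as I know, Drill 1.16 needs for schema provisioning.
ALTER SESSION SET `exec.storage.enable_v3_text_reader` = true;
ALTER SESSION SET `store.table.use_schema_file` = true;

-- Declare the column types once, so they are enforced while the file is read.
CREATE OR REPLACE SCHEMA
(
  `ArticleId` INT,
  `CategoryId` INT,
  `Price` DOUBLE,
  `SupplierPrice` DOUBLE,
  `VAT` DOUBLE
)
FOR TABLE dfs.tmp.`sales`;

-- The query no longer needs explicit CASTs; the reader applies the schema.
SELECT ArticleId, CategoryId, Price, SupplierPrice, VAT
FROM dfs.tmp.`sales`;
```

With the types declared this way, a bad value such as 34.0 in CategoryId should fail in the reader, where the file context is still available, rather than in a later CAST.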