Presto fails with type mismatch errors

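I ran into the error below and spent several days trying to work out why Presto could not read certain tables. I'm sharing the solution for anyone who hits the same error in the future.

Problem stack trace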

org.jkiss.dbeaver.model.exec.DBCException: SQL Error [16777224]: Query failed (#20200212_074009_00007_z9eqz): The column event_timestamp is declared as type timestamp, but the Parquet file declares the column as type BINARY
        at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:179)
        at org.jkiss.dbeaver.model.impl.jdbc.struct.JDBCTable.readData(JDBCTable.java:195)
        at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:110)
        at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:164)
        at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:108)
        at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer.run(ResultSetViewer.java:3468)
        at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:103)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
    Caused by: java.sql.SQLException: Query failed (#20200212_074009_00007_z9eqz): The column event_timestamp is declared as type timestamp, but the Parquet file declares the column as type BINARY
        at com.facebook.presto.jdbc.PrestoResultSet.resultsException(PrestoResultSet.java:1840)
        at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1820)
        at com.facebook.presto.jdbc.PrestoResultSet$ResultsPageIterator.computeNext(PrestoResultSet.java:1759)
        at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
        at com.facebook.presto.jdbc.internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
        at com.facebook.presto.jdbc.internal.guava.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
        at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1311)
        at com.facebook.presto.jdbc.internal.guava.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1327)
        at com.facebook.presto.jdbc.LengthLimitedIterator.hasNext(LengthLimitedIterator.java:42)
        at com.facebook.presto.jdbc.PrestoResultSet.next(PrestoResultSet.java:144)
        at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.next(JDBCResultSetImpl.java:268)
        at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:176)
        ... 7 more
    Caused by: com.facebook.presto.spi.PrestoException: The column event_timestamp is declared as type timestamp, but the Parquet file declares the column as type BINARY
        at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.getParquetType(ParquetPageSourceFactory.java:301)
        at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.getColumnType(ParquetPageSourceFactory.java:404)
        at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.lambda$createParquetPageSource(ParquetPageSourceFactory.java:185)
        at java.util.stream.ReferencePipeline.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline.accept(ReferencePipeline.java:175)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:546)
        at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:189)
        at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:139)
        at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:273)
        at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:120)
        at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:51)
        at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:58)
        at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:248)
        at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
        at com.facebook.presto.operator.Driver.lambda$processFor(Driver.java:283)
        at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
        at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
        at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
        at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
        at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
        at com.facebook.presto.$gen.Presto_0_228_bcf44e4____20200212_073601_1.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

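1- Cause

Presto fails when the column order in the Parquet file does not match the column order in the table's CREATE statement. With the default setting, the Hive connector matches Parquet columns to table columns by position rather than by name, so a file written in a different order makes Presto read the wrong column and report a type mismatch. The error can also occur when the Parquet file is missing some of the table's columns.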
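To make the cause concrete, here is a minimal Python sketch of the two lookup strategies. It is only an illustration, not Presto's actual code: the `user_id` column, the simplified type mapping, and the `resolve` helper are all assumptions made up for this example. Only `event_timestamp` and the BINARY mismatch come from the stack trace above.

```python
# Illustrative model only: this is NOT Presto's code, just the two lookup strategies.

# Columns as declared in the table's CREATE statement (user_id is hypothetical).
TABLE_COLUMNS = [("user_id", "varchar"), ("event_timestamp", "timestamp")]

# Columns as physically stored in a Parquet file written in a different order.
FILE_COLUMNS = [("event_timestamp", "INT96"), ("user_id", "BINARY")]

# Simplified, assumed mapping from declared type to expected physical type.
EXPECTED_PHYSICAL = {"varchar": "BINARY", "timestamp": "INT96"}


def resolve(table_columns, file_columns, use_column_names):
    """Return mismatch messages for the chosen lookup strategy."""
    by_name = dict(file_columns)
    problems = []
    for position, (name, declared) in enumerate(table_columns):
        if use_column_names:
            # Name-based lookup: column order in the file is irrelevant.
            physical = by_name.get(name)
        elif position < len(file_columns):
            # Positional lookup: take whatever sits at the same index.
            physical = file_columns[position][1]
        else:
            physical = None  # the file has fewer columns than the table
        if physical is not None and physical != EXPECTED_PHYSICAL[declared]:
            problems.append(f"{name}: declared {declared}, file has {physical}")
    return problems


print(resolve(TABLE_COLUMNS, FILE_COLUMNS, use_column_names=False))
print(resolve(TABLE_COLUMNS, FILE_COLUMNS, use_column_names=True))
```

With positional lookup, both columns are reported as mismatched, including `event_timestamp` being read as BINARY as in the error above. With name-based lookup, the list of problems is empty.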

2- Solution

Add the following option to the hive.properties file and restart Presto:

hive.parquet.use-column-names=true

See also the Presto issue about enabling use-column-names by default.

Hope this helps someone in the future!