Confluent Kafka S3 sink connector throws `java.lang.NoClassDefFoundError: com/google/common/base/Preconditions` when using Parquet format
When using the Confluent S3 sink connector, the following error is thrown:
[2021-08-08 02:25:15,588] ERROR WorkerSinkTask{id=s3-test-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: com/google/common/base/Preconditions (org.apache.kafka.connect.runtime.WorkerSinkTask:607)
java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:379)
at org.apache.hadoop.conf.Configuration$DeprecationDelta.<init>(Configuration.java:392)
at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:474)
at org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(ParquetWriter.java:345)
at org.apache.parquet.avro.AvroParquetWriter$Builder.<init>(AvroParquetWriter.java:162)
at org.apache.parquet.avro.AvroParquetWriter$Builder.<init>(AvroParquetWriter.java:153)
at org.apache.parquet.avro.AvroParquetWriter.builder(AvroParquetWriter.java:43)
at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider.write(ParquetRecordWriterProvider.java:79)
at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider.write(KeyValueHeaderRecordWriterProvider.java:105)
at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:532)
at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:302)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:245)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:196)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:182)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:231)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
This happens with connector versions 5.5, 10.0.0, and 10.0.1.
It only happens with Parquet; Avro works fine.
The logs show that the partitioner and the source data format are working correctly:
[2021-08-08 02:25:15,564] INFO Opening record writer for: xxxxx/xxxxx.xxxxx.users/year=2021/month=08/day=07/xxxxx.xxxxx.tablename+0+0000000000.snappy.parquet (io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider:74)
The connector was downloaded manually from the Confluent website.
It turns out that hadoop-common requires Google's guava utilities, which are somehow missing from the distribution. You need to find the corresponding guava.jar on the hadoop-common Maven repository page, then manually download guava.jar into the connector's lib/ folder.
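Which guava build is the "corresponding" one depends on the hadoop-common version bundled with the connector. As a rough sketch, the dependency entry to look for on that hadoop-common Maven page looks like the following; the version element here is only a placeholder and must be replaced with whatever your hadoop-common actually declares:

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- placeholder: use the exact version listed by your hadoop-common release -->
  <version>VERSION_FROM_HADOOP_COMMON_POM</version>
</dependency>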
The cause appears to be that guava is explicitly excluded from the hadoop-common dependency in the connector's pom:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <exclusion>
      ...
This really should have been caught in testing.
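If you build the connector from source rather than patching lib/, another option is to declare guava explicitly so it is packaged with the plugin. This is only a sketch under the assumption that the rest of the pom stays as shipped; the guava.version property is hypothetical and should be pinned to the version expected by the bundled hadoop-common:

<!-- hypothetical addition to the connector's own pom -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- define guava.version to match the hadoop-common release you bundle -->
  <version>${guava.version}</version>
</dependency>

Removing the exclusion shown above would likely have the same effect, but pinning an explicit version makes the choice visible in the pom.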