Apache Beam Issue with Spark Runner while using Kafka IO

I am trying to test KafkaIO in my Apache Beam code using the Spark Runner. The code works fine with the Direct Runner.

However, it throws an error if I add the following line of code:

options.setRunner(SparkRunner.class);
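For context, a minimal sketch of where that line sits in the pipeline setup (the class name and the elided KafkaIO transforms are placeholders, not my actual code):

```java
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class KafkaToSparkPipeline {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    // Switching from the Direct Runner to the Spark Runner:
    options.setRunner(SparkRunner.class);

    Pipeline p = Pipeline.create(options);
    // ... KafkaIO.read() and downstream transforms go here ...
    p.run().waitUntilFinish();
  }
}
```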

Error:

ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 2.0 (TID 0)
java.lang.StackOverflowError
    at java.base/java.io.ObjectInputStream$BlockDataInputStream.readByte(ObjectInputStream.java:3307)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2135)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:482)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:440)
    at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:488)
    at jdk.internal.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)

The versions I am trying to use:

<beam.version>2.33.0</beam.version>
<spark.version>3.1.2</spark.version>
<kafka.version>3.0.0</kafka.version>

The issue was resolved by adding the JVM argument -Xss2M, which increases the thread stack size.
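For reference, a sketch of how the flag can be passed (jar and class names are hypothetical). When the pipeline runs on an actual Spark cluster rather than locally, the flag likely needs to reach the executor JVMs too, since that is where the stack overflow occurred:

```shell
# Running locally: pass the larger thread stack directly to the JVM
java -Xss2M -cp target/my-pipeline.jar com.example.KafkaToSparkPipeline

# Submitting to a Spark cluster: apply it to both driver and executors
spark-submit \
  --driver-java-options "-Xss2M" \
  --conf spark.executor.extraJavaOptions=-Xss2M \
  --class com.example.KafkaToSparkPipeline \
  target/my-pipeline.jar
```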

This link helped me resolve the issue: https://github.com/eclipse-openj9/openj9/issues/10370