Apache Beam Word Count Example via Spark runner and HDFS fails with "Failed to serialize and deserialize property"

I am trying to run the Apache Beam v2.0.0 word count example on Spark v1.6.x (via YARN v2.7.3) so that it reads from and writes to HDFS (v2.7.3).

Currently, I submit the job with the following command:

bin/spark-submit --class org.apache.beam.examples.WordCount \
  --master yarn --deploy-mode cluster \
  test/word-count-beam-1.0-SNAPSHOT.jar \
    --inputFile=hdfs://test/input/* \
    --output=hdfs://test/output \
    --runner=SparkRunner --sparkMaster=yarn

Unfortunately, the job fails with the following exception:

Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'

Here is the full stack trace:

java.lang.IllegalStateException: Failed to serialize the pipeline options.
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.serializePipelineOptions(SparkRuntimeContext.java:58)
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.<init>(SparkRuntimeContext.java:41)
  at org.apache.beam.runners.spark.translation.EvaluationContext.<init>(EvaluationContext.java:67)
  at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:196)
  at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:85)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
  at at.tmobile.bigdata.examples.WordCount.main(WordCount.java:184)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon.run(ApplicationMaster.scala:561)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'
  at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE(JsonMappingException.java:163)
  at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2342)
  at org.apache.beam.runners.spark.translation.SparkRuntimeContext.serializePipelineOptions(SparkRuntimeContext.java:56)
  ... 12 more
Caused by: java.io.IOException: Failed to serialize and deserialize property 'hdfsConfiguration' with value '[Configuration: /usr/hdp/current/hadoop-client/conf/core-site.xml, /usr/hdp/current/hadoop-client/conf/hdfs-site.xml]'
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:710)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:629)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.serialize(ProxyInvocationHandler.java:618)
  at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
  at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
  at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
  ... 13 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Conflicting property-based creators: already had [constructor for java.util.ArrayList, annotations: [null]], encountered [constructor for java.util.ArrayList, annotations: [null]]
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:266)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:241)
  at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)
  at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:394)
  at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3169)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3062)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2175)
  at org.apache.beam.sdk.options.ProxyInvocationHandler$Serializer.ensureSerializable(ProxyInvocationHandler.java:708)
  ... 18 more
Caused by: java.lang.IllegalArgumentException: Conflicting property-based creators: already had [constructor for java.util.ArrayList, annotations: [null]], encountered [constructor for java.util.ArrayList, annotations: [null]]
  at com.fasterxml.jackson.databind.deser.impl.CreatorCollector.verifyNonDup(CreatorCollector.java:228)
  at com.fasterxml.jackson.databind.deser.impl.CreatorCollector.addPropertyCreator(CreatorCollector.java:168)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._handleSingleArgumentConstructor(BasicDeserializerFactory.java:487)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._addDeserializerConstructors(BasicDeserializerFactory.java:406)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._constructDefaultValueInstantiator(BasicDeserializerFactory.java:325)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.findValueInstantiator(BasicDeserializerFactory.java:266)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.createCollectionDeserializer(BasicDeserializerFactory.java:851)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:390)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:348)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:261)
  ... 25 more

Does anyone know how to solve this?

I ran into the same problem.

The Jackson modules involved are the ones loaded through java.util.ServiceLoader.load(com.fasterxml.jackson.databind.Module.class).

The problem is with the hdfsConfiguration property, which has type ArrayList<Configuration>.
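To check which modules that ServiceLoader lookup actually picks up on a given classpath, a small diagnostic along the following lines may help. This is only a sketch: the class name ListServiceProviders and its helper method are hypothetical and not part of Beam or Jackson, and the default service name assumes jackson-databind is on the classpath when the program runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical diagnostic helper; not part of Beam or Jackson.
class ListServiceProviders {

    // Returns the class names of every provider that ServiceLoader
    // discovers for the given service type on the current classpath.
    static List<String> providerNames(Class<?> service) {
        List<String> names = new ArrayList<>();
        for (Object provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the Jackson module service type named in the answer;
        // Class.forName throws if jackson-databind is not on the classpath.
        String serviceName = args.length > 0
                ? args[0]
                : "com.fasterxml.jackson.databind.Module";
        for (String name : providerNames(Class.forName(serviceName))) {
            System.out.println(name);
        }
    }
}
```

Running it with the same classpath as the job jar should show whether a Paranamer-based module is among the discovered providers.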

Excluding the paranamer dependency from the jackson-module-scala dependency in the spark-runner profile helps:

<profiles>
    <profile>
        <id>spark-runner</id>
        <dependencies>
            ...
            <dependency>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>jackson-module-scala_2.10</artifactId>
                <version>2.8.8</version>
                <scope>runtime</scope>
                <exclusions>
                    <exclusion>
                        <groupId>com.fasterxml.jackson.module</groupId>
                        <artifactId>jackson-module-paranamer</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            ...
        </dependencies>
    </profile>
</profiles>

ParanamerModule inspects property annotations and fails on the ArrayList constructors, but the module is optional.
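The stack trace reports two conflicting "property-based creators" for java.util.ArrayList, raised from Jackson's single-argument constructor handling. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that both public single-argument ArrayList constructors end up registered as creators. The sketch below uses plain reflection to list those constructors; the class name ArrayListCreators is hypothetical and the code does not involve Jackson at all.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; uses only JDK reflection, not Jackson.
class ArrayListCreators {

    // Collects the public constructors of the given type that take exactly one argument.
    static List<Constructor<?>> singleArgConstructors(Class<?> type) {
        List<Constructor<?>> result = new ArrayList<>();
        for (Constructor<?> constructor : type.getConstructors()) {
            if (constructor.getParameterCount() == 1) {
                result.add(constructor);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ArrayList has two such constructors: ArrayList(int) and ArrayList(Collection).
        for (Constructor<?> constructor : singleArgConstructors(ArrayList.class)) {
            System.out.println(constructor);
        }
    }
}
```

If that reading is right, removing the optional Paranamer module means the constructor parameters have no discoverable names, so neither constructor is treated as a property-based creator.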