Apache Beam 和 CombineFn 的编码器问题
Coder issues with Apache Beam and CombineFn
我们正在使用 Apache Beam 和 DirectRunner
作为 运行ner 构建管道。我们目前正在尝试一个简单的管道,由此我们:
- 从 Google 云 Pub/Sub 中提取数据(当前使用模拟器到 运行 本地)
- 反序列化为 Java 对象
- Window 事件使用固定 windows 1 分钟
- 使用自定义
CombineFn
将这些 windows 组合起来,将它们从事件转换为事件列表。
流水线代码:
pipeline
.apply(PubsubIO.<String>read().topic(options.getTopic()).withCoder(StringUtf8Coder.of()))
.apply("ParseEvent", ParDo.of(new ParseEventFn()))
.apply("WindowOneMinute",Window.<Event>into(FixedWindows.of(Duration.standardMinutes(1))))
.apply("CombineEvents", Combine.globally(new CombineEventsFn()));
ParseEvent 函数:
static class ParseEventFn extends DoFn<String, Event> {
@ProcessElement
public void processElement(ProcessContext c) {
String json = c.element();
c.output(gson.fromJson(json, Event.class));
}
}
CombineEvents 函数:
public static class CombineEventsFn extends CombineFn<Event, CombineEventsFn.Accum, EventListWrapper> {
public static class Accum {
EventListWrapper eventListWrapper = new EventListWrapper();
}
@Override
public Accum createAccumulator() {
return new Accum();
}
@Override
public Accum addInput(Accum accumulator, Event event) {
accumulator.eventListWrapper.events.add(event);
return accumulator;
}
@Override
public Accum mergeAccumulators(Iterable<Accum> accumulators) {
Accum merged = createAccumulator();
for (Accum accum : accumulators) {
merged.eventListWrapper.events.addAll(accum.eventListWrapper.events);
}
return merged;
}
@Override
public EventListWrapper extractOutput(Accum accumulator) {
return accumulator.eventListWrapper;
}
}
尝试使用 Maven 和 DirectRunner
在本地 运行 时,我们收到以下错误:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unable to return a default Coder for CombineEvents/Combine.perKey(CombineEvents)/Combine.GroupedValues/ParDo(Anonymous).out [PCollection]. Correct one of the following root causes:
No Coder has been manually specified; you may do so using .setCoder().
Inferring a Coder from the CoderRegistry failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
Using the default output Coder from the producing PTransform failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:51)
at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:130)
at org.apache.beam.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:90)
at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:143)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:418)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1459)
at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:350)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:167)
at ***************************.main(***************.java:231)
... 6 more
对大量代码转储表示歉意 - 想提供所有上下文。
我很好奇为什么它抱怨 java.lang.Object
和 org.apache.beam.sdk.values.KV<K, OutputT>
都没有默认编码器 - 据我所知我们的管道在 String
之间改变类型, Event
和 EventListWrapper
- 后两个 classes 在 class 本身上设置了默认编码器(在这两种情况下都是 AvroCoder
)。
错误发生在我们应用 CombineFn 的那一行 - 可以确认没有这个转换管道工作。
我怀疑我们以某种方式错误地设置了组合变换,但到目前为止,在 Beam 文档中没有发现任何东西可以为我们指明正确的方向。
如有任何见解,我们将不胜感激 - 提前致谢!
您看到 java.lang.Object
的可能原因是 Beam 正在尝试为未解析的类型变量推断编码器,该变量将被解析为 Object
。这可能是 Combine
.
中编码器推理方式的错误
另外,我预计 Accum
class 也会导致编码器推理失败。您可以覆盖 CombineFn
中的 getAccumulatorCoder
以直接提供一个。
您是否检查过将 Serializable 添加到 Accumulator 是否可以直接工作?
所以将"implements Serializable"加到累积class ...
public static class Accum implements Serializable {
EventListWrapper eventListWrapper = new EventListWrapper();
}
我们正在使用 Apache Beam 和 DirectRunner
作为 运行ner 构建管道。我们目前正在尝试一个简单的管道,由此我们:
- 从 Google 云 Pub/Sub 中提取数据(当前使用模拟器到 运行 本地)
- 反序列化为 Java 对象
- Window 事件使用固定 windows 1 分钟
- 使用自定义
CombineFn
将这些 windows 组合起来,将它们从事件转换为事件列表。
流水线代码:
pipeline
.apply(PubsubIO.<String>read().topic(options.getTopic()).withCoder(StringUtf8Coder.of()))
.apply("ParseEvent", ParDo.of(new ParseEventFn()))
.apply("WindowOneMinute",Window.<Event>into(FixedWindows.of(Duration.standardMinutes(1))))
.apply("CombineEvents", Combine.globally(new CombineEventsFn()));
ParseEvent 函数:
static class ParseEventFn extends DoFn<String, Event> {
@ProcessElement
public void processElement(ProcessContext c) {
String json = c.element();
c.output(gson.fromJson(json, Event.class));
}
}
CombineEvents 函数:
public static class CombineEventsFn extends CombineFn<Event, CombineEventsFn.Accum, EventListWrapper> {
public static class Accum {
EventListWrapper eventListWrapper = new EventListWrapper();
}
@Override
public Accum createAccumulator() {
return new Accum();
}
@Override
public Accum addInput(Accum accumulator, Event event) {
accumulator.eventListWrapper.events.add(event);
return accumulator;
}
@Override
public Accum mergeAccumulators(Iterable<Accum> accumulators) {
Accum merged = createAccumulator();
for (Accum accum : accumulators) {
merged.eventListWrapper.events.addAll(accum.eventListWrapper.events);
}
return merged;
}
@Override
public EventListWrapper extractOutput(Accum accumulator) {
return accumulator.eventListWrapper;
}
}
尝试使用 Maven 和 DirectRunner
在本地 运行 时,我们收到以下错误:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Unable to return a default Coder for CombineEvents/Combine.perKey(CombineEvents)/Combine.GroupedValues/ParDo(Anonymous).out [PCollection]. Correct one of the following root causes:
No Coder has been manually specified; you may do so using .setCoder().
Inferring a Coder from the CoderRegistry failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
Using the default output Coder from the producing PTransform failed: Unable to provide a default Coder for org.apache.beam.sdk.values.KV<K, OutputT>. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder for parameterized type org.apache.beam.sdk.values.KV<K, OutputT>: Unable to provide a default Coder for java.lang.Object. Correct one of the following root causes:
Building a Coder using a registered CoderFactory failed: Cannot provide coder based on value with class java.lang.Object: No CoderFactory has been registered for the class.
Building a Coder from the @DefaultCoder annotation failed: Class java.lang.Object does not have a @DefaultCoder annotation.
Building a Coder from the fallback CoderProvider failed: Cannot provide coder for type java.lang.Object: org.apache.beam.sdk.coders.protobuf.ProtoCoder@6e610150 could not provide a Coder for type java.lang.Object: Cannot provide ProtoCoder because java.lang.Object is not a subclass of com.google.protobuf.Message; org.apache.beam.sdk.coders.SerializableCoder@7adc59c8 could not provide a Coder for type java.lang.Object: Cannot provide SerializableCoder because java.lang.Object does not implement Serializable.
Building a Coder from the @DefaultCoder annotation failed: Class org.apache.beam.sdk.values.KV does not have a @DefaultCoder annotation.
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
at org.apache.beam.sdk.values.TypedPValue.getCoder(TypedPValue.java:51)
at org.apache.beam.sdk.values.PCollection.getCoder(PCollection.java:130)
at org.apache.beam.sdk.values.TypedPValue.finishSpecifying(TypedPValue.java:90)
at org.apache.beam.sdk.runners.TransformHierarchy.finishSpecifyingInput(TransformHierarchy.java:143)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:418)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:154)
at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1459)
at org.apache.beam.sdk.transforms.Combine$Globally.expand(Combine.java:1336)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:350)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:167)
at ***************************.main(***************.java:231)
... 6 more
对大量代码转储表示歉意 - 想提供所有上下文。
我很好奇为什么它抱怨 java.lang.Object
和 org.apache.beam.sdk.values.KV<K, OutputT>
都没有默认编码器 - 据我所知我们的管道在 String
之间改变类型, Event
和 EventListWrapper
- 后两个 classes 在 class 本身上设置了默认编码器(在这两种情况下都是 AvroCoder
)。
错误发生在我们应用 CombineFn 的那一行 - 可以确认没有这个转换管道工作。
我怀疑我们以某种方式错误地设置了组合变换,但到目前为止,在 Beam 文档中没有发现任何东西可以为我们指明正确的方向。
如有任何见解,我们将不胜感激 - 提前致谢!
您看到 java.lang.Object
的可能原因是 Beam 正在尝试为未解析的类型变量推断编码器,该变量将被解析为 Object
。这可能是 Combine
.
另外,我预计 Accum
class 也会导致编码器推理失败。您可以覆盖 CombineFn
中的 getAccumulatorCoder
以直接提供一个。
您是否检查过将 Serializable 添加到 Accumulator 是否可以直接工作?
所以将"implements Serializable"加到累积class ...
public static class Accum implements Serializable {
EventListWrapper eventListWrapper = new EventListWrapper();
}