Flink 的批量执行模式如何实现一个 BOUNDED source?
How to implement a BOUNDED source for Flink's batch execution mode?
我正在尝试执行 Flink (1.12.1) 批处理作业,步骤如下:
- 自定义 SourceFunction 以连接 MongoDB
- 做任何平面图和地图来转换一些数据
- 将其沉入其他 MongoDB
我试图在 StreamExecutionEnvironment 中 运行 它,使用 RuntimeExexutionMode.BATCH,但应用程序抛出异常,因为检测到我的源为 UNBOUNDED...而且我无法将它设置为 BOUNDED (它必须在收集 mongo 集合中的所有文档后完成)
异常:
exception in thread "main" java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.shouldExecuteInBatchMode(StreamGraphGenerator.java:335)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:258)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1958)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1943)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
at com.grupotsk.bigdata.matadatapmexporter.MetadataPMExporter.main(MetadataPMExporter.java:33)
一些代码:
执行环境
public static StreamExecutionEnvironment getBatch() {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
env.addSource(new MongoSource()).print();
return env;
}
Mongo 来源:
public class MongoSource extends RichSourceFunction<Document> {
private static final long serialVersionUID = 8321722349907219802L;
private MongoClient mongoClient;
private MongoCollection mc;
@Override
public void open(Configuration con) {
mongoClient = new MongoClient(
new MongoClientURI("mongodb://localhost:27017/database"));
mc=mongoClient.getDatabase("database").getCollection("collection");
}
@Override
public void run(SourceContext<Document> ctx) throws Exception {
MongoCursor<Document> itr=mc.find(Document.class).cursor();
while(itr.hasNext())
ctx.collect(itr.next());
this.cancel();
}
@Override
public void cancel() {
mongoClient.close();
}
谢谢!
与 RuntimeExecutionMode.BATCH
一起使用的源必须实现 Source
而不是 SourceFunction
。接收器应该实现 Sink
而不是 SinkFunction
.
见Integrating Flink into your ecosystem - How to build a Flink connector from scratch for an introduction to these new interfaces. They are described in FLIP-27: Refactor Source Interface and FLIP-143: Unified Sink API。
我正在尝试执行 Flink (1.12.1) 批处理作业,步骤如下:
- 自定义 SourceFunction 以连接 MongoDB
- 做任何平面图和地图来转换一些数据
- 将其沉入其他 MongoDB
我试图在 StreamExecutionEnvironment 中 运行 它,使用 RuntimeExexutionMode.BATCH,但应用程序抛出异常,因为检测到我的源为 UNBOUNDED...而且我无法将它设置为 BOUNDED (它必须在收集 mongo 集合中的所有文档后完成)
异常:
exception in thread "main" java.lang.IllegalStateException: Detected an UNBOUNDED source with the 'execution.runtime-mode' set to 'BATCH'. This combination is not allowed, please set the 'execution.runtime-mode' to STREAMING or AUTOMATIC
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.shouldExecuteInBatchMode(StreamGraphGenerator.java:335)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:258)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1958)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1943)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
at com.grupotsk.bigdata.matadatapmexporter.MetadataPMExporter.main(MetadataPMExporter.java:33)
一些代码:
执行环境
public static StreamExecutionEnvironment getBatch() {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
env.addSource(new MongoSource()).print();
return env;
}
Mongo 来源:
public class MongoSource extends RichSourceFunction<Document> {
private static final long serialVersionUID = 8321722349907219802L;
private MongoClient mongoClient;
private MongoCollection mc;
@Override
public void open(Configuration con) {
mongoClient = new MongoClient(
new MongoClientURI("mongodb://localhost:27017/database"));
mc=mongoClient.getDatabase("database").getCollection("collection");
}
@Override
public void run(SourceContext<Document> ctx) throws Exception {
MongoCursor<Document> itr=mc.find(Document.class).cursor();
while(itr.hasNext())
ctx.collect(itr.next());
this.cancel();
}
@Override
public void cancel() {
mongoClient.close();
}
谢谢!
与 RuntimeExecutionMode.BATCH
一起使用的源必须实现 Source
而不是 SourceFunction
。接收器应该实现 Sink
而不是 SinkFunction
.
见Integrating Flink into your ecosystem - How to build a Flink connector from scratch for an introduction to these new interfaces. They are described in FLIP-27: Refactor Source Interface and FLIP-143: Unified Sink API。