Flink split pipeline
Why does Flink split the pipeline into several jobs if there is an execute_insert in the pipeline?
docker-compose exec jobmanager ./bin/flink run --pyModule my.main --pyFiles /opt/pyflink/ -d
Job has been submitted with JobID 3b0e179dad500a362525f23e82e2c826
Job has been submitted with JobID 93d122a6331b4b9ec2578fe67e748a8e
End of the pipeline:
t_env.execute_sql("""
    CREATE TABLE mySink (
        id STRING,
        name STRING,
        data_ranges ARRAY<ROW<`start` BIGINT, `end` BIGINT>>,
        meta ARRAY<ROW<name STRING, text STRING>>,
        current_hour INT
    ) PARTITIONED BY (current_hour) WITH (
        'connector' = 'filesystem',
        'format' = 'avro',
        'path' = '/opt/pyflink-walkthrough/output/table',
        'sink.rolling-policy.rollover-interval' = '1 hour',
        'partition.time-extractor.timestamp-pattern' = '$current_hour',
        'sink.partition-commit.delay' = '1 hour',
        'sink.partition-commit.trigger' = 'process-time',
        'sink.partition-commit.policy.kind' = 'success-file'
    )
""")
table = t_env.from_data_stream(
    ds,
    ds_schema,
).select('id, name, data_ranges, meta, current_hour').execute_insert("mySink")
If I comment out .execute_insert("mySink"), the job is not split:
docker-compose exec jobmanager ./bin/flink run --pyModule eywa.main --pyFiles /opt/pyflink/ -d
Job has been submitted with JobID 814a105559b58d5f65e4de8ca8c0688e
Each eager execution call, such as execute_insert(), immediately submits its own job, which is why two JobIDs appear above. The documentation section on execution behavior explains this. In short, you can combine what are currently independent pipelines into a single job by wrapping them in a statement set. Note that if you do this, the pipelines will be planned and optimized jointly.
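For reference, here is a minimal sketch of the statement-set approach in PyFlink, assuming the same t_env, ds, and ds_schema as in the question. It uses the Table API's create_statement_set(), add_insert(), and execute() to defer submission so that all registered inserts go out as one job:

table = t_env.from_data_stream(
    ds,
    ds_schema,
).select('id, name, data_ranges, meta, current_hour')

# Register the insert instead of triggering it eagerly with execute_insert().
stmt_set = t_env.create_statement_set()
stmt_set.add_insert("mySink", table)  # further add_insert() calls would join the same job
stmt_set.execute()                    # submits a single job covering all registered inserts

With this change a single JobID should be printed on submission, because execute() plans every pipeline in the statement set together instead of submitting one job per eager call.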