Apache Flink: Is Table API state scalable?

According to the Flink Table API Streaming Concepts documentation, Table API and SQL queries may fail as their state size grows:

State Size: Continuous queries are evaluated on unbounded streams and are often supposed to run for weeks or months. Hence, the total amount of data that a continuous query processes can be very large. Queries that have to update previously emitted results need to maintain all emitted rows in order to be able to update them. For instance, the first example query needs to store the URL count for each user to be able to increase the count and send out a new result when the input table receives a new row. If only registered users are tracked, the number of counts to maintain might not be too high. However, if non-registered users get a unique user name assigned, the number of counts to maintain would grow over time and might eventually cause the query to fail.
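To make the quoted scenario concrete, here is a small plain-Python model (not Flink code) of a per-user URL count. The function name `run_url_count` and the sample user names are made up for illustration; the only point is that the state holds one entry per distinct key, so it stays flat for a fixed set of users and grows without bound when every event carries a new user name.

```python
from collections import defaultdict


def run_url_count(events):
    """Model a non-windowed per-user count: one state entry per distinct user.

    events is an iterable of (user, url) pairs. Returns the final counts and
    the number of state entries held after each event.
    """
    counts = defaultdict(int)
    state_sizes = []
    for user, _url in events:
        counts[user] += 1                # update the previously emitted result
        state_sizes.append(len(counts))  # entries the query must keep around
    return counts, state_sizes


# Three registered users clicking repeatedly: the key set is bounded.
registered = [(f"user{i % 3}", "/home") for i in range(9)]
# Every visitor gets a fresh generated name: the key set never stops growing.
anonymous = [(f"anon{i}", "/home") for i in range(9)]

_, bounded = run_url_count(registered)
_, growing = run_url_count(anonymous)
print(bounded[-1])  # 3
print(growing[-1])  # 9
```

In the second case the state size equals the number of events seen, which is the growth pattern the documentation warns about.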

Table API and SQL use the DataStream API under the hood.

Shouldn't the state of Table API / SQL queries scale the same way as the state of DataStream API jobs?

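You are correct that Flink's Table API is just as scalable as the DataStream API. Nevertheless, any given infrastructure has finite capacity, and a Flink job written with unbounded state will eventually crash once it has exhausted all available resources. Some Flink users process petabytes of data per day and expect their jobs to run for weeks or months, which is only achievable by paying attention to these issues.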
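One common way to pay attention to this is to bound the state by expiring keys that have been idle for too long. Flink's Table API has an idle state retention setting for this (the `table.exec.state.ttl` option in recent versions). The sketch below is only a plain-Python model of that idea, not Flink's implementation; `run_with_retention` and its parameters are hypothetical names.

```python
def run_with_retention(events, retention):
    """Model a per-user count that drops keys idle for longer than `retention`.

    events is an iterable of (timestamp, user) pairs in timestamp order.
    Returns the final counts and the peak number of state entries held.
    """
    counts = {}     # user -> running count
    last_seen = {}  # user -> timestamp of the most recent update
    peak = 0
    for ts, user in events:
        # Evict every key that has not been updated within the retention interval.
        expired = [u for u, seen in last_seen.items() if ts - seen > retention]
        for u in expired:
            del counts[u]
            del last_seen[u]
        counts[user] = counts.get(user, 0) + 1
        last_seen[user] = ts
        peak = max(peak, len(counts))
    return counts, peak


# 100 events, each from a brand-new anonymous user, one per time unit.
events = [(t, f"anon{t}") for t in range(100)]
counts, peak = run_with_retention(events, retention=10)
print(peak)  # 11
```

The trade-off is correctness: once a key has been evicted, a later event for it starts again from zero, so the retention interval has to be longer than the longest gap you expect between updates for the same key.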