Issues with matching service in Cadence

Question

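Two days ago, we started having issues with our Cadence setup. The first thing we noticed was that open workflows did not disappear from the Open list after they completed. For example, this workflow is shown as open in the list: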

But when you click on it, you can see that it has actually completed:

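Around the same time, we noticed that several workflows were taking a long time to complete, and some of them would stay in the "Scheduled" state and never progress any further. After checking the logs, the only error we see is: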

{"level":"error","ts":"2021-03-06T19:12:04.865Z","msg":"Persistent store operation failure","service":"cadence-matching","component":"matching-engine","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"store-operation":"create-task","error":"InternalServiceError{Message: CreateTasks operation failed. Error : Request on table cadence.tasks with ttl of 630720000 seconds exceeds maximum supported expiration date of 2038-01-19T03:14:06+00:00. In order to avoid this use a lower TTL, change the expiration date overflow policy or upgrade to a version where this limitation is fixed. See CASSANDRA-14092 for more details.}","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"number":6300094,"next-number":6300094,"logging-call-at":"taskWriter.go:176","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/cadence/common/log/loggerimpl/logger.go:134\ngithub.com/uber/cadence/service/matching.(*taskWriter).taskWriterLoop\n\t/cadence/service/matching/taskWriter.go:176"}

Does anyone know why this is happening?

Answer 1

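The first issue happens because visibility sampling is enabled by default (to protect the default core database). You can disable it by configuring system.enableVisibilitySampling.

If you do that, though, it is better to separate the visibility store and the default store into different database clusters, so that visibility traffic cannot degrade the default (core data model) database.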

See https://github.com/uber/cadence/issues/3884 for more details.
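As a rough sketch, assuming the server reads Cadence's file-based dynamic config, the override might look like the fragment below. Only the key name comes from the answer above; the file location and the empty constraints are assumptions about your deployment.

```yaml
# Assumed to live in the dynamic config file your server is pointed at.
# Disables visibility sampling for all domains (no constraints applied).
system.enableVisibilitySampling:
- value: false
  constraints: {}
```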

The second issue is a bug that was fixed in 0.16.0. Upgrading the server should resolve it.

See https://github.com/uber/cadence/pull/3627 and https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html

cadence-workflow

uber-cadence