How to manage old sandboxes (and clean them up after jobs are no longer running)?

Question

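I'm new to Mesos/Marathon, and I have a cluster of 5 Mesos slave nodes and one master. Jobs are placed on the Mesos slaves, and when a task fails and is redeployed again and again, the space used in /var/lib/mesos/slaves/../executors keeps growing.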

backend_gig.42c25d62-2f07-11e7-9b48-025317f685e8             
backend_kw-subscribe.d8bbfff0-2f09-11e7-9b48-025317f685e8
backend_gig.5fb8ab00-2f01-11e7-9b48-025317f685e8             
backend_kw-subscribe.d9d9c645-2f01-11e7-9b48-025317f685e8
backend_gigya.7218ec06-2f04-11e7-9b48-025317f685e8           
backend_kw-subscribe.f7c1bb09-2f05-11e7-9b48-025317f685e8
backend_gigya.97960c51-2f03-11e7-9b48-025317f685e8           
backend_kw-subscribe.fc36ac17-2f06-11e7-9b48-025317f685e8
backend_gig.9e4a9ab7-2f09-11e7-9b48-025317f685e8             
backend_charging-mock.3fcf883a-2e56-11e7-8876-025317f685e8
backend_gig.ac4c9a67-2f06-11e7-9b48-025317f685e8

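How can I delete the job directories that do not belong to running jobs, that is, those of failed or older jobs? Is that handled by Mesos/Marathon, or should I set up a cron job or a script to delete the directories? Please advise: the directories take up a lot of disk space, the slave servers go down, and they can no longer start any tasks.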

Answer 1

Mesos has its own system for cleaning up old sandboxes.

From the documentation:

Sandbox files are scheduled for garbage collection when:

An executor is removed or terminated.

A framework is removed.

An executor is recovered unsuccessfully during agent recovery.

NOTE: During agent recovery, all of the executor’s runs, except for the latest run, are scheduled for garbage collection as well.

Garbage collection is scheduled based on the --gc_delay agent flag. By default, this is one week since the sandbox was last modified. After the delay, the files are deleted.
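To see which sandboxes have already outlived the delay, a read-only check along the following lines may help. This is only a sketch, not how Mesos itself tracks garbage collection: the function name `stale_sandboxes` is hypothetical, the executors directory is passed in by the caller, and the directory's own modification time is used as a rough stand-in for "last modified". Nothing is deleted.

```python
import os
import sys
import time

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # the default --gc_delay quoted above


def stale_sandboxes(executors_dir, gc_delay_seconds=ONE_WEEK_SECONDS, now=None):
    """Return (name, age_in_seconds) for subdirectories older than the delay.

    Assumption: the mtime of each executor directory approximates the time
    the sandbox was last modified. This only reports; it never deletes.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(executors_dir):
        if entry.is_dir(follow_symlinks=False):
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > gc_delay_seconds:
                stale.append((entry.name, age))
    # Oldest first, so the biggest offenders are at the top.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stale_sandboxes.py <path to an executors directory>
    for name, age in stale_sandboxes(sys.argv[1]):
        print(f"{name}: {age / 86400:.1f} days old")
```

If this reports many directories that are well past the delay, it is worth checking the agent's flags and logs before resorting to a cron job.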

--gc_disk_headroom=VALUE adjust disk headroom used to calculate maximum executor directory age. Age is calculated by: gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage)) every --disk_watch_interval duration. gc_disk_headroom must be a value between 0.0 and 1.0 (default: 0.1)
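The age formula quoted above is easier to read with numbers plugged in. The following is a small sketch that evaluates it for a few disk usage levels; the function name `max_sandbox_age_days` and the sample usage values are made up for illustration, while the 7-day delay and 0.1 headroom are the defaults mentioned in the quote.

```python
def max_sandbox_age_days(gc_delay_days, gc_disk_headroom, disk_usage):
    """Evaluate gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage).

    disk_usage and gc_disk_headroom are fractions between 0.0 and 1.0.
    """
    if not 0.0 <= gc_disk_headroom <= 1.0:
        raise ValueError("gc_disk_headroom must be between 0.0 and 1.0")
    return gc_delay_days * max(0.0, 1.0 - gc_disk_headroom - disk_usage)


# Defaults from the quoted documentation: one week delay, 0.1 headroom.
for usage in (0.2, 0.5, 0.8, 0.95):
    age = max_sandbox_age_days(7.0, 0.1, usage)
    print(f"disk usage {usage:.0%}: maximum sandbox age {age:.2f} days")
```

With the defaults, the allowed age shrinks as the disk fills: roughly 2.8 days at 50% usage, and zero once usage reaches 90% or more. So on a nearly full disk, lowering --gc_delay or raising --gc_disk_headroom on the agent makes sandboxes that are already scheduled for garbage collection go away sooner.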
