Mesos slaves reject all Marathon jobs with persistent volumes; claim no space is available

I'm trying to use Mesos's persistent volume support, but I'm having a hard time getting it to work.

I've configured each of my slaves as shown below and confirmed that they restarted successfully with the new configuration:

/etc/mesos-slave/resources

[
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk1" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk2" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk3" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk4" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk5" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk6" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk7" }
      }
    }
  }
]
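A quick way to see what this file actually declares is to total the disk entries by source type. The sketch below rebuilds the same seven entries. `disk_entry` and `summarize_disks` are hypothetical helpers made up for illustration; they are not Mesos tooling. The sketch assumes that a disk entry without a `disk.source` block describes the root disk, which is how Mesos's multiple-disk support distinguishes root disks from PATH and MOUNT disks as far as I understand it.

```python
import json
from collections import Counter

def disk_entry(kind, root, size_mb=4194304):
    """One disk resource as written in /etc/mesos-slave/resources (kind is PATH or MOUNT)."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "disk": {"source": {"type": kind, kind.lower(): {"root": root}}},
    }

# The file above: /mnt/disk1-5 as PATH disks, /mnt/disk6-7 as MOUNT disks.
RESOURCES = ([disk_entry("PATH", f"/mnt/disk{i}") for i in range(1, 6)] +
             [disk_entry("MOUNT", f"/mnt/disk{i}") for i in range(6, 8)])

def summarize_disks(resources):
    """Total disk MB per source type; a disk entry with no disk.source counts as ROOT."""
    totals = Counter()
    for r in resources:
        if r.get("name") != "disk":
            continue
        source = r.get("disk", {}).get("source")
        totals[source["type"] if source else "ROOT"] += r["scalar"]["value"]
    return dict(totals)

summary = summarize_disks(RESOURCES)
print(json.dumps(summary, sort_keys=True))      # {"MOUNT": 8388608, "PATH": 20971520}
print("root disk declared:", "ROOT" in summary)  # root disk declared: False
```

To check a real slave, replace `RESOURCES` with the parsed contents of `/etc/mesos-slave/resources` (for example via `json.load`).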

The master's state output confirms that I have unreserved resources. Specifically (full response here):

{
  ...
  "slaves": [{
    "id": "c5e59876-5157-463f-b31e-16b34d6ffc72-S8",
    "pid": "slave(1)@172.30.31.55:5051",
    "hostname": "redacted47.redacted.com",
    "registered_time": 1458810586.61153,
    "resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },
    "used_resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 128,
      "ports": "[31282-31282]"
    },
    "offered_resources": {
      "cpus": 0,
      "disk": 0,
      "mem": 0
    },
    "reserved_resources": {},
    "unreserved_resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },
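As a quick check on the snippet above, the slave's `disk` total is exactly the sum of the seven configured entries. Mesos reports disk in MB. This means every entry in `/etc/mesos-slave/resources` was picked up:

```latex
\text{disk} = 7 \times 4194304\ \text{MB} = 29360128\ \text{MB}
```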

Whenever I try to submit a job that requests a persistent volume, every slave rejects it, claiming that no disk resources are available:

Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9376]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Finished processing 2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375. Matched 0 ops after 1 passes. disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; cpus(*) 28.0; mem(*) 226955.0; ports(*) 31000->31085,31087->31364,31366->31940,31942->32000 left. (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-11)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9379]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
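To compare the rejection with what was actually offered, the short sketch below extracts two things:

- the unmet requirements, from a `ResourceMatcher` line;
- the leftover `disk(*)` amounts, from the matching "Finished processing" line.

The regexes are assumptions fitted to the log format quoted above, not a documented Marathon interface. The sample strings are trimmed copies of those log lines.

```python
import re

# Per-resource verdicts as printed by Marathon's ResourceMatcher, e.g.
# "cpus SATISFIED (1.0 <= 1.0)" or "disk including volumes NOT SATISFIED (1024.0 > 0.0)".
VERDICT = re.compile(
    r"(\w+(?: including volumes)?) (NOT SATISFIED|SATISFIED) "
    r"\(([\d.]+) ([<>]=?) ([\d.]+)\)"
)
# Unreserved disk entries in the "... left." summary, e.g. "disk(*) 4194304.0;".
LEFTOVER_DISK = re.compile(r"disk\(\*\) ([\d.]+)")

def unmet(line):
    """Map each NOT SATISFIED resource to (needed, available)."""
    return {name: (float(need), float(have))
            for name, verdict, need, _op, have in VERDICT.findall(line)
            if verdict == "NOT SATISFIED"}

def leftover_disk(line):
    """List the unreserved disk(*) amounts (MB) still left in an offer."""
    return [float(v) for v in LEFTOVER_DISK.findall(line)]

# Trimmed copies of the log lines above.
REJECTION = ("Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), "
             "mem SATISFIED (128.0 <= 128.0), "
             "disk including volumes NOT SATISFIED (1024.0 > 0.0)")
SUMMARY = ("Matched 0 ops after 1 passes. " + "disk(*) 4194304.0; " * 7 +
           "cpus(*) 28.0; mem(*) 226955.0; left.")

print(unmet(REJECTION))        # {'disk including volumes': (1024.0, 0.0)}
print(leftover_disk(SUMMARY))  # seven entries of 4194304.0
```

Side by side, the two results show the mismatch. The offer still carries seven 4194304 MB `disk(*)` resources, yet Marathon counts 0.0 MB toward the 1024.0 MB it needs for "disk including volumes".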

If I POST a request directly to the Mesos master to create the volume, it rejects the request with "Insufficient disk resources":

# curl -v -i \
    -u "marathon:$(cat /etc/marathon/.secret)" \
    -d slaveId=c5e59876-5157-463f-b31e-16b34d6ffc72-S8 \
    -d volumes='[
      {
        "name": "disk",
        "type": "SCALAR",
        "scalar": { "value": 512 },
        "role": "foo",
        "reservation": {
          "principal": "marathon"
        },
        "disk": {
          "persistence": {
            "id" : "very-persist"
          },
          "volume": {
            "mode": "RW",
            "container_path": "such-path"
          }
        }
      }
    ]' \
    -X POST http://localhost:5050/master/create-volumes; echo
* About to connect() to localhost port 5050 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5050 (#0)
* Server auth using Basic with user 'marathon'
> POST /master/create-volumes HTTP/1.1
> Authorization: Basic redacted
> User-Agent: curl/7.29.0
> Host: localhost:5050
> Accept: */*
> Content-Length: 481
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 481 out of 481 bytes
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Thu, 24 Mar 2016 09:50:36 GMT
Date: Thu, 24 Mar 2016 09:50:36 GMT
< Content-Length: 53
Content-Length: 53
<
* Connection #0 to host localhost left intact
Invalid CREATE Operation: Insufficient disk resources

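I'm at my wits' end. I don't really know what I'm doing; I'm just trying my best to follow the documentation. Any hints about what I might be doing wrong would be greatly appreciated.

I'm running:

I'm following the instructions in these resources as closely as I can:

Thanks for reading!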

First of all, thanks for such a detailed write-up of the problem!

Your problem seems to be twofold:

a) There is no root disk resource available. Once you specify disk resources manually, as you did, Mesos stops detecting the root disk automatically. Simply adding a root disk resource, as described here, should solve your problem.
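As a rough sketch of fix (a), assume that a root disk entry is simply a `disk` resource with no `disk.source` block. The hypothetical helper `with_root_disk` below prepends such an entry to the resources list if none is present. The 102400 MB size is a placeholder, not a recommendation. Presumably it should match the filesystem holding the slave's work directory.

```python
import json

def has_root_disk(resources):
    """True if some disk resource has no disk.source, i.e. it describes the root disk."""
    return any(r.get("name") == "disk" and "source" not in r.get("disk", {})
               for r in resources)

def with_root_disk(resources, size_mb):
    """Return a copy of `resources` with a root disk entry prepended, unless one is present."""
    if has_root_disk(resources):
        return list(resources)
    root_disk = {"name": "disk", "type": "SCALAR", "scalar": {"value": size_mb}}
    return [root_disk] + list(resources)

# One of the PATH disks from the question, for illustration.
path_disk = {
    "name": "disk", "type": "SCALAR", "scalar": {"value": 4194304},
    "disk": {"source": {"type": "PATH", "path": {"root": "/mnt/disk1"}}},
}

# 102400 MB is a placeholder size for the root disk.
fixed = with_root_disk([path_disk], size_mb=102400)
print(json.dumps(fixed, indent=2))
```

The printed list would then replace the contents of `/etc/mesos-slave/resources`, deployed the same way as the original file.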

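b) Your "Create Volume" HTTP request above only considers root disk resources (which, for the reason above, you don't have). If you want to use a non-root disk, you need to set the source field, which is mentioned (very briefly) here.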
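For (b), here is a sketch of the same create-volumes form body with a `source` block added, so that it targets one of the PATH disks instead of the root disk. The sketch rests on these assumptions:

- `/mnt/disk1` is just an example; the `source` must match a disk declared on that slave exactly.
- The role and reservation are copied from the request above, and the sketch assumes that disk space is already reserved for role `foo`.
- `path_volume` is a hypothetical helper.

It only builds and decodes the body; the request itself would still be sent with curl as above.

```python
import json
from urllib.parse import parse_qs, urlencode

SLAVE_ID = "c5e59876-5157-463f-b31e-16b34d6ffc72-S8"

def path_volume(root, size_mb, role, principal, persistence_id, container_path):
    """One persistent-volume resource carved out of a PATH disk rooted at `root`."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": role,
        "reservation": {"principal": principal},
        "disk": {
            "persistence": {"id": persistence_id},
            "volume": {"mode": "RW", "container_path": container_path},
            # Must match the disk resource declared on the slave.
            "source": {"type": "PATH", "path": {"root": root}},
        },
    }

volumes = [path_volume("/mnt/disk1", 512, "foo", "marathon", "very-persist", "such-path")]

# The same two form fields curl sends with -d slaveId=... -d volumes=...
body = urlencode({"slaveId": SLAVE_ID, "volumes": json.dumps(volumes)})

# Decode it again to check what the master would receive.
decoded = parse_qs(body)
print(decoded["slaveId"][0])
print(json.dumps(json.loads(decoded["volumes"][0]), indent=2))
```

A MOUNT disk such as `/mnt/disk6` would use `"mount": {"root": ...}` instead. As far as I know, it also has to be consumed whole rather than split into a 512 MB volume.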

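By the way, any feedback on how to improve the documentation is welcome (I'll add a short note about this issue, but feedback from users is always very helpful)! Contributions are welcome here as well!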

Hope this helps!

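Sorry, I can't add a comment.

I find the documentation a bit intimidating. It's detailed and there's a lot of it, but I'm learning Mesos, Marathon, etc. in my own time, and without examples it's really hard for me. What I'd prefer is a page showing a small cluster, with its IP addresses, disks, and CPUs, plus the configuration files needed to set up the masters, agents, and ZooKeeper ensemble. Some example JSON files showing how to use Marathon for specific use cases would also help.

My goal is to keep some notes for myself in my public GitHub account. They would show my test cluster and explain how to configure everything once I get persistent volumes working, with Jenkins and a private Docker registry all running in Mesos. I'm still a long way from that, though.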