Problem with nomad job deployment (raw_exec mode, v1.0.1)

Question

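A recent update from nomad v0.9.6 to nomad v1.0.1 broke job deployment. Unfortunately, I cannot get any useful information from the Nomad agent about why a job ends up "pending" or "dead". I also checked the trace-level monitor in the web UI, but without success.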

Could you give me some advice on how to get the reject/pending reason from the agent?

I use the "raw_exec" driver (non-privileged user, "driver.raw_exec.enable" = "1"). For deployment I use nomad-sdk (version 0.11.3.0).

You can find the job definition here (from Nomad's point of view):

OS details:

cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 
Linux blade1.lab.bulb.hr 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Nomad agent details:

[root@blade1 ~]# nomad node-status
ID        DC   Name                Class   Drain  Eligibility  Status
5838e8b0  dc1  blade1.lab.bulb.hr  <none>  false  eligible     ready

Verbose output:

[root@blade1 ~]# nomad node-status -verbose
ID                                    DC   Name                Class   Address         Version  Drain  Eligibility  Status
5838e8b0-ebd3-5c47-a949-df3d601e0da1  dc1  blade1.lab.bulb.hr  <none>  192.168.112.31  1.0.1    false  eligible     ready
[root@blade1 ~]# nomad node-status -verbose 5838e8b0-ebd3-5c47-a949-df3d601e0da1
ID              = 5838e8b0-ebd3-5c47-a949-df3d601e0da1
Name            = blade1.lab.bulb.hr
Class           = <none>
DC              = dc1
Drain           = false
Eligibility     = eligible
Status          = ready
CSI Controllers = <none>
CSI Drivers     = <none>
Uptime          = 1516h1m31s

Drivers
Driver    Detected  Healthy  Message                             Time
docker    false     false    Failed to connect to docker daemon  2020-12-18T14:37:09+01:00
exec      false     false    Driver must run as root             2020-12-18T14:37:09+01:00
java      false     false    Driver must run as root             2020-12-18T14:37:09+01:00
qemu      false     false    <none>                              2020-12-18T14:37:09+01:00
raw_exec  true      true     Healthy                             2020-12-18T14:37:09+01:00

Node Events
Time                       Subsystem  Message          Details
2020-12-18T14:37:09+01:00  Cluster    Node registered  <none>

Allocated Resources
CPU          Memory      Disk
0/18000 MHz  0 B/53 GiB  0 B/70 GiB

Allocation Resource Utilization
CPU          Memory
0/18000 MHz  0 B/53 GiB

Host Resource Utilization
CPU            Memory         Disk
499/20000 MHz  33 GiB/63 GiB  (/dev/mapper/vg00-root)

Allocations
No allocations placed

Attributes
consul.datacenter         = dacs
consul.revision           = 1e03567d3
consul.server             = true
consul.version            = 1.8.5
cpu.arch                  = amd64
driver.raw_exec           = 1
kernel.name               = linux
kernel.version            = 3.10.0-693.21.1.el7.x86_64
memory.totalbytes         = 67374776320
nomad.advertise.address   = 192.168.112.31:5656
nomad.revision            = c9c68aa55a7275f22d2338f2df53e67ebfcb9238
nomad.version             = 1.0.1
os.name                   = centos
os.signals                = SIGTTIN,SIGUSR2,SIGXCPU,SIGBUS,SIGILL,SIGQUIT,SIGCHLD,SIGIOT,SIGKILL,SIGINT,SIGSTOP,SIGSYS,SIGTTOU,SIGFPE,SIGSEGV,SIGTSTP,SIGURG,SIGWINCH,SIGCONT,SIGIO,SIGTRAP,SIGXFSZ,SIGHUP,SIGPIPE,SIGTERM,SIGPROF,SIGABRT,SIGALRM,SIGUSR1
os.version                = 7.4.1708
unique.cgroup.mountpoint  = /sys/fs/cgroup/systemd
unique.consul.name        = grabber1
unique.hostname           = blade1.lab.bulb.hr
unique.network.ip-address = 192.168.112.31
unique.storage.bytesfree  = 74604830720
unique.storage.bytestotal = 126698909696
unique.storage.volume     = /dev/mapper/vg00-root

Meta
connect.gateway_image     = envoyproxy/envoy:v${NOMAD_envoy_version}
connect.log_level         = info
connect.proxy_concurrency = 1
connect.sidecar_image     = envoyproxy/envoy:v${NOMAD_envoy_version}

Job status details:

[root@blade1 ~]# nomad status
ID                                     Type     Priority  Status   Submit Date
lightningCollector-lightningCollector  service  50        pending  2020-12-18T15:06:09+01:00


[root@blade1 ~]# nomad status lightningCollector-lightningCollector
ID            = lightningCollector-lightningCollector
Name          = lightningCollector-lightningCollector
Submit Date   = 2020-12-18T15:06:09+01:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group                               Queued  Starting  Running  Failed  Complete  Lost
lightningCollector-lightningCollector-0  0       0         0        0       0         0

Allocations
No allocations placed

Thank you for your effort and time! Regards, Ivan

Answer 1

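I tested your job locally and was able to reproduce what you are seeing. I noticed that the job has ParentID set, which Nomad uses to track child instances of periodic or dispatch jobs.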

After setting the ParentID value to "", I was able to submit the job, and it was evaluated and scheduled correctly.
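As a rough illustration of that workaround, here is a minimal Python sketch that blanks ParentID in a job definition held as JSON before it is submitted. It assumes the JSON form of the job, where the field is named "ParentID" and the job may be wrapped under a top-level "Job" key (as in a job registration payload). The helper name `clear_parent_id` and the sample values are hypothetical, and the sketch does not use nomad-sdk itself.

```python
import copy
import json


def clear_parent_id(payload):
    """Return a copy of a job payload with ParentID set to an empty string.

    Accepts either a bare job object or one wrapped under a "Job" key
    (assumed layout, see the lead-in). The input is not modified.
    """
    cleaned = copy.deepcopy(payload)
    # Pick the wrapped job if present, otherwise treat the payload as the job.
    job = cleaned["Job"] if isinstance(cleaned.get("Job"), dict) else cleaned
    job["ParentID"] = ""
    return cleaned


if __name__ == "__main__":
    # Hypothetical payload: only the job ID comes from the question,
    # the ParentID value is made up for illustration.
    example = {
        "Job": {
            "ID": "lightningCollector-lightningCollector",
            "Type": "service",
            "ParentID": "some-parent-job",
        }
    }
    print(json.dumps(clear_parent_id(example), indent=2))
```

The function copies its input so the original definition stays available for comparison. With nomad-sdk the equivalent would be to set the parent ID on the job object to an empty string before registering it.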

I did some testing across versions and determined that the behavior changed between 0.12.0 and 0.12.1. I filed hashicorp/nomad #10422 to address this difference in behavior.
