Linux Executables Sometimes Not Found

Question: How do I troubleshoot executables that sometimes "disappear" while batch-processing scripts are running?

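Background: I have a series of scripts that parse xyz elevation data and then grid, merge, and interpolate it using tools such as mbgrid and gmt surface. They can be invoked manually, but they usually run as scheduled tasks driven by configuration files that specify things like grid size, cell resolution, and directory structure.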

The scripts are written in Python, use multiprocessing, and make system calls through subprocess.

Problem: Occasionally a command is not found (e.g. cat 180E48S_mbgrid_output.log shows /bin/sh: 1: mbgrid: not found).
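That message comes from /bin/sh rather than from Python: with shell=True, subprocess hands the command string to /bin/sh, which searches the directories listed in its PATH and, if the name is not on any of them, prints "not found" and exits with status 127. A minimal sketch reproducing this, assuming a deliberately restricted PATH and a hypothetical command name no_such_tool_xyz:

```python
import subprocess

def run_with_path(command, path):
    """Run `command` through /bin/sh with PATH set to `path`; return (status, stderr)."""
    result = subprocess.run(
        command,
        shell=True,               # /bin/sh performs the PATH lookup
        env={"PATH": path},       # deliberately minimal environment
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode stderr to str
    )
    return result.returncode, result.stderr

# "no_such_tool_xyz" is a hypothetical name that should not exist on this PATH.
status, err = run_with_path("no_such_tool_xyz", "/usr/bin:/bin")
print(status)       # 127 is the POSIX shell status for "command not found"
print(err.strip())  # on dash the text resembles "/bin/sh: 1: no_such_tool_xyz: not found"
```

The same status 127 would be returned by the mbgrid call whenever the shell cannot resolve the name.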

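Since these scripts are invoked by cron with a configuration file, I would expect consistent results. Most of the time (and seemingly always when I'm watching, even if I just run the same script that cron launches), the code runs to completion successfully. But sometimes it doesn't.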

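Below is a log file snippet showing mbgrid not being found, followed by a successful run, and then another instance of mbgrid not being found before another successful run.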

grep '/bin/sh: 1: mbgrid: not found' 22E59N_mbgrid_output.log -B 10 -A 10

/bin/sh: 1: mbgrid: not found

Zgrid starting iterations
Zgrid iteration 10 convergence test: -0.001000 last:0.000000
Zgrid iteration 20 convergence test: -0.001000 last:-0.001000
Zgrid iteration 30 convergence test: -0.001000 last:-0.001000
Zgrid iteration 40 convergence test: -0.001000 last:-0.001000
Zgrid iteration 50 convergence test: -0.001000 last:-0.001000
Zgrid iteration 60 convergence test: -0.001000 last:-0.001000
Zgrid iteration 70 convergence test: -0.001000 last:-0.001000
Zgrid iteration 80 convergence test: -0.001000 last:-0.001000
Zgrid iteration 90 convergence test: -0.001000 last:-0.001000

...log truncated...

mbm_grdplot -I22E59Nweightedgrid.grd -G1 -C -V -L"File 22E59Nweightedgrid.grd - Topography Grid:Topography (m)"

executing mbm_grdplot...
mbm_grdplot -I22E59Nweightedgrid_num.grd -G1 -W1/2 -V -L"File 22E59Nweightedgrid_num.grd - Topography Grid:Number of Topography Data Points"

executing mbm_grdplot...
mbm_grdplot -I22E59Nweightedgrid_sd.grd -G1 -W1/2 -V -L"File 22E59Nweightedgrid_sd.grd - Topography Grid:Topography Standard Deviation (m)"

Done.

/bin/sh: 1: mbgrid: not found

Zgrid starting iterations
Zgrid iteration 10 convergence test: -0.001000 last:0.000000
Zgrid iteration 20 convergence test: -0.001000 last:-0.001000
Zgrid iteration 30 convergence test: -0.001000 last:-0.001000
Zgrid iteration 40 convergence test: -0.001000 last:-0.001000
Zgrid iteration 50 convergence test: -0.001000 last:-0.001000
Zgrid iteration 60 convergence test: -0.001000 last:-0.001000
Zgrid iteration 70 convergence test: -0.001000 last:-0.001000
Zgrid iteration 80 convergence test: -0.001000 last:-0.001000
Zgrid iteration 90 convergence test: -0.001000 last:-0.001000

...log truncated...

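Steps taken: When I first hit this, I associated it with some particularly large, dense grids. I soon saw that I had 10 processes, each needing about 20% of memory. Reducing the number of processes seemed to fix the problem.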
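The arithmetic behind that observation: ten workers at about 20% of RAM each ask for roughly 200% of physical memory. A minimal sketch for sizing a pool from an estimated per-worker footprint; the helper name max_workers and the 90% budget are illustrative assumptions, not part of the original scripts:

```python
import multiprocessing

def max_workers(per_worker_pct, budget_pct=90, cpu_count=None):
    """Largest worker count whose combined memory estimate fits in budget_pct of RAM.

    Percentages are integers on purpose: float floor division can surprise
    (in Python, 1.0 // 0.2 evaluates to 4.0, not 5.0).
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    by_memory = budget_pct // per_worker_pct
    # Never fewer than one worker, never more workers than CPUs.
    return max(1, min(by_memory, cpu_count))

print(10 * 20)                         # 200: percent of RAM the original 10 workers request
print(max_workers(20, cpu_count=12))   # 4: workers that fit in a 90% budget
```

The result could then be passed as multiprocessing.Pool(processes=...). The estimate is only as good as the per-grid memory figure fed into it.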

However, I have also seen it happen when memory pressure did not seem to be the cause, although I may be missing something in my system monitoring.

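I set up sar monitoring to capture system statistics every few minutes, and a cron job that records some of the most important stats every 5 minutes. I don't see anything in these files that would starve Linux of resources to the point where it could not find an executable. I'll put them at the bottom, since this post is already long.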

I have tried adding some diagnostic logging to track down the problem.

Here is the code snippet that checks whether mbgrid exists and, if it is found, logs the exact command being run:

    import logging
    import os
    import subprocess
    import sys

    MBGRID = '/usr/local/bin/mbgrid'

    if os.path.isfile(MBGRID):  # isfile() already implies exists()
        # Build the command once so the logged string is exactly what runs
        cmd = ("mbgrid -I" + gridcellname + "mbdatalist.mbf -O" + weightedname +
               " -R" + str(wbound) + "/" + str(ebound) + "/" + str(sbound) + "/" + str(nbound) +
               " -A2 -P1 -V -F1 -C1 -N -M -E" + str(cellsize) + "/0.0/degrees! >> " +
               gridcellname + "_mbgrid_output.log 2>&1")
        logging.info(cmd)
        try:
            # With shell=True, /bin/sh looks up "mbgrid" on its PATH, not at MBGRID
            status = subprocess.call(cmd, cwd=griddir, shell=True)
        except OSError:
            logging.error("Error creating weighted grid for " + gridcellname)
            sys.exit("Error creating weighted grid for " + gridcellname)
        # subprocess.call() does not raise on a non-zero exit; 127 means "command not found"
        if status != 0:
            logging.error("mbgrid exited with status %d for %s", status, gridcellname)
        else:
            logging.info("mbgrid complete")
    else:
        logging.error("Unable to find the mbgrid executable at " + MBGRID + " for " + gridcellname)
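Note that the existence check only proves the binary is present at /usr/local/bin/mbgrid. Because the call uses shell=True, /bin/sh resolves the bare name mbgrid through whatever PATH the worker process inherited, which can differ from what an interactive shell sees. A minimal sketch of a check that logs what the shell would actually resolve, using shutil.which (Python 3.3+); the function name log_command_resolution is hypothetical:

```python
import logging
import os
import shutil

def log_command_resolution(name, expected_path):
    """Log whether `name` resolves through this process's PATH, and to where."""
    path = os.environ.get("PATH", "")
    resolved = shutil.which(name, path=path)
    logging.info("PATH seen by this process: %s", path)
    if resolved is None:
        # The file may exist and still be unreachable by name
        logging.error("%s is not on PATH (file exists at %s: %s)",
                      name, expected_path, os.path.isfile(expected_path))
    elif resolved != expected_path:
        logging.warning("%s resolves to %s, not %s", name, resolved, expected_path)
    return resolved

logging.basicConfig(level=logging.INFO)
log_command_resolution("mbgrid", "/usr/local/bin/mbgrid")
```

Calling this inside each worker, just before the subprocess call, would record the PATH at the moment of a failure rather than at script start.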

Main log for a given configuration file:

otto@Tendua:~/automated/NZ_TopoBathy_30m$ grep '180E48S_mbgrid' NZ_TopoBathy_30m_grids.log -A 3 -B 3

INFO:09/08/2017 21:35:06:clipping and reformatting grid for 162E50S

INFO:09/08/2017 21:35:06:simplifying 165E34S_ngdc.xyz by converting to grid and back to xyz

INFO:09/08/2017 21:35:06:run mbgrid using the datalist to create weighted grid

INFO:09/08/2017 21:35:06:mbgrid -I180E48Smbdatalist.mbf -O180E48Sweightedgrid -R179.7/181.3/-48.3/-46.7 -A2 -P1 -V -F1 -C1 -N -M -E0.000277777777778/0.0/degrees! >> 180E48S_mbgrid_output.log 2>&1

INFO:09/08/2017 21:35:06:grid converted to netcdf3 format

INFO:09/08/2017 21:35:06:run mbgrid using the datalist to create weighted grid

INFO:09/08/2017 21:35:06:mbgrid -I175E30Smbdatalist.mbf -O175E30Sweightedgrid -R174.7/176.3/-30.3/-28.7 -A2 -P1 -V -F1 -C1 -N -M -E0.000277777777778/0.0/degrees! >> 175E30S_mbgrid_output.log 2>&1

And in the log file for that exact grid:

cat 180E48S_mbgrid_output.log

/bin/sh: 1: mbgrid: not found

When I issue the exact same command in the correct directory, it runs as expected.

What else should I look at to identify and fix this problem?

Thank you very much for your time.

=====================================================

Top ten processes, recorded every 5 minutes (the script also tracks disk space on the data drives):

otto@Tendua:~$ cat top_ten_process_monitor.sh

#!/bin/sh
date
free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'
df -h | awk '$0 ~ "tendua" {printf "Disk Usage: %1.1f/%1.1fTB (%s)\n", $3,$2,$5}'
top -bn1 | grep load | awk '{printf "CPU Load: %.2f\n", $(NF-2)}'
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head

(Log excerpt from around the time of the incident above)

Fri Sep  8 21:26:01 MDT 2017
Memory Usage: 63865/64239MB (99.42%)
Disk Usage: 2.7/3.6TB (74%)
Disk Usage: 2.5/3.6TB (68%)
Disk Usage: 2.3/3.6TB (64%)
Disk Usage: 4.3/5.5TB (83%)
CPU Load: 1.57
  PID  PPID CMD                         %MEM %CPU
 3973  3886 dbus-daemon --fork --sessio  0.6  0.0
 4085  3886 /usr/lib/unity/unity-panel-  0.6  0.0
 4253  4079 compiz                       0.3  0.4
11350  3886 /opt/google/chrome/chrome    0.3  2.1
18562 11366 /opt/google/chrome/chrome -  0.3  8.8
17402  1109 smbd -F                      0.1  0.0
 3213     1 /opt/google/chrome/chrome -  0.1  0.1
 1587  1450 /usr/bin/X -core :0 -seat s  0.1  0.3
 4019  3886 nautilus --new-window        0.1  0.0
Fri Sep  8 21:31:01 MDT 2017
Memory Usage: 63852/64239MB (99.40%)
Disk Usage: 2.7/3.6TB (75%)
Disk Usage: 2.5/3.6TB (68%)
Disk Usage: 2.3/3.6TB (64%)
Disk Usage: 4.3/5.5TB (83%)
CPU Load: 1.57
  PID  PPID CMD                         %MEM %CPU
 3973  3886 dbus-daemon --fork --sessio  0.6  0.0
 4085  3886 /usr/lib/unity/unity-panel-  0.6  0.0
 4253  4079 compiz                       0.3  0.4
11350  3886 /opt/google/chrome/chrome    0.3  2.1
18562 11366 /opt/google/chrome/chrome -  0.3  8.8
17402  1109 smbd -F                      0.1  0.0
 3213     1 /opt/google/chrome/chrome -  0.1  0.1
 1587  1450 /usr/bin/X -core :0 -seat s  0.1  0.3
 4019  3886 nautilus --new-window        0.1  0.0
Fri Sep  8 21:36:01 MDT 2017
Memory Usage: 63796/64239MB (99.31%)
Disk Usage: 2.7/3.6TB (75%)
Disk Usage: 2.5/3.6TB (68%)
Disk Usage: 2.3/3.6TB (64%)
Disk Usage: 4.3/5.5TB (83%)
CPU Load: 4.59
  PID  PPID CMD                         %MEM %CPU
 3973  3886 dbus-daemon --fork --sessio  0.6  0.0
 4085  3886 /usr/lib/unity/unity-panel-  0.6  0.0
 4253  4079 compiz                       0.3  0.4
11350  3886 /opt/google/chrome/chrome    0.3  2.1
18562 11366 /opt/google/chrome/chrome -  0.3  8.8
17402  1109 smbd -F                      0.1  0.0
 3213     1 /opt/google/chrome/chrome -  0.1  0.1
 1587  1450 /usr/bin/X -core :0 -seat s  0.1  0.3
 1873  1871 gdal_translate -ot Float32   0.1  0.1
Fri Sep  8 21:41:01 MDT 2017
Memory Usage: 63878/64239MB (99.44%)
Disk Usage: 2.7/3.6TB (75%)
Disk Usage: 2.5/3.6TB (68%)
Disk Usage: 2.3/3.6TB (64%)
Disk Usage: 4.3/5.5TB (83%)
CPU Load: 4.32
  PID  PPID CMD                         %MEM %CPU
 3973  3886 dbus-daemon --fork --sessio  0.6  0.0
 4085  3886 /usr/lib/unity/unity-panel-  0.6  0.0
 4253  4079 compiz                       0.3  0.4
11350  3886 /opt/google/chrome/chrome    0.3  2.1
18562 11366 /opt/google/chrome/chrome -  0.3  8.8
 5636  5634 /usr/bin/python /usr/bin/gd  0.1 25.6
17402  1109 smbd -F                      0.1  0.0
 3213     1 /opt/google/chrome/chrome -  0.1  0.1
 1587  1450 /usr/bin/X -core :0 -seat s  0.1  0.3

System statistics follow:

sar -f /var/log/sysstat/sa08|head -4 && sar -f /var/log/sysstat/sa08|grep '09:35:01 PM' -A 10 -B 10

Linux 3.19.0-80-generic (Tendua)        09/08/2017      _x86_64_        (12 CPU)

12:00:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:05:03 AM     all     51.97      0.00      1.32      0.89      0.00     45.82
07:55:01 PM     all      9.44      0.00      0.59      3.51      0.00     86.45
08:05:01 PM     all      1.50      0.00      1.12      9.39      0.00     87.99
08:15:01 PM     all      1.48      0.00      1.12      9.16      0.00     88.24
08:25:01 PM     all      1.46      0.00      1.13      9.29      0.00     88.12
08:35:01 PM     all      1.50      0.00      1.24      8.65      0.00     88.61
08:45:01 PM     all      1.52      0.00      1.57      7.87      0.00     89.04
08:55:01 PM     all      1.55      0.00      1.64      7.96      0.00     88.85
09:05:01 PM     all      1.53      0.00      1.70      8.08      0.00     88.69
09:15:01 PM     all      1.54      0.00      1.74      8.04      0.00     88.68
09:25:01 PM     all      1.50      0.00      1.69      8.16      0.00     88.65
09:35:01 PM     all      1.55      0.00      1.69      7.98      0.00     88.78
09:45:01 PM     all      2.17      0.00      1.96     11.67      0.00     84.20
09:55:01 PM     all      1.71      0.00      1.87     18.06      0.00     78.36
10:05:01 PM     all      1.70      0.00      2.13     17.57      0.00     78.60
10:15:01 PM     all      1.73      0.00      1.93     18.64      0.00     77.70
10:25:01 PM     all      1.75      0.00      1.96     18.36      0.00     77.93
10:35:01 PM     all      3.91      0.00      1.80     15.40      0.00     78.89
10:45:01 PM     all      1.66      0.00      1.90     19.23      0.00     77.20
10:55:01 PM     all      1.68      0.00      1.82     19.05      0.00     77.45
11:05:01 PM     all      1.69      0.00      1.77     19.31      0.00     77.23
11:15:01 PM     all      7.69      0.00      1.03      8.68      0.00     82.60

sar -r -f /var/log/sysstat/sa08|head -4 && sar -r -f /var/log/sysstat/sa08|grep '09:35:01 PM' -A 10 -B 10

Linux 3.19.0-80-generic (Tendua)        09/08/2017      _x86_64_        (12 CPU)

12:00:02 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
12:05:03 AM   3859704  61921864     94.13     50184  52583736  17699508     25.29  29962700  29837068    611932
07:55:01 PM    392740  65388828     99.40    407920  60884772  10851876     15.51   2703424  60625128   2853088
08:05:01 PM    402104  65379464     99.39    410904  61032736  10845076     15.50   3520252  59959668   1336572
08:15:01 PM    371832  65409736     99.43    413876  61069912  10859576     15.52   3403804  60108248   1220840
08:25:01 PM    397468  65384100     99.40    416592  61053400  10820532     15.46   1277516  62219572   3004488
08:35:01 PM    426124  65355444     99.35    419376  61037072  10808244     15.45   1069828  62416140    306880
08:45:01 PM    453688  65327880     99.31    423508  60966356  10864072     15.53   1070752  62351248    929620
08:55:01 PM    367620  65413948     99.44    427112  61029896  10804544     15.44   1078392  62411352    197616
09:05:01 PM    377924  65403644     99.43    430608  60967232  10794432     15.43   1082116  62345104    234916
09:15:01 PM    415260  65366308     99.37    433856  60842704  10822332     15.47   1099380  62217480   1687536
09:25:01 PM    452492  65329076     99.31    436928  60832668  10842884     15.50   1097624  62204328   1594256
09:35:01 PM    445220  65336348     99.32    460300  60819540  10855512     15.51   1137008  62187660       528
09:45:01 PM    470044  65311524     99.29    463480  60443220  10960588     15.66  24385076  38668700   1765864
09:55:01 PM    472608  65308960     99.28    464708  60422328  11067312     15.82  28572380  34443408   1358136
10:05:01 PM    433176  65348392     99.34    465852  60438212  10957232     15.66  29127204  33924120   4382320
10:15:01 PM    450192  65331376     99.32    467016  60405816  11024924     15.76  30376748  32652588   2063424
10:25:01 PM    376448  65405120     99.43    468388  60579636  10831104     15.48  31660420  31440712       704
10:35:01 PM    560628  65220940     99.15    469856  60329228  10979960     15.69  38914816  23997528   4796836
10:45:01 PM    521660  65259908     99.21    471004  60329408  11008184     15.73  41755472  21194232   3680356
10:55:01 PM    556596  65224972     99.15    472160  60329584  10921484     15.61  44714800  18191336   2351468
11:05:01 PM    543664  65237904     99.17    473288  60329768  11025832     15.76  46713556  16209272   3482536
11:15:01 PM    408040  65373528     99.38    475420  60500764  10935492     15.63  41843776  21182704    168384
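Both the 99%+ "Memory Usage" figures and sar's %memused count buffers and page cache as used: kbmemfree plus kbmemused adds up to total RAM, and kbcached alone is about 60 GB. These numbers therefore do not by themselves show memory pressure. A minimal sketch, assuming a kernel that exposes MemAvailable in /proc/meminfo (added in Linux 3.14; this machine runs 3.19), that reports the kernel's own estimate of allocatable memory instead; the helper names are hypothetical:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            # Most values carry a "kB" unit; a few (e.g. HugePages_Total) are plain counts
            info[key.strip()] = int(fields[0])
    return info

def available_pct(info):
    """Share of MemTotal the kernel estimates can be allocated without swapping."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]

if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if "MemAvailable" in info:
        print("MemAvailable: %.1f%% of %d kB" % (available_pct(info), info["MemTotal"]))
```

Logging this value from the 5-minute cron job, next to the existing free -m line, would separate cache usage from genuine pressure.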

sar -b -f /var/log/sysstat/sa08|head -4 && sar -b -f /var/log/sysstat/sa08|grep '09:35:01 PM' -A 10 -B 10

Linux 3.19.0-80-generic (Tendua)        09/08/2017      _x86_64_        (12 CPU)

12:00:02 AM       tps      rtps      wtps   bread/s   bwrtn/s
12:05:03 AM    135.06    125.31      9.75  32088.68  15436.97
07:55:01 PM    214.60    187.68     26.92  47229.75  31671.99
08:05:01 PM    407.57    334.90     72.67  85024.47  91730.47
08:15:01 PM    411.46    341.30     70.16  86763.79  87368.85
08:25:01 PM    420.48    353.86     66.62  90036.21  84149.34
08:35:01 PM    480.24    394.76     85.48 100597.74 109374.29
08:45:01 PM    634.61    530.99    103.62 134987.46 132954.73
08:55:01 PM    683.04    567.68    115.36 144695.57 147487.06
09:05:01 PM    712.35    594.86    117.50 151571.92 151528.64
09:15:01 PM    732.52    613.41    119.10 156291.37 151223.34
09:25:01 PM    716.90    598.66    118.24 152659.05 152950.26
09:35:01 PM    719.66    594.02    125.65 147751.96 153455.47
09:45:01 PM    449.31    379.28     70.03  96426.44 106934.35
09:55:01 PM    249.01     48.08    200.93  12311.36 304792.29
10:05:01 PM    225.01     32.41    192.60   8286.13 310338.90
10:15:01 PM    220.01     15.86    204.15   4051.83 312400.63
10:25:01 PM    210.24      8.80    201.44   1863.78 308831.78
10:35:01 PM    199.95     23.89    176.05   5375.87 249054.33
10:45:01 PM    204.32      0.00    204.32      0.00 322379.98
10:55:01 PM    201.33      0.00    201.33      0.00 314086.77
11:05:01 PM    206.65      0.00    206.64      0.04 317881.94
11:15:01 PM    168.05     32.33    135.72   2041.40 151844.90

sar -w -f /var/log/sysstat/sa08|head -4 && sar -w -f /var/log/sysstat/sa08|grep '09:35:01 PM' -A 10 -B 10

Linux 3.19.0-80-generic (Tendua)        09/08/2017      _x86_64_        (12 CPU)

12:00:02 AM    proc/s   cswch/s
12:05:03 AM      1.45    800.71
07:55:01 PM      0.82   1116.96
08:05:01 PM      0.86   1584.54
08:15:01 PM      0.82   1530.11
08:25:01 PM      0.70   1543.24
08:35:01 PM      0.71   1630.97
08:45:01 PM      0.72   1942.89
08:55:01 PM      0.78   1988.69
09:05:01 PM      0.77   2030.87
09:15:01 PM      1.03   2061.84
09:25:01 PM      0.86   2027.47
09:35:01 PM      0.82   2062.62
09:45:01 PM     19.65   1789.45
09:55:01 PM      0.73    945.56
10:05:01 PM      0.53    901.46
10:15:01 PM      0.68    873.46
10:25:01 PM      0.76    793.85
10:35:01 PM      7.85    950.96
10:45:01 PM      0.74    869.87
10:55:01 PM      0.79    913.40
11:05:01 PM      0.78    894.85
11:15:01 PM      0.80   1225.52

Answer:

You wrote: "Since these scripts are called via cron with a configuration file, I would expect consistent results. Most of the time (and it seems always when I'm watching, even if I just run the script cron initiates), the code runs successfully to completion. But sometimes not."

It looks like an environment problem. Try logging os.environ from both a manual run and the cron job.

The variable to look for is PATH:

PATH is an environmental variable in Linux and other Unix-like operating systems that tells the shell which directories to search for executable files (i.e., ready-to-run programs) in response to commands issued by a user.
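A minimal sketch of that comparison: save the environment to a JSON file from both an interactive run and the cron run, then diff the two. The helper names and file paths are hypothetical:

```python
import json
import os

def dump_environment(filename):
    """Save this process's environment so cron and interactive runs can be compared."""
    with open(filename, "w") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)

def diff_environments(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Example usage (hypothetical paths):
#   dump_environment("/tmp/env_cron.json")         # at the top of the cron-launched script
#   dump_environment("/tmp/env_interactive.json")  # same script, run by hand
#   with open("/tmp/env_cron.json") as f1, open("/tmp/env_interactive.json") as f2:
#       for name, (cron_val, manual_val) in diff_environments(json.load(f1), json.load(f2)).items():
#           print(name, cron_val, manual_val)
```

A PATH entry that is present in one dump but missing from the other would point directly at the failing lookup.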

If my guess is correct, you can fix this by setting it explicitly, os.environ['PATH'] = "path to binaries dir", and then passing os.environ to the subprocess call through its env argument.

cwd parameter
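A sketch of the PATH fix suggested above, assuming the binaries live in /usr/local/bin as the question's existence check implies. Instead of mutating os.environ globally, this variant builds a copy of the environment for the child only; the helper name env_with_path is hypothetical. Calling the binary by its absolute path (/usr/local/bin/mbgrid) would sidestep the PATH lookup entirely.

```python
import os
import subprocess

def env_with_path(extra_dir, base=None):
    """Return a copy of the environment with extra_dir at the front of PATH."""
    env = dict(os.environ if base is None else base)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if extra_dir not in parts:
        parts.insert(0, extra_dir)
    env["PATH"] = os.pathsep.join(parts)
    return env

child_env = env_with_path("/usr/local/bin")
# In the question's code this would become:
#   subprocess.call(cmd, cwd=griddir, shell=True, env=child_env)
# Here the child shell simply echoes the PATH it received.
subprocess.call('echo "$PATH"', shell=True, env=child_env)
```

Passing env explicitly keeps the fix local to the mbgrid call and makes the child's PATH independent of how cron set up the parent.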