Modify gupdatedb (GNU updatedb command) to insert the parallel command

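I am on macOS 10.15 and use the tools glocate and gupdatedb from the findutils package installed with brew.

I would like to integrate the shell command "parallel" into the gupdatedb script in order to build the database faster.

In the original version of the gupdatedb script, I have: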

: ${find:=${BINDIR}/gfind}

1) I tried to insert the parallel command into the line above.

Usually, with gfind, the parallel command can be used like this:

parallel --lb -j32 gfind ::: /*

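The argument '/*' is expanded by the shell to the top-level entries of the root directory, so each gfind searches one of them together with all its subdirectories.

So I tried this (in the gupdatedb script):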

: ${find:=/usr/local/bin/parallel -j32 ${BINDIR}/gfind}

But on execution I get the following error, which I cannot explain:

updatedb needs to be able to execute -j32, but cannot.

2) I also tried passing it through variables:

    num_threads=-j32
    ${parallel:=${BINDIR}/parallel --lb $num_threads}
    : ${find:=${parallel} ${BINDIR}/gfind \{\} ::: }
    : ${frcode:=${LIBEXECDIR}/gfrcode}

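But the script still hangs and no database is generated.

How can I get around this and run gfind on multiple threads (here 8 threads)?

PS1: In this post I refer to another link, which explains how to combine the find and parallel commands.

PS2: The gupdatedb script is rather long, so I give below the part that I think is relevant (I stopped the hanging program with CTRL+C):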

# The database file to build.
: ${LOCATE_DB=/usr/local/var/locate/locatedb}

# Directory to hold intermediate files.
if test -z "$TMPDIR"; then
  if test -d /var/tmp; then
    : ${TMPDIR=/var/tmp}
  elif test -d /usr/tmp; then
    : ${TMPDIR=/usr/tmp}
  else
    : ${TMPDIR=/tmp}
  fi
fi
export TMPDIR

# The user to search network directories as.
: ${NETUSER=daemon}

# The directory containing the subprograms.
if test -n "$LIBEXECDIR" ; then
    : LIBEXECDIR already set, do nothing
else
    : ${LIBEXECDIR=/usr/local/Cellar/findutils/4.7.0/libexec}
fi

# The directory containing find.
if test -n "$BINDIR" ; then
    : BINDIR already set, do nothing
else
    : ${BINDIR=/usr/local/bin}
fi

# DEV : parallel prefix command
num_threads=-j32
${parallel:=${BINDIR}/parallel --lb $num_threads}
# The names of the utilities to run to build the database.
: ${find:=${parallel} ${BINDIR}/gfind \{\} ::: }
: ${frcode:=${LIBEXECDIR}/gfrcode}

UPDATE 1: From what I can see, if I comment out the line (# checkbinary $binary) and apply my second approach (see 2) above), I get the following error message (I have enabled set -x for debugging):

+ version='
updatedb (GNU findutils) 4.7.0
Copyright (C) 1994-2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
'
+ LC_ALL=C
+ export LC_ALL
+ usage='Usage: /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb [--findoptions='\''-option1 -option2...'\'']
       [--localpaths='\''dir1 dir2...'\''] [--netpaths='\''dir1 dir2...'\'']
       [--prunepaths='\''dir1 dir2...'\''] [--prunefs='\''fs1 fs2...'\'']
       [--output=dbfile] [--netuser=user] [--localuser=user]
       [--dbformat] [--version] [--help]

Please see also the documentation at http://www.gnu.org/software/findutils/.
Report (and track progress on fixing) bugs in the updatedb
program via the GNU findutils bug-reporting page at
https://savannah.gnu.org/bugs/?group=findutils or, if
you have no web access, by sending email to <bug-findutils@gnu.org>.
'
+ changeto=/
+ frcode_options=
+ case "$dbformat" in
+ true
+ sort='/usr/bin/sort -z'
+ print_option=-print0
+ frcode_options=' -0'
+ :
+ : /usr/local/bin/zsh
+ : /
+ :
+ : '
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
'
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ for p in '$PRUNEPATHS'
+ case "$p" in
+ test -z ''
++ echo /afs /amd /proc /sfs /tmp /usr/tmp /var/tmp
++ sed -e 's,^,\(^,' -e 's, ,$\)\|\(^,g' -e 's,$,$\),'
+ PRUNEREGEX='\(^/afs$\)\|\(^/amd$\)\|\(^/proc$\)\|\(^/sfs$\)\|\(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)'
+ : /usr/local/var/locate/locatedb
+ test -z ''
+ test -d /var/tmp
+ : /var/tmp
+ export TMPDIR
+ : daemon
+ test -n ''
+ : /usr/local/Cellar/findutils/4.7.0/libexec
+ test -n ''
+ : /usr/local/bin
+ num_threads=-j32
+ /usr/local/bin/parallel --lb -j32
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2020, July 22). GNU Parallel 20200722 ('Privacy Shield').
  Zenodo. https://doi.org/10.5281/zenodo.3956817

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 15 times. Isn't it about time
you run 'parallel --citation' once to silence the citation notice?

parallel: Warning: Input is read from the terminal. You are either an expert
parallel: Warning: (in which case: YOU ARE AWESOME!) or maybe you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
^C+ : /usr/local/bin/parallel --lb -j32 /usr/local/bin/gfind '{}' :::
+ : /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode
+ : '
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
'
+ test -n '
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
'
++ echo 9P NFS afs autofs cifs coda devfs devpts ftpfs iso9660 mfs ncpfs nfs nfs4 proc shfs smbfs sysfs
++ sed -e 's/\([^ ][^ ]*\)/-o -fstype /g' -e 's/-o //' -e 's/$/ -o/'
+ prunefs_exp='-fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o'
+ rm -f /usr/local/var/locate/locatedb.n
+ trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
+ cd /
+ test -n /
+ '[' '' '!=' '' ']'
+ /usr/bin/sort -z
+ /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode -0
+ : OK so far
+ true
+ test -s /usr/local/var/locate/locatedb.n
+ chmod 644 /usr/local/var/locate/locatedb.n
+ mv /usr/local/var/locate/locatedb.n /usr/local/var/locate/locatedb
+ exit 0
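
A note on the trace above: the line "+ /usr/local/bin/parallel --lb -j32" shows parallel being started on its own and reading from the terminal until it was interrupted. That is consistent with the ${parallel:=...} line lacking the leading ": " that the neighbouring assignments have. The sketch below only illustrates that shell behaviour; marker is a hypothetical function standing in for parallel, and nothing here is taken from gupdatedb itself.

```shell
#!/bin/sh
# Hypothetical stand-in for parallel: it only records that it was run.
ran=no
marker() { ran=yes; }

unset tool
# With the leading ":" (the null command) the expansion assigns the
# default value and the resulting words are discarded; nothing is run.
: ${tool:=marker --lb -j32}
after_colon=$ran

unset tool
# Without ":" the expanded words become a command line and are executed.
${tool:=marker --lb -j32}
after_bare=$ran

echo "with ':' ran=$after_colon, without ':' ran=$after_bare"
```

In both cases the variable ends up holding the same string; the only difference is whether that string is also executed.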

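UPDATE 2:

@MarkSetchell: I simply run sudo gupdatedb in the directory.

Could you please give the full command to use? You suggested parallel -j 32 --lb gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS, but that does not seem to work.

What I tried is: parallel -j32 --lb find {} $FINDOPTIONS * ::: */* but after a while I get the following error: gfind: failed to read file names from file system at or below '/': No such file or directory

I want to index all files under the main root /, but //System/Volume/Data/ is duplicated.

UPDATE 3: If the number of subdirectories is lower than the number of threads I use when launching parallel -j32 ..., is there a way to tell the parallel command to explore all the sub-sub etc. directories?

It seems that make -j32 behaves this way (maybe I am wrong). It would be interesting not to have a single process working on a subdirectory that may itself contain many sub-subdirectories to explore, and instead to benefit from all 32 processes started by parallel -j32 .... That would avoid wasting time by leaving these subdirectories, and the deeper ones, unparallelized.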
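
One possible way to get more work units than there are top-level directories is to cut the tree at a fixed depth: list everything down to the cut once, then run one search per directory sitting at the cut. The sketch below is only an illustration of that idea on a small demo tree with made-up names; it runs the searches sequentially with plain find and checks that the combined listing equals a single find over the whole tree. The depth value and the tree layout are assumptions.

```shell
#!/bin/sh
# Demo tree with hypothetical names, built in a temporary directory.
root=$(mktemp -d)
mkdir -p "$root/a/x" "$root/a/y" "$root/b/z"
touch "$root/top.txt" "$root/a/f1" "$root/a/x/f2" "$root/a/y/f3" "$root/b/z/f4"

depth=2   # how far down to cut the tree into work units

{
    # Everything from the root down to the cut, including the unit directories.
    find "$root" -maxdepth "$depth"
    # One search per directory sitting exactly at the cut; -mindepth 1 skips
    # the unit directory itself because the listing above already printed it.
    find "$root" -mindepth "$depth" -maxdepth "$depth" -type d |
        while IFS= read -r unit; do
            find "$unit" -mindepth 1
        done
} | sort > "$root.split"

# Reference: a single find over the whole tree.
find "$root" | sort > "$root.full"

if cmp -s "$root.split" "$root.full"; then same=yes; else same=no; fi
echo "split listing matches single find: $same"
rm -rf "$root" "$root.split" "$root.full"
```

The per-unit find calls inside the loop are the part that could be handed to parallel as ::: arguments instead of being run one after another.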

UPDATE 4: I do not know what to put in the command suggested by @MarkSetchell; for example, if I have 3 subdirectories in the current directory:

# : A2
parallel -j 32 --lb  gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS

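In particular, what should go in BUNCH_OF_PATHS?

Do I have to use --localpaths dir1/ dir2/ dir3/ instead of BUNCH_OF_PATHS? And what about $FINDOPTIONS ... with the 3 dots?

Updated answer

The problem is the line following the one containing A2 in the file /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb. At present it has the form: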

# : A2
$find $SEARCHPATHS $FINDOPTIONS \( $prunefs_exp  -type d -regex "$PRUNEREGEX" \) -prune -o $print_option

whereas you want it to have the form:

# : A2
parallel -j 32 --lb  gfind {} $FINDOPTIONS ... ::: BUNCH_OF_PATHS

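As you have not given the paths you want searched in parallel, the only path is currently /, which means nothing can be done in parallel. You need to run it with --localpaths set to a bunch of places worth searching in parallel, or hack the script more extensively. Though, to be honest, I am not sure why you want to speed it up, as it should be run relatively rarely, and then only when the system is quiet.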
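
To make BUNCH_OF_PATHS concrete, here is a minimal sketch. It builds a demo directory with three made-up subdirectories, lets the shell glob expand them, and prints the resulting command line rather than running it, so neither GNU Parallel nor gfind needs to be installed. The -print0 in the printed command is an assumption borrowed from the print_option value in the trace; it is not a replacement for the elided "..." above.

```shell
#!/bin/sh
# Demo directory with hypothetical names, standing in for the search root.
root=$(mktemp -d)
mkdir "$root/dir1" "$root/dir2" "$root/dir3"

# The glob expands to one word per top-level entry. Each word that
# follows ":::" replaces {} in one job, so three paths give three jobs.
set -- "$root"/*
njobs=$#

# Print the command instead of executing it.
cmdline="parallel -j 32 --lb gfind {} -print0 ::: $*"
echo "$njobs jobs: $cmdline"
rm -rf "$root"
```

With only / as the path there would be a single word after ":::" and therefore a single job, which is why nothing runs in parallel in that case.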

Original answer

Go to around line 250 of the file /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb and comment it out with a hash sign so that it looks like this:

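for binary in $find $frcode
do
  # The null command keeps the loop body non-empty; a body that contains
  # only a comment is a syntax error in sh.
  : #checkbinary $binary
done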
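
The reason the check complains about -j32 is word splitting: $find is expanded unquoted in the loop, so every whitespace-separated word of its value is tested as if it were a program. The sketch below reproduces that with a simplified stand-in for checkbinary (an assumption: it returns a status, whereas the script's own function stops the run) and the value from the first attempt in the question.

```shell
#!/bin/sh
# Simplified stand-in for the script's checkbinary function.
checkbinary() {
    if test -x "$1"; then
        return 0
    fi
    echo "updatedb needs to be able to execute $1, but cannot." >&2
    return 1
}

# Value from the first attempt in the question, with BINDIR filled in.
find="/usr/local/bin/parallel -j32 /usr/local/bin/gfind"

# Unquoted expansion splits the value on whitespace, so the loop sees
# three separate words and "-j32" is treated like a program name.
words=""
for binary in $find; do
    words="$words[$binary]"
done
echo "$words"

# "-j32" is not an executable file, so the check rejects it.
if checkbinary -j32 2>/dev/null; then verdict=accepted; else verdict=rejected; fi
echo "-j32 is $verdict"
```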