Using GNU parallel with gfind to reduce the runtime of the gupdatedb tool
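This follows on from my previous post.
I want to build the gupdatedb database so that it contains everything under the root directory /, except for the PRUNEPATHS listed below. I am using macOS 10.15 Catalina.
So I tried to modify the gupdatedb script on macOS 10.15 to benefit from the parallel command, like this (note the # : A2 section):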
# : A2
cat | parallel -j32 $find {} $SEARCHPATHS $FINDOPTIONS \
\( $prunefs_exp -type d -regex "$PRUNEREGEX" \) \
-prune -o $print_option * :::
If I don't use cat |, I get the following warning messages:
parallel: Warning: Input is read from the terminal. You are either an expert
parallel: Warning: (in which case: YOU ARE AWESOME!) or maybe you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
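and the process seems to hang.
Unfortunately, multiple instances of $find (= gfind) do not seem to run simultaneously.
I launched the script with sudo time gupdatedb, and ps aux | grep find gives the following result: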
root 84865 0.0 0.0 4459044 15828 s002 S+ 1:43PM 0:00.10 perl /usr/local/bin/parallel -j32 /usr/local/Cellar/findutils/4.7.0/bin/gfind {} / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/afs$\)\|\(^/amd$\)\|\(^/proc$\)\|\(^/sfs$\)\|\(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/Volumes$\) ) -prune -o -print0 Applications Library System Users Volumes bin cores dev etc home opt private sbin tmp usr var :::
root 84863 0.0 0.0 4268280 796 s002 S+ 1:43PM 0:00.00 /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode -0
root 84861 0.0 0.0 4282172 708 s002 S+ 1:43PM 0:00.00 /bin/sh /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root 84853 0.0 0.0 4273980 1164 s002 S+ 1:43PM 0:00.01 /bin/sh /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root 84850 0.0 0.0 5396228 10288 s008 S+ 1:43PM 0:00.27 vim /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root 84849 0.0 0.0 4788896 6740 s008 S+ 1:43PM 0:00.03 sudo vim /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
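In the end, the database is probably not being built: I am watching the sizes of /usr/local/var/locate/locatedb.n and /usr/local/var/locate/locatedb, and neither changes.
What is wrong with the syntax I am using with parallel? (In particular, I don't know how to handle the ... ::: options part of the command.)
PS: I have set the following in gupdatedb: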
# Directories to not put in the database, which would otherwise be.
: ${PRUNEPATHS="
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
/Volumes
"}
and
# You can set these in the environment, or use command-line options,
# to override their defaults:
# Any global options for find?
: ${FINDOPTIONS=}
# What shell shoud we use? We should use a POSIX-ish sh.
: ${SHELL="/bin/sh"}
# Non-network directories to put in the database.
: ${SEARCHPATHS="/"}
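UPDATE 1
To be more precise, my previous post is where I asked about a potential optimization (parallelization) using the parallel/find pair.
I would like to do the same optimization, but for the gupdatedb script.
UPDATE 2
I followed the advice I was given.
In gupdatedb, the default command relevant to my question is: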
$find $SEARCHPATHS $FINDOPTIONS \
\( $prunefs_exp \
-type d -regex "$PRUNEREGEX" \) -prune -o $print_option
So I simply changed it to this:
parallel -j32 $find {} $SEARCHPATHS $FINDOPTIONS \
\( $prunefs_exp \
-type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /
I get the following error:
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `/usr/local/Cellar/findutils/4.7.0/bin/gfind / / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/System$\)\|\(^/Volumes$\) ) -prune -o -print0'
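What could be wrong here?
UPDATE 3
Here is the gupdatedb script; you can see my different attempts starting around line 300 of the original file (the # : A2 section):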
#! /bin/sh
# updatedb -- build a locate pathname database
# Copyright (C) 1994-2019 Free Software Foundation, Inc.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
# csh original by James Woods; sh conversion by David MacKenzie.
#exec 2> /tmp/updatedb-trace.txt
#set -x
version='
updatedb (GNU findutils) 4.7.0
Copyright (C) 1994-2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.
'
# File path names are not actually text, anyway (since there is no
# mechanism to enforce any constraint that the basename of a
# subdirectory has the same character encoding as the basename of its
# parent). The practical effect is that, depending on the way a
# particular system is configured and the content of its filesystem,
# passing all the file names in the system through "sort" may generate
# character encoding errors in text-based tools like "sort". To avoid
# this, we set LC_ALL=C. This will, presumably, not work perfectly on
# systems where LC_ALL is not the way to do locale configuration or
# some other seting can override this.
LC_ALL=C
export LC_ALL
# We can't use substitution on PACKAGE_URL below because it
# (correctly) points to https://www.gnu.org/software/findutils/ instead
# of the bug reporting page.
usage="\
Usage: $0 [--findoptions='-option1 -option2...']
[--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
[--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
[--output=dbfile] [--netuser=user] [--localuser=user]
[--dbformat] [--version] [--help]
Please see also the documentation at http://www.gnu.org/software/findutils/.
Report (and track progress on fixing) bugs in the updatedb
program via the GNU findutils bug-reporting page at
https://savannah.gnu.org/bugs/?group=findutils or, if
you have no web access, by sending email to <bug-findutils@gnu.org>.
"
changeto=/
for arg
do
# If we are unable to fork, the back-tick operator will
# fail (and the shell will emit an error message). When
# this happens, we exit with error value 71 (EX_OSERR).
# Alternative candidate - 75, EX_TEMPFAIL.
opt=`echo $arg|sed 's/^\([^=]*\).*/\1/'` || exit 71
val=`echo $arg|sed 's/^[^=]*=\(.*\)/\1/'` || exit 71
case "$opt" in
--findoptions) FINDOPTIONS="$val" ;;
--localpaths) SEARCHPATHS="$val" ;;
--netpaths) NETPATHS="$val" ;;
--prunepaths) PRUNEPATHS="$val" ;;
--prunefs) PRUNEFS="$val" ;;
--output) LOCATE_DB="$val" ;;
--netuser) NETUSER="$val" ;;
--localuser) LOCALUSER="$val" ;;
--changecwd) changeto="$val" ;;
--dbformat) dbformat="$val" ;;
--version) fail=0; echo "$version" || fail=1; exit $fail ;;
--help) fail=0; echo "$usage" || fail=1; exit $fail ;;
*) echo "updatedb: invalid option $opt
Try '$0 --help' for more information." >&2
exit 1 ;;
esac
done
frcode_options=""
case "$dbformat" in
"")
# Default, use LOCATE02
;;
LOCATE02)
;;
slocate)
frcode_options="$frcode_options -S 1"
;;
*)
# The "old" database format is no longer supported.
echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
echo "LOCATE02, slocate" >&2
exit 1
esac
if true
then
sort="/usr/bin/sort -z"
print_option="-print0"
frcode_options="$frcode_options -0"
else
sort="/usr/bin/sort"
print_option="-print"
fi
getuid() {
# format of "id" output is ...
# uid=1(daemon) gid=1(other)
# for `id's that don't understand -u
id | cut -d'(' -f 1 | cut -d'=' -f2
}
# figure out if su supports the -s option
select_shell() {
if su "$1" -s $SHELL -c false < /dev/null ; then
# No.
echo ""
else
if su "$1" -s $SHELL -c true < /dev/null ; then
# Yes.
echo "-s $SHELL"
else
# su is unconditionally failing. We won't be able to
# figure out what is wrong, so be conservative.
echo ""
fi
fi
}
# You can set these in the environment, or use command-line options,
# to override their defaults:
# Any global options for find?
: ${FINDOPTIONS="-mindepth 1 -maxdepth 1"}
#: ${FINDOPTIONS=""}
# What shell shoud we use? We should use a POSIX-ish sh.
: ${SHELL="/bin/sh"}
# Non-network directories to put in the database.
: ${SEARCHPATHS="/"}
# Network (NFS, AFS, RFS, etc.) directories to put in the database.
: ${NETPATHS=}
# Directories to not put in the database, which would otherwise be.
: ${PRUNEPATHS="
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
"}
# Trailing slashes result in regex items that are never matched, which
# is not what the user will expect. Therefore we now reject such
# constructs.
for p in $PRUNEPATHS; do
case "$p" in
/*/) echo "$0: $p: pruned paths should not contain trailing slashes" >&2
exit 1
esac
done
# The same, in the form of a regex that find can use.
test -z "$PRUNEREGEX" &&
PRUNEREGEX=`echo $PRUNEPATHS|sed -e 's,^,\\(^,' -e 's, ,$\\)\\|\\(^,g' -e 's,$,$\\),'`
# The database file to build.
: ${LOCATE_DB=/usr/local/var/locate/locatedb}
# Directory to hold intermediate files.
if test -z "$TMPDIR"; then
if test -d /var/tmp; then
: ${TMPDIR=/var/tmp}
elif test -d /usr/tmp; then
: ${TMPDIR=/usr/tmp}
else
: ${TMPDIR=/tmp}
fi
fi
export TMPDIR
# The user to search network directories as.
: ${NETUSER=daemon}
# The directory containing the subprograms.
if test -n "$LIBEXECDIR" ; then
: LIBEXECDIR already set, do nothing
else
: ${LIBEXECDIR=/usr/local/Cellar/findutils/4.7.0/libexec}
fi
# The directory containing find.
if test -n "$BINDIR" ; then
: BINDIR already set, do nothing
else
: ${BINDIR=/usr/local/Cellar/findutils/4.7.0/bin}
fi
# The names of the utilities to run to build the database.
: ${find:=${BINDIR}/gfind}
: ${frcode:=${LIBEXECDIR}/gfrcode}
make_tempdir () {
# This implementation is adapted from the GNU Autoconf manual.
{
tmp=`
(umask 077 && mktemp -d "$TMPDIR/updatedbXXXXXX") 2>/dev/null
` &&
test -n "$tmp" && test -d "$tmp"
} || {
# This method is less secure than mktemp -d, but it's a fallback.
#
# We use $$ as well as $RANDOM since $RANDOM may not be available.
# We also add a time-dependent suffix. This is actually somewhat
# predictable, but then so is $$. POSIX does not require date to
# support +%N.
ts=`date +%N%S || date +%S 2>/dev/null`
tmp="$TMPDIR"/updatedb"$$"-"${RANDOM:-}${ts}"
(umask 077 && mkdir "$tmp")
}
echo "$tmp"
}
checkbinary () {
if test -x "$1" ; then
: ok
else
eval echo "updatedb needs to be able to execute $1, but cannot." >&2
exit 1
fi
}
for binary in $find $frcode
do
checkbinary $binary
done
: ${PRUNEFS="
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
"}
if test -n "$PRUNEFS"; then
prunefs_exp=`echo $PRUNEFS |sed -e 's/\([^ ][^ ]*\)/-o -fstype \1/g' \
-e 's/-o //' -e 's/$/ -o/'`
else
prunefs_exp=''
fi
# Make and code the file list.
# Sort case insensitively for users' convenience.
rm -f $LOCATE_DB.n
trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
if {
cd "$changeto"
if test -n "$SEARCHPATHS"; then
if [ "$LOCALUSER" != "" ]; then
# : A1
su $LOCALUSER `select_shell $LOCALUSER` -c \
"$find $SEARCHPATHS $FINDOPTIONS \
\( $prunefs_exp \
-type d -regex '$PRUNEREGEX' \) -prune -o $print_option"
else
# : A2
# ORIGINAL VERSION : sequential find
#$find $SEARCHPATHS $FINDOPTIONS \
# \( $prunefs_exp \
# -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /
# Parallel version 1
#parallel -j 32 $find $SEARCHPATHS $FINDOPTIONS \
# \( $prunefs_exp \
# -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /
# Parallel version 2
parallel -j 32 $find {} $FINDOPTIONS \
$prunefs_exp -type d -regex $PRUNEREGEX -prune -o $print_option ::: */*
fi
fi
if test -n "$NETPATHS"; then
myuid=`getuid`
if [ "$myuid" = 0 ]; then
# : A3
su $NETUSER `select_shell $NETUSER` -c \
"$find $NETPATHS $FINDOPTIONS \( -type d -regex '$PRUNEREGEX' -prune \) -o $print_option" ||
exit $?
else
# : A4
$find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
exit $?
fi
fi
} | $sort | $frcode $frcode_options > $LOCATE_DB.n
then
: OK so far
true
else
rv=$?
echo "Failed to generate $LOCATE_DB.n" >&2
rm -f $LOCATE_DB.n
exit $rv
fi
# To avoid breaking locate while this script is running, put the
# results in a temp file, then rename it atomically.
if test -s $LOCATE_DB.n; then
chmod 644 ${LOCATE_DB}.n
mv ${LOCATE_DB}.n $LOCATE_DB
else
echo "updatedb: new database would be empty" >&2
rm -f $LOCATE_DB.n
fi
exit 0
I launch the gupdatedb command like this:
sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System /Volumes' --localpaths='/' --output=$HOME/locatedb_gupdatedb_PARALLEL
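UPDATE 4
My bounty expires tomorrow. With the default gupdatedb, indexing everything takes about 30 minutes. If I manage to use parallel correctly in the core of the gupdatedb script, that is, where it calls gfind to do the indexing, what speed-up factor can I expect?
One last request: how can I fix this error: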
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `/usr/local/Cellar/findutils/4.7.0/bin/gfind / / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/System$\)\|\(^/Volumes$\) ) -prune -o -print0'
with the command:
parallel -j32 $find {} $FINDOPTIONS \
\( $prunefs_exp \
-type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /
?
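The ::: is not needed if nothing follows it, and {} is pointless if there are no input sources. Without more information about what exactly you want to parallelize, we can't really tell you what you should use instead.
But, for example, if you want to run find in each of /etc, /usr, /bin, and /opt, that would look like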
parallel find {} -options ::: /etc /usr /bin /opt
Without :::, the same thing can be expressed as:
printf '%s\n' /etc /usr /bin /opt |
parallel find {} -options
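So the purpose of ::: is basically to say "I want to specify the things to parallelize over on the command line, rather than receive them on standard input"; but if you provide neither, parallel has nothing to substitute for {}.
I'm not saying this particular usage makes sense for your use case; I just hope it clarifies the documentation.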
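The syntax error quoted in the question looks consistent with the command being parsed twice: once by the script's own shell, which strips the backslashes from \( and \), and once more by the /bin/sh -c that appears in the error message. A minimal sketch of that second parse, using plain sh -c and echo in place of parallel and gfind (run_inner is a hypothetical helper, not part of any tool):

```shell
#!/bin/sh
# Stand-in for the inner shell: it receives one command string and
# parses it again. 'echo' replaces gfind so this is safe to run anywhere.
run_inner() {
    # $1 is the command string handed to a fresh shell
    sh -c "$1" 2>/dev/null
}

# Bare parentheses reach the inner shell and are parsed as shell syntax.
if run_inner 'echo ( -type d ) -prune'; then
    echo "unexpected: inner shell accepted bare parentheses"
else
    echo "bare parentheses: syntax error in inner shell"
fi

# Escaped once more, they arrive at the command as literal arguments.
run_inner 'echo \( -type d \) -prune'

# Untested suggestion for the real command: GNU parallel's -q (--quote)
# option quotes the arguments before they reach the inner shell, e.g.
#   parallel -q "$find" {} $FINDOPTIONS \( $prunefs_exp -type d \
#       -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /usr /opt
```

The directories after ::: in the commented command are placeholders; which directories to split the search over is a separate decision.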
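To get any meaningful speed-up from parallel, you need to make sure you have the resources to speed up the process. There are two challenges here:
- The updatedb process is I/O-bound. Typically, you use parallel to take advantage of a multi-core system by spreading CPU-bound work across several cores.
- The updatedb process needs exclusive access to its database (usually /var/lib/mlocate/mlocate.db). Even if there were a benefit to splitting updatedb across multiple cores, you would have to write the output to multiple databases. With that approach you then have to pass all the database names to locate (separated by ':', using '-d').
Unless your system has multiple disk drives (or you are accessing network drives), you will gain very little from running find in parallel.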
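As a rough way to check that condition, the sketch below counts how many distinct devices back a list of directories, using the POSIX output format of df. The function name count_devices and the example directories are assumptions made for illustration. Several volumes can share one physical disk, so a count above 1 is only a hint, not proof that parallel traversal will help.

```shell
#!/bin/sh
# Count the distinct devices behind a list of directories.
# df -P prints one POSIX-format row per path; column 1 is the device.
count_devices() {
    df -P "$@" | awk 'NR > 1 { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Example: a few top-level directories one might hand to parallel.
n=$(count_devices / /usr /tmp)
echo "distinct devices: $n"
if [ "$n" -le 1 ]; then
    echo "all paths are on one device: parallel find is unlikely to help much"
fi
```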
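If your system does have multiple disk drives (and/or network drives), you can index each file system in parallel, using a script like the one below.
Assuming you have 2 extra disks mounted on /mnt/disk1 and /mnt/disk2: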
# Index root
updatedb --output=/var/lib/mlocate/local.db -e '/mnt/disk1 /mnt/disk2' &
# Index 1st extra disk (or network drive)
updatedb --output=/var/lib/mlocate/disk1.db -U /mnt/disk1 &
# Index 2nd extra disk (or network drive)
updatedb --output=/var/lib/mlocate/disk2.db -U /mnt/disk2 &
wait
You should then set the environment variable LOCATE_PATH to point to all the databases:
export LOCATE_PATH=/var/lib/mlocate/local.db:/var/lib/mlocate/disk1.db:/var/lib/mlocate/disk2.db
locate ...
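If the list of databases grows, the colon-separated value can be assembled rather than typed by hand. A small sketch, assuming the three database paths used above; join_db_paths is a hypothetical helper name:

```shell
#!/bin/sh
# Join database paths with ':' as LOCATE_PATH expects.
join_db_paths() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    ( IFS=:; printf '%s\n' "$*" )
}

LOCATE_PATH=$(join_db_paths \
    /var/lib/mlocate/local.db \
    /var/lib/mlocate/disk1.db \
    /var/lib/mlocate/disk2.db)
export LOCATE_PATH
echo "$LOCATE_PATH"
```

Paths that themselves contain a colon cannot be represented this way, which is a limitation of the colon-separated format rather than of the helper.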