将GNU并行命令与gfind结合使用以获取gupdatedb工具的运行时

如何解决将GNU并行命令与gfind结合使用以获取gupdatedb工具的运行时

我关注上一篇文章combine parallel and gfind

我想构建gupdatedb数据库,该数据库包含来自主根/的所有数据库,但下面列出的PRUNEPATHS除外。我正在MacOS 10.15 Catalina上工作。

因此,我尝试在MacOS 10.15上修改gupdatedb脚本,以从parallel命令中受益(请注意# : A2部分):

# : A2
cat | parallel -j32 $find {} $SEARCHPATHS $FINDOPTIONS \
    \( $prunefs_exp -type d -regex "$PRUNEREGEX" \) \
    -prune -o $print_option * :::

如果我不使用cat |,则会收到以下警告消息:

parallel: Warning: Input is read from the terminal. You are either an expert
parallel: Warning: (in which case: YOU ARE AWESOME!) or maybe you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

并且该过程似乎挂起。

不幸的是,$find = gfind的多个线程似乎不能同时运行:

我已经启动了这样的脚本:sudo time gupdatedb

并在以下结果以下:ps aux | grep find

root             84865   0.0  0.0  4459044  15828 s002  S+    1:43PM   0:00.10 perl /usr/local/bin/parallel -j32 /usr/local/Cellar/findutils/4.7.0/bin/gfind {} / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/afs$\)\|\(^/amd$\)\|\(^/proc$\)\|\(^/sfs$\)\|\(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/Volumes$\) ) -prune -o -print0 Applications Library System Users Volumes bin cores dev etc home opt private sbin tmp usr var :::
root             84863   0.0  0.0  4268280    796 s002  S+    1:43PM   0:00.00 /usr/local/Cellar/findutils/4.7.0/libexec/gfrcode -0
root             84861   0.0  0.0  4282172    708 s002  S+    1:43PM   0:00.00 /bin/sh /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root             84853   0.0  0.0  4273980   1164 s002  S+    1:43PM   0:00.01 /bin/sh /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root             84850   0.0  0.0  5396228  10288 s008  S+    1:43PM   0:00.27 vim /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb
root             84849   0.0  0.0  4788896   6740 s008  S+    1:43PM   0:00.03 sudo vim /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb

最后,可能无法构建数据库,我正在检查/usr/local/var/locate/locatedb.n/usr/local/var/locate/locatedb的大小,但没有任何变化。

与parallel一起使用的语法有什么问题? (特别是,我不知道如何处理命令的... ::: options部分)

PS:我已设置为gupdatedb

# Directories to not put in the database,which would otherwise be.
: ${PRUNEPATHS="
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
/Volumes
"}

# You can set these in the environment,or use command-line options,# to override their defaults:

# Any global options for find?
: ${FINDOPTIONS=}

# What shell shoud we use?  We should use a POSIX-ish sh.
: ${SHELL="/bin/sh"}

# Non-network directories to put in the database.
: ${SEARCHPATHS="/"}

更新1

更准确地说,这是我要求与parallel/find对夫妇进行潜在优化(并行化)的帖子:

example of a potential parallelization with coupled parallel/find

除了脚本gupdatedb,我想做同样的优化。

更新2

我遵循了:

的建议

关于我的问题,进入gupdatedb的defaut命令是:

$find $SEARCHPATHS $FINDOPTIONS \
 \( $prunefs_exp \
 -type d -regex "$PRUNEREGEX" \) -prune -o $print_option

所以,我刚刚做了如下修改:

parallel -j32 $find {} $SEARCHPATHS $FINDOPTIONS \
    \( $prunefs_exp \
    -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /

,我收到以下错误消息:

/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `/usr/local/Cellar/findutils/4.7.0/bin/gfind / / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/System$\)\|\(^/Volumes$\) ) -prune -o -print0'

这可能是什么问题?

更新3:

这里是脚本gupdatedb,您可以从第300行看到我的不同尝试:

#! /bin/sh
# updatedb -- build a locate pathname database
# Copyright (C) 1994-2019 Free Software Foundation,Inc.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation,either version 3 of the License,or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not,see <https://www.gnu.org/licenses/>.

# csh original by James Woods; sh conversion by David MacKenzie.

#exec 2> /tmp/updatedb-trace.txt
#set -x

version='
updatedb (GNU findutils) 4.7.0
Copyright (C) 1994-2019 Free Software Foundation,Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY,to the extent permitted by law.

Written by Eric B. Decker,James Youngman,and Kevin Dalley.
'

# File path names are not actually text,anyway (since there is no
# mechanism to enforce any constraint that the basename of a
# subdirectory has the same character encoding as the basename of its
# parent).  The practical effect is that,depending on the way a
# particular system is configured and the content of its filesystem,# passing all the file names in the system through "sort" may generate
# character encoding errors in text-based tools like "sort".  To avoid
# this,we set LC_ALL=C.  This will,presumably,not work perfectly on
# systems where LC_ALL is not the way to do locale configuration or
# some other seting can override this.
LC_ALL=C
export LC_ALL

# We can't use substitution on PACKAGE_URL below because it
# (correctly) points to https://www.gnu.org/software/findutils/ instead
# of the bug reporting page.
usage="\
Usage: $0 [--findoptions='-option1 -option2...']
       [--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
       [--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
       [--output=dbfile] [--netuser=user] [--localuser=user]
       [--dbformat] [--version] [--help]

Please see also the documentation at http://www.gnu.org/software/findutils/.
Report (and track progress on fixing) bugs in the updatedb
program via the GNU findutils bug-reporting page at
https://savannah.gnu.org/bugs/?group=findutils or,if
you have no web access,by sending email to <bug-findutils@gnu.org>.
"
changeto=/

for arg
do
  # If we are unable to fork,the back-tick operator will
  # fail (and the shell will emit an error message).  When
  # this happens,we exit with error value 71 (EX_OSERR).
  # Alternative candidate - 75,EX_TEMPFAIL.
  opt=`echo $arg|sed 's/^\([^=]*\).*/\1/'`  || exit 71
  val=`echo $arg|sed 's/^[^=]*=\(.*\)/\1/'` || exit 71
  case "$opt" in
    --findoptions) FINDOPTIONS="$val" ;;
    --localpaths) SEARCHPATHS="$val" ;;
    --netpaths) NETPATHS="$val" ;;
    --prunepaths) PRUNEPATHS="$val" ;;
    --prunefs) PRUNEFS="$val" ;;
    --output) LOCATE_DB="$val" ;;
    --netuser) NETUSER="$val" ;;
    --localuser) LOCALUSER="$val" ;;
    --changecwd)  changeto="$val" ;;
    --dbformat)   dbformat="$val" ;;
    --version) fail=0; echo "$version" || fail=1; exit $fail ;;
    --help)    fail=0; echo "$usage"   || fail=1; exit $fail ;;
    *) echo "updatedb: invalid option $opt
Try '$0 --help' for more information." >&2
       exit 1 ;;
  esac
done

frcode_options=""
case "$dbformat" in
    "")
        # Default,use LOCATE02
        ;;
    LOCATE02)
        ;;
    slocate)
        frcode_options="$frcode_options -S 1"
        ;;
    *)
        # The "old" database format is no longer supported.
        echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
        echo "LOCATE02,slocate" >&2
        exit 1
esac


if true
then
    sort="/usr/bin/sort -z"
    print_option="-print0"
    frcode_options="$frcode_options -0"
else
    sort="/usr/bin/sort"
    print_option="-print"
fi

getuid() {
    # format of "id" output is ...
    # uid=1(daemon) gid=1(other)
    # for `id's that don't understand -u
    id | cut -d'(' -f 1 | cut -d'=' -f2
}

# figure out if su supports the -s option
select_shell() {
    if su "$1" -s $SHELL -c false < /dev/null  ; then
    # No.
    echo ""
    else
    if su "$1" -s $SHELL -c true < /dev/null  ; then
        # Yes.
        echo "-s $SHELL"
        else
        # su is unconditionally failing.  We won't be able to
        # figure out what is wrong,so be conservative.
        echo ""
    fi
    fi
}


# You can set these in the environment,# to override their defaults:

# Any global options for find?
: ${FINDOPTIONS="-mindepth 1 -maxdepth 1"}
#: ${FINDOPTIONS=""}

# What shell shoud we use?  We should use a POSIX-ish sh.
: ${SHELL="/bin/sh"}

# Non-network directories to put in the database.
: ${SEARCHPATHS="/"}

# Network (NFS,AFS,RFS,etc.) directories to put in the database.
: ${NETPATHS=}

# Directories to not put in the database,which would otherwise be.
: ${PRUNEPATHS="
/afs
/amd
/proc
/sfs
/tmp
/usr/tmp
/var/tmp
"}

# Trailing slashes result in regex items that are never matched,which
# is not what the user will expect.   Therefore we now reject such
# constructs.
for p in $PRUNEPATHS; do
    case "$p" in
    /*/)   echo "$0: $p: pruned paths should not contain trailing slashes" >&2
           exit 1
    esac
done

# The same,in the form of a regex that find can use.
test -z "$PRUNEREGEX" &&
  PRUNEREGEX=`echo $PRUNEPATHS|sed -e 's,^,\\\(^,' -e 's,$\\\)\\\|\\\(^,g' -e 's,$,$\\\),'`

# The database file to build.
: ${LOCATE_DB=/usr/local/var/locate/locatedb}

# Directory to hold intermediate files.
if test -z "$TMPDIR"; then
  if test -d /var/tmp; then
    : ${TMPDIR=/var/tmp}
  elif test -d /usr/tmp; then
    : ${TMPDIR=/usr/tmp}
  else
    : ${TMPDIR=/tmp}
  fi
fi
export TMPDIR

# The user to search network directories as.
: ${NETUSER=daemon}

# The directory containing the subprograms.
if test -n "$LIBEXECDIR" ; then
    : LIBEXECDIR already set,do nothing
else
    : ${LIBEXECDIR=/usr/local/Cellar/findutils/4.7.0/libexec}
fi

# The directory containing find.
if test -n "$BINDIR" ; then
    : BINDIR already set,do nothing
else
    : ${BINDIR=/usr/local/Cellar/findutils/4.7.0/bin}
fi

# The names of the utilities to run to build the database.
: ${find:=${BINDIR}/gfind}
: ${frcode:=${LIBEXECDIR}/gfrcode}

make_tempdir () {
    # This implementation is adapted from the GNU Autoconf manual.
    {
        tmp=`
    (umask 077 && mktemp -d "$TMPDIR/updatedbXXXXXX") 2>/dev/null
    ` &&
        test -n "$tmp" && test -d "$tmp"
    } || {
    # This method is less secure than mktemp -d,but it's a fallback.
    #
    # We use $$ as well as $RANDOM since $RANDOM may not be available.
    # We also add a time-dependent suffix.  This is actually somewhat
    # predictable,but then so is $$.  POSIX does not require date to
    # support +%N.
    ts=`date +%N%S || date +%S 2>/dev/null`
        tmp="$TMPDIR"/updatedb"$$"-"${RANDOM:-}${ts}"
        (umask 077 && mkdir "$tmp")
    }
    echo "$tmp"
}

checkbinary () {
    if test -x "$1" ; then
    : ok
    else
      eval echo "updatedb needs to be able to execute $1,but cannot." >&2
      exit 1
    fi
}

for binary in $find $frcode
do
  checkbinary $binary
done


: ${PRUNEFS="
9P
NFS
afs
autofs
cifs
coda
devfs
devpts
ftpfs
iso9660
mfs
ncpfs
nfs
nfs4
proc
shfs
smbfs
sysfs
"}

if test -n "$PRUNEFS"; then
prunefs_exp=`echo $PRUNEFS |sed -e 's/\([^ ][^ ]*\)/-o -fstype \1/g' \
 -e 's/-o //' -e 's/$/ -o/'`
else
  prunefs_exp=''
fi

# Make and code the file list.
# Sort case insensitively for users' convenience.

rm -f $LOCATE_DB.n
trap 'rm -f $LOCATE_DB.n; exit' HUP TERM

if {
cd "$changeto"
if test -n "$SEARCHPATHS"; then
  if [ "$LOCALUSER" != "" ]; then
    # : A1
    su $LOCALUSER `select_shell $LOCALUSER` -c \
    "$find $SEARCHPATHS $FINDOPTIONS \
     \\( $prunefs_exp \
     -type d -regex '$PRUNEREGEX' \\) -prune -o $print_option"
  else
    # : A2
    # ORIGINAL VERSION : sequential find
    #$find $SEARCHPATHS $FINDOPTIONS \
    # \( $prunefs_exp \
    # -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /

    # Parallel version 1
    #parallel -j 32 $find $SEARCHPATHS $FINDOPTIONS \
    # \( $prunefs_exp \
    # -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /
    
    # Parallel version 2
    parallel -j 32 $find {} $FINDOPTIONS \
    $prunefs_exp -type d -regex $PRUNEREGEX -prune -o $print_option ::: */*
  fi
fi

if test -n "$NETPATHS"; then
myuid=`getuid`
if [ "$myuid" = 0 ]; then
    # : A3
    su $NETUSER `select_shell $NETUSER` -c \
     "$find $NETPATHS $FINDOPTIONS \\( -type d -regex '$PRUNEREGEX' -prune \\) -o $print_option" ||
    exit $?
  else
    # : A4
    $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
    exit $?
  fi
fi
} | $sort | $frcode $frcode_options > $LOCATE_DB.n
then
    : OK so far
    true
else
    rv=$?
    echo "Failed to generate $LOCATE_DB.n" >&2
    rm -f $LOCATE_DB.n
    exit $rv
fi

# To avoid breaking locate while this script is running,put the
# results in a temp file,then rename it atomically.
if test -s $LOCATE_DB.n; then
  chmod 644 ${LOCATE_DB}.n
  mv ${LOCATE_DB}.n $LOCATE_DB
else
  echo "updatedb: new database would be empty" >&2
  rm -f $LOCATE_DB.n
fi

exit 0

我像这样启动gupdatedb命令:

sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System /Volumes' --localpaths='/' --output=$HOME/locatedb_gupdatedb_PARALLEL

更新4:

我的赏金明天到期。使用默认的gupdatedb,所有索引编制大约需要30分钟。如果我可以使用parallel脚本的核心正确使用gupdatedb,即当后者使用gfind命令进行索引时,我可以期望哪个增益因子?

最后一个请求:如何解决错误:

/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `/usr/local/Cellar/findutils/4.7.0/bin/gfind / / ( -fstype 9P -o -fstype NFS -o -fstype afs -o -fstype autofs -o -fstype cifs -o -fstype coda -o -fstype devfs -o -fstype devpts -o -fstype ftpfs -o -fstype iso9660 -o -fstype mfs -o -fstype ncpfs -o -fstype nfs -o -fstype nfs4 -o -fstype proc -o -fstype shfs -o -fstype smbfs -o -fstype sysfs -o -type d -regex \(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/System$\)\|\(^/Volumes$\) ) -prune -o -print0'

使用命令:

parallel -j32 $find {} $FINDOPTIONS \
    \( $prunefs_exp \
    -type d -regex "$PRUNEREGEX" \) -prune -o $print_option ::: /

解决方法

如果没有任何内容,则不需要:::,如果没有任何来源,{}也毫无意义。如果没有更多关于您真正想要并行化的内容的信息,我们就无法真正告诉您应该使用什么。

但是,例如,如果要在find/etc/usr/bin中分别运行一个/opt,则看起来像

parallel find {} -options ::: /etc /usr /bin /opt

这可以等效地表示为没有:::

printf '%s\n' /etc /usr /bin /opt |
parallel find {} -options

所以:::的目的基本上是说“我想指定要在命令行上并行化的内容,而不是在标准输入上接收它们”;但是,无论哪种方式,如果您都不提供此信息,parallel将不知道将{}替换为什么。

我并不是说这种特殊用法对您的用例有意义,只是希望澄清文档(again)。

,

要从并行使用中获得任何有意义的加速,您需要确保有足够的资源来使过程更快。这里有两个挑战:

  1. updatedb进程受IO约束。通常,您使用并行来利用多核系统的优势,并将CPU绑定进程分散到多个核上。
  2. updatedb进程需要对数据库的独占访问权(通常在/var/lib/mlcoate/mlocate.db中)。即使将updateb拆分到多个内核中可以获得任何好处,您也必须将输出放入多个数据库中。这种方法将需要传递所有数据库名称(以“:”分隔,以“ -d”定位)

除非您的系统具有多个磁盘驱动器(或正在访问网络驱动器),否则运行并行查找将不会带来什么好处。

如果您的系统具有多个磁盘驱动器(和/或网络驱动器),则可以使用类似的脚本来并行运行每个文件系统

假设您在/ mnt / disk1,/ mnt / disk2上另外装有2个磁盘

  # Index root
updatedb --output=/var/lib/mlocate/local.db -E '/mnt/disk1 /mnt/disk2' &
  # Index 1st extra disk (or network drive)
updatedb --output=/var/lib/mlocate/disk1.db -U /mnt/disk1 &
  # Index 2nd extra disk (or network drive)
updatedb --output=/var/lib/mlocate/disk2.db -U /mnt/disk2 &
wait

您应该将环境变量LOCATE_PATH设置为指向所有数据库 导出

LOCATE_PATH=/var/lib/mlocate/local.db:/var/lib/mlocate/disk1.db:/var/lib/mlocate/disk2.db
locate ...

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


依赖报错 idea导入项目后依赖报错,解决方案:https://blog.csdn.net/weixin_42420249/article/details/81191861 依赖版本报错:更换其他版本 无法下载依赖可参考:https://blog.csdn.net/weixin_42628809/a
错误1:代码生成器依赖和mybatis依赖冲突 启动项目时报错如下 2021-12-03 13:33:33.927 ERROR 7228 [ main] o.s.b.d.LoggingFailureAnalysisReporter : *************************** APPL
错误1:gradle项目控制台输出为乱码 # 解决方案:https://blog.csdn.net/weixin_43501566/article/details/112482302 # 在gradle-wrapper.properties 添加以下内容 org.gradle.jvmargs=-Df
错误还原:在查询的过程中,传入的workType为0时,该条件不起作用 &lt;select id=&quot;xxx&quot;&gt; SELECT di.id, di.name, di.work_type, di.updated... &lt;where&gt; &lt;if test=&qu
报错如下,gcc版本太低 ^ server.c:5346:31: 错误:‘struct redisServer’没有名为‘server_cpulist’的成员 redisSetCpuAffinity(server.server_cpulist); ^ server.c: 在函数‘hasActiveC
解决方案1 1、改项目中.idea/workspace.xml配置文件,增加dynamic.classpath参数 2、搜索PropertiesComponent,添加如下 &lt;property name=&quot;dynamic.classpath&quot; value=&quot;tru
删除根组件app.vue中的默认代码后报错:Module Error (from ./node_modules/eslint-loader/index.js): 解决方案:关闭ESlint代码检测,在项目根目录创建vue.config.js,在文件中添加 module.exports = { lin
查看spark默认的python版本 [root@master day27]# pyspark /home/software/spark-2.3.4-bin-hadoop2.7/conf/spark-env.sh: line 2: /usr/local/hadoop/bin/hadoop: No s
使用本地python环境可以成功执行 import pandas as pd import matplotlib.pyplot as plt # 设置字体 plt.rcParams[&#39;font.sans-serif&#39;] = [&#39;SimHei&#39;] # 能正确显示负号 p
错误1:Request method ‘DELETE‘ not supported 错误还原:controller层有一个接口,访问该接口时报错:Request method ‘DELETE‘ not supported 错误原因:没有接收到前端传入的参数,修改为如下 参考 错误2:cannot r
错误1:启动docker镜像时报错:Error response from daemon: driver failed programming external connectivity on endpoint quirky_allen 解决方法:重启docker -&gt; systemctl r
错误1:private field ‘xxx‘ is never assigned 按Altʾnter快捷键,选择第2项 参考:https://blog.csdn.net/shi_hong_fei_hei/article/details/88814070 错误2:启动时报错,不能找到主启动类 #
报错如下,通过源不能下载,最后警告pip需升级版本 Requirement already satisfied: pip in c:\users\ychen\appdata\local\programs\python\python310\lib\site-packages (22.0.4) Coll
错误1:maven打包报错 错误还原:使用maven打包项目时报错如下 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.2.0:resources (default-resources)
错误1:服务调用时报错 服务消费者模块assess通过openFeign调用服务提供者模块hires 如下为服务提供者模块hires的控制层接口 @RestController @RequestMapping(&quot;/hires&quot;) public class FeignControl
错误1:运行项目后报如下错误 解决方案 报错2:Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project sb 解决方案:在pom.
参考 错误原因 过滤器或拦截器在生效时,redisTemplate还没有注入 解决方案:在注入容器时就生效 @Component //项目运行时就注入Spring容器 public class RedisBean { @Resource private RedisTemplate&lt;String
使用vite构建项目报错 C:\Users\ychen\work&gt;npm init @vitejs/app @vitejs/create-app is deprecated, use npm init vite instead C:\Users\ychen\AppData\Local\npm-