I have a bunch of uploaded .root files on my laptop, but I need just specific ones

Question

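I have a directory with 10000 .root files (each named like hists11524_blinded.root or hists9899_blinded.root) and need to run some macros on them for my data analysis. However, I don't need all of the files in the directory, only 4000 of them. I have a list of the runs I need (these 4000 numbers) in the file thebest.txt. This file is in the same directory as the histograms.

Before running the macros, I want to use the information in the .txt file to delete the files I don't need to process.

This is what the thebest.txt file looks like:

My guess is to use the command: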

-comm -2 -3 <(ls) <(sort thebest) | tail +2 | xargs -p rm

I get 2 errors:

tail: invalid option -- 'p'

sort: cannot read: No such file or directory

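The file thebest.txt contains only 5-digit numbers like 09999 or 11256, while the directory contains files with names like hists9999_blinded.root or hists11256_blinded.root.

The number of digits differs between the two lists, and that is the main problem.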

Answer 1

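One option is to remove the leading 0s from the numbers so that they match the file names. To avoid matching substrings, you can prepend and append the corresponding file name parts. (In your case, the number is in the middle of the file name.)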
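To illustrate why the surrounding file name parts matter, here is a small sketch using made-up file names (the run numbers below are only examples, not taken from your data). A bare number used as a fixed-string pattern also matches longer run numbers that contain it, while the full file name does not.

```shell
# Sample file names; the run numbers are illustrative only.
names='hists999_blinded.root
hists9999_blinded.root
hists19990_blinded.root'

# A bare number matches as a substring of other run numbers.
bare=$(printf '%s\n' "$names" | grep -F -c '999')

# The full file name only matches the intended file.
full=$(printf '%s\n' "$names" | grep -F -c 'hists999_blinded.root')

echo "bare: $bare, full: $full"
```

With these three names, the bare pattern counts all three lines, whereas the full name counts only one.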

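Since it is not clear whether the leading spaces in the example file thebest.txt are intentional or just a formatting problem, the leading spaces are removed as well.

Because deleting the wrong files may result in data loss, you may also consider processing only the matching files instead of removing the non-matching ones.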

# remove leading spaces followed by leading zeros and prepend/append file name parts
sed 's/ *0*\([1-9][0-9]*\)/hists\1_blinded.root/' thebest.txt > thebestfiles.txt

# get matching files and process
find . -name 'hists*_blinded.root' | grep -F -f thebestfiles.txt | xargs process_matching

# or get non-matching files and remove
find . -name 'hists*_blinded.root' | grep -F -v -f thebestfiles.txt | xargs rm
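As a more cautious variant of the removal step, the unwanted files could be moved aside instead of deleted. The following is only a sketch under assumptions: it runs in a scratch directory with made-up run numbers, and the folder name unused is hypothetical. It also assumes GNU find and xargs, as found on a typical Linux system.

```shell
# Sketch: move unwanted files aside instead of deleting them.
# The scratch directory, sample run numbers and the "unused" folder are illustrative assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch hists9999_blinded.root hists11256_blinded.root hists4242_blinded.root
printf '09999\n11256\n' > thebest.txt

# Build the list of wanted file names (leading spaces and zeros stripped).
sed 's/ *0*\([1-9][0-9]*\)/hists\1_blinded.root/' thebest.txt > thebestfiles.txt

mkdir -p unused
# -maxdepth 1 keeps find from descending into unused/ itself.
# -r (GNU xargs) skips mv when nothing matches; -I{} puts each name before the target.
find . -maxdepth 1 -type f -name 'hists*_blinded.root' \
  | grep -F -v -f thebestfiles.txt \
  | xargs -r -I{} mv {} unused/

ls unused
```

Once the macros have run successfully on the remaining files, the unused directory can be deleted in one step.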

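The find command searches the current directory recursively. If you want to exclude subdirectories, you can use -maxdepth 1. To avoid processing directory names, you might also add -type f.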
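A dry run with both options might look like the sketch below. It only prints the candidates for removal and deletes nothing. The scratch directory, the backup subdirectory and the sample names are assumptions made up for the demonstration.

```shell
# Sketch: restrict find to regular files in the current directory and only print the candidates.
# Scratch directory and sample names are illustrative assumptions.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch hists9999_blinded.root hists777_blinded.root
mkdir -p backup hists1_blinded.root      # a directory whose name matches the pattern
touch backup/hists555_blinded.root       # a matching file inside a subdirectory
printf ' 09999\n' > thebest.txt

sed 's/ *0*\([1-9][0-9]*\)/hists\1_blinded.root/' thebest.txt > thebestfiles.txt

# -maxdepth 1 keeps find out of backup/; -type f skips the directory hists1_blinded.root.
candidates=$(find . -maxdepth 1 -type f -name 'hists*_blinded.root' | grep -F -v -f thebestfiles.txt)
printf '%s\n' "$candidates"
```

Here only ./hists777_blinded.root is reported: the listed run 9999 is kept, and neither the file inside backup nor the directory with a matching name is touched.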


linux

histogram

root-framework