Bash: manipulate multiple strings in a file

I want to strip the file name from each of the following input lines. This is the file I am using:

cat <<EOF >./tz.txt
2019/12/_MG_0263.CR2.xmp:           bopt:keywordlist="pinhole,car,2019"
2019/12/_MG_0262.CR2.xmp:           bopt:keywordlist="pinhole,car,2019"
2020/06/ok/_MG_0003.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/_MG_0002.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/_MG_0137.CR2.xmp:           bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp:           bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp:           bopt:keywordlist="red,car,2020"
EOF

Right now I am using the script below (saved in the file ab.sh) to remove the [filename.xmp: bopt:] part (for example _MG_0263.CR2.xmp: bopt:) from each line, so that the output looks like this:

2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"

The above is the complete expected output. Some folders may have a different structure, for example 2020/06/ok/

The script is as follows:

#!/bin/bash
file="./tz.txt"
while read line ; do
        # variable a generates the folter structure with a variable range of considered columns
        # using awk to figure out how many columns (aka folders) there are in the structure
        a=$( cut -d"/" -f 1-$( awk -F'/' '{ print NF-1 }' $line ) $line )
    #                       |                                   |
    #                       -this bit should create a number for- 
    #                       -the cut command                    -
    
    #   then b variable stores the last bit in the  string
        b=$( cut -d":" -f 3 $line )
    
    #   and below combine results from above variables 
        echo ${a} ${b}
    done < ${file}

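The attached image illustrates the logic used to split the string into columns and keep only the relevant data.

The problem is that I get the following errors, and I am not sure what is going wrong. Thanks for any suggestions or help.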

$ sh ~/ab.sh
awk: fatal: cannot open file `2019/12/_MG_0263.CR2.xmp:' for reading (No such file or directory)
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory

awk: fatal: cannot open file `2019/12/_MG_0262.CR2.xmp:' for reading (No such file or directory)
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory

awk: fatal: cannot open file `2020/06/ok/_MG_0003.CR2.xmp:' for reading (No such file or directory)
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory

....

One awk idea to replace the while loop:

awk -F':' '
{ gsub(/[^/]+$/,"",$1)     # strip everything after last "/" from 1st field
  print $1, $3             # directory part, then the keywordlist="..." part
}' "${file}"

# or as a one-liner sans comments:

awk -F':' '{gsub(/[^/]+$/,"",$1); print $1, $3}' "${file}"

This generates:

2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
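To see why the first and third fields are the ones that matter, it can help to print every field that `-F':'` produces for a single input line. The following is a minimal sketch; `show_fields` is a hypothetical helper name, not part of the answer above.

```shell
#!/bin/bash
# show_fields (hypothetical helper): print each ":"-separated field of one line
show_fields() {
    printf '%s\n' "$1" |
        awk -F':' '{ for (i = 1; i <= NF; i++) printf "field %d = [%s]\n", i, $i }'
}

show_fields '2020/06/ok/_MG_0003.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"'
# field 1 = [2020/06/ok/_MG_0003.CR2.xmp]
# field 2 = [           bopt]
# field 3 = [keywordlist="lowkey,car,Chiaroscuro,2020"]
```

Field 1 still carries the file name, which is what the gsub() call trims away; field 2 (the padding plus bopt) is simply never printed.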

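A sed alternative:

$ sed -En 's|^(.*)/[^/]+:.*:([^:]+)$|\1/ \2|p' "${file}"

Where:

  • -En - enable support for extended regular expressions, suppress automatic printing of input lines
  • since the data contains / characters, we use | as the sed script delimiter
  • ^(.*)/ - [1st capture group] match everything up to, but not including, the last / ...
  • [^/]+: - match everything that is not a / up to the first :, then ...
  • .*: - match everything up to the next :
  • ([^:]+)$ - [2nd capture group] finally match everything that is not a : up to the end of the line
  • \1/ \2 - print 1st capture group + / + space + 2nd capture group

This generates: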

2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
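As a way to check what each capture group actually holds, the same pattern can be run with a replacement that labels the groups. This is only an illustrative sketch; `show_groups` is a hypothetical helper name.

```shell
#!/bin/bash
# show_groups (hypothetical helper): same pattern as above, but the
# replacement labels the two capture groups instead of joining them
show_groups() {
    printf '%s\n' "$1" |
        sed -En 's|^(.*)/[^/]+:.*:([^:]+)$|group1=[\1] group2=[\2]|p'
}

show_groups '2020/06/ok/_MG_0003.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"'
# group1=[2020/06/ok] group2=[keywordlist="lowkey,car,Chiaroscuro,2020"]
```

The first group stops before the last /, which is why the replacement has to put a / back after it.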

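First, the last argument to the awk command should be a file name. You are passing it a variable that contains one line of the input file. That is why you get the awk: fatal: cannot open file error.

Second, you make the same mistake in the cut commands, which causes the : No such file or directory errors.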
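For illustration, here is a sketch of the loop from the question with only that mistake corrected: each line is sent to awk and cut on standard input instead of being passed as a file name. The function name `strip_names` is hypothetical, and the sketch assumes every line contains at least one / and that the keyword list itself contains no / characters.

```shell
#!/bin/bash
# strip_names (hypothetical name): read lines on stdin, print "dir/ keywordlist=..."
strip_names() {
    while IFS= read -r line; do
        # number of "/"-separated fields minus one = number of directory levels
        n=$(printf '%s\n' "$line" | awk -F'/' '{ print NF-1 }')
        # keep only the directory fields
        a=$(printf '%s\n' "$line" | cut -d'/' -f "1-$n")
        # the third ":"-separated field is the keywordlist="..." part
        b=$(printf '%s\n' "$line" | cut -d':' -f 3)
        printf '%s/ %s\n' "$a" "$b"
    done
}

# usage with the file from the question: strip_names < ./tz.txt
printf '%s\n' '2020/06/ok/_MG_0003.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"' | strip_names
# 2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
```

This works, but it starts three extra processes for every input line, so it is noticeably slower than a single awk or sed call on larger files.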

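awk and cut are both designed to process complete files. You can chain them together with the pipe character, |, so that the output of one becomes the input of the next. For example:

cat "${file}" | awk ... | cut ...

But this quickly becomes complicated and unwieldy. A better solution is to use the stream editor, sed. sed reads its input line by line and can perform fairly complex operations on each line before writing out the result, again line by line.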

This should do what you want:

#!/bin/bash

file="./tz.txt"

sed -En 's/^([0-9]{4}\/[0-9]{2}\/).*bopt:(.*)$/\1 \2/p' "${file}"

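The quoted expression is explained below:

s/pat/rep/p searches for pat and, if it is found, replaces it with rep and prints the result.

In our case, pat is:

^ start of line

( start remembering what follows

[0-9]{4} any digit, repeated exactly 4 times

\/ a / character (escaped)

[0-9]{2}\/ any digit, repeated exactly 2 times, followed by a /

) stop remembering

.*bopt: any 0 or more characters, followed by bopt:

(.*) remember 0 or more characters...

$ ...up to the end of the line.

rep is:

\1 \2 the first thing remembered, then a space, then the second thing remembered.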
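One thing to be aware of: the first group in this pattern only remembers the year/month prefix, so a line under 2020/06/ok/ comes out as 2020/06/ rather than 2020/06/ok/ as shown in the expected output. A possible variant that keeps directories of any depth is sketched below; it switches the delimiter to | to avoid escaping the slashes, and `strip_any_depth` is a hypothetical name.

```shell
#!/bin/bash
# strip_any_depth (hypothetical name): keep everything up to and including the
# last "/", then the text after "bopt:"; reads the named files, or stdin if none
strip_any_depth() {
    sed -En 's|^(.*/)[^/]*:.*bopt:(.*)$|\1 \2|p' "$@"
}

# usage with the file from the question: strip_any_depth ./tz.txt
printf '%s\n' '2020/06/ok/_MG_0003.CR2.xmp:           bopt:keywordlist="lowkey,car,Chiaroscuro,2020"' | strip_any_depth
# 2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
```

Because (.*/) is greedy, it extends to the last / on the line, so the group holds the full directory path regardless of how many levels it has.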