bash manipulate multiple strings in a file
I want to remove the file name from each of the following input lines. This is the file I am using:
cat <<EOF >./tz.txt
2019/12/_MG_0263.CR2.xmp: bopt:keywordlist="pinhole,car,2019"
2019/12/_MG_0262.CR2.xmp: bopt:keywordlist="pinhole,car,2019"
2020/06/ok/_MG_0003.CR2.xmp: bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/_MG_0002.CR2.xmp: bopt:keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/_MG_0137.CR2.xmp: bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp: bopt:keywordlist="red,car,2020"
2020/04/_MG_0136.CR2.xmp: bopt:keywordlist="red,car,2020"
EOF
Right now I am using the script below (stored in the file ab.sh) to strip [filename.xmp: bopt:] (for example _MG_0263.CR2.xmp: bopt:) from each line, so that the output looks like this:
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
The above is the complete expected output. Some folders may have a different structure, for example 2020/06/ok/
The script code is as follows:
#!/bin/bash
file="./tz.txt"
while read line ; do
# variable a generates the folter structure with a variable range of considered columns
# using awk to figure out how many columns (aka folders) there are in the structure
a=$( cut -d"/" -f 1-$( awk -F'/' '{ print NF-1 }' $line ) $line )
# | |
# -this bit should create a number for-
# -the cut command -
# then b variable stores the last bit in the string
b=$( cut -d":" -f 3 $line )
# and below combine results from above variables
echo ${a} ${b}
done < ${file}
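The attached image illustrates the logic used to split the string into columns and pick out only the relevant data.
The problem is that I get the errors below, and I'm not sure where I went wrong.
Thanks for any suggestions or help.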
$ sh ~/ab.sh
awk: fatal: cannot open file `2019/12/_MG_0263.CR2.xmp:' for reading (No such file or directory)
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0263.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
awk: fatal: cannot open file `2019/12/_MG_0262.CR2.xmp:' for reading (No such file or directory)
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
cut: '2019/12/_MG_0262.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="pinhole,car,2019"': No such file or directory
awk: fatal: cannot open file `2020/06/ok/_MG_0003.CR2.xmp:' for reading (No such file or directory)
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory
cut: '2020/06/ok/_MG_0003.CR2.xmp:': No such file or directory
cut: 'bopt:keywordlist="lowkey,car,Chiaroscuro,2020"': No such file or directory
....
One awk idea to replace the while loop:
awk -F':' '
{ gsub(/[^/]+$/,"",$1)    # strip everything after last "/" from 1st field
  print $1, $3
}' "${file}"
# or as a one-liner sans comments:
awk -F':' '{gsub(/[^/]+$/,"",$1); print $1, $3}' "${file}"
This generates:
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
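To see why fields 1 and 3 are the ones worth printing, it can help to dump how awk splits a sample line on ':'. The following is only a minimal sketch; the function name show_fields is made up for illustration and is not part of the answer.

```shell
#!/bin/bash
# show_fields (hypothetical helper): print every ':'-separated field of
# each input line, one per output line, wrapped in [] so that leading
# spaces stay visible.
show_fields() {
    awk -F':' '{ for (i = 1; i <= NF; i++) printf "field %d = [%s]\n", i, $i }'
}
```

Piping one of the sample lines into show_fields lists the path and file name as field 1, " bopt" (with its leading space) as field 2, and the keywordlist=... part as field 3.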
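A sed alternative:
$ sed -En 's|^(.*)/[^/]+:.*:([^:]+)$|\1/ \2|p' "${file}"
Where:
-En - enable support for extended regular expressions, suppress automatic printing of input lines
- since the data contains the / character, we use | as the sed script delimiter
^(.*)/ - [1st capture group] match everything up to the last / ...
[^/]+: - match everything that is not a / up to the first :, then ...
.*: - match everything up to the next :
([^:]+)$ - [2nd capture group] finally match everything that is not a : up to the end of the line
\1/ \2 - print the 1st capture group + / + a space + the 2nd capture group
This generates: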
2019/12/ keywordlist="pinhole,car,2019"
2019/12/ keywordlist="pinhole,car,2019"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/06/ok/ keywordlist="lowkey,car,Chiaroscuro,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
2020/04/ keywordlist="red,car,2020"
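Because -n is combined with the p flag, a line that does not match the pattern is dropped rather than passed through unchanged. A small sketch of that behaviour, using the same pattern wrapped in a function that reads standard input (the name strip_names is an assumption made for this example):

```shell
#!/bin/bash
# strip_names (hypothetical helper): run the substitution on standard input.
# -n suppresses the default output, and the trailing p prints a line only
# when the substitution actually succeeded on it.
strip_names() {
    sed -En 's|^(.*)/[^/]+:.*:([^:]+)$|\1/ \2|p'
}
```

Feeding it one line without any / or : followed by one of the sample lines prints only the rewritten sample line.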
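First, the last argument to the awk command should be a file name. You are passing it a variable that contains one line of the input file; because $line is unquoted, that line is also split on whitespace into two words. That is why you get the awk: fatal: cannot open file error.
Second, you made the same mistake in the cut commands, which leads to the : No such file or directory errors.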
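Both tools fall back to reading standard input when no file operand is given, so the original loop can be made to work by piping each line into them instead. A minimal sketch of that repair, assuming the keyword list itself contains no / or : characters; the function name strip_with_loop is invented here:

```shell
#!/bin/bash
# strip_with_loop (hypothetical name): the original while-loop approach,
# but each line is piped to awk/cut on stdin instead of being passed
# as if it were a file name.
strip_with_loop() {
    local file=$1 line n a b
    while IFS= read -r line; do
        # number of "/"-separated columns minus the file-name column
        n=$(printf '%s\n' "$line" | awk -F'/' '{ print NF-1 }')
        # the folder part: columns 1..n
        a=$(printf '%s\n' "$line" | cut -d'/' -f "1-${n}")
        # the part after the second ":"
        b=$(printf '%s\n' "$line" | cut -d':' -f 3)
        printf '%s/ %s\n' "$a" "$b"
    done < "$file"
}
```

Calling strip_with_loop ./tz.txt should print the expected output, although it starts three extra processes for every input line.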
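awk and cut are both designed to process entire files. You can chain them together with the pipe character | so that the output of one becomes the input of the next. For example: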
cat ${file} | awk ... | cut ...
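But this quickly becomes complicated and unwieldy. A better solution is to use the stream editor, sed. sed reads its input line by line and can perform fairly complex operations on each line before writing the result out, again line by line.
This should do what you want: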
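#!/bin/bash
file="./tz.txt"
sed -En 's/^([0-9]{4}\/[0-9]{2}\/).*bopt:(.*)$/\1 \2/p' "${file}"
The quoted expression is explained as follows:
s/pat/rep/p - search for pat and, if found, replace it with rep and print the result.
In our case, pat is:
^ - start of line
( - start remembering what follows
[0-9]{4} - any digit, repeated exactly 4 times
\/ - the / character (escaped)
[0-9]{2}\/ - any digit, repeated exactly 2 times, followed by /
) - stop remembering
.*bopt: - any 0 or more characters followed by bopt:
(.*) - remember 0 or more characters...
$ - ...up to the end of the line.
And rep is:
\1 \2 - the first thing we remembered, then a space, then the second thing we remembered.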
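Note that the first remembered group covers only the year and month, so for a line under 2020/06/ok/ this expression would print 2020/06/ rather than 2020/06/ok/. One possible adjustment, sketched under the assumption that no / appears after the file name, lets the group optionally extend up to the last /. The | delimiter and the function name strip_nested are choices made for this sketch only:

```shell
#!/bin/bash
# strip_nested (hypothetical helper): group 1 is YYYY/MM/ plus any further
# sub-folders, group 2 is that optional sub-folder part on its own, and
# group 3 is the text after "bopt:". Reads the named files, or stdin if none.
strip_nested() {
    sed -En 's|^([0-9]{4}/[0-9]{2}/(.*/)?)[^/]*bopt:(.*)$|\1 \3|p' "$@"
}
```

Running strip_nested ./tz.txt should then keep the ok/ level for the two 2020/06 lines and leave the other lines as before.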