Select the last match per group while using the "Before" parameter
I have a text file that looks like this...
<title> south asia </title>
India is country that is part of south asia.
<title> africa </title>
kenya is a country that is part of africa.
This command works as expected and returns the correct category...
grep -B1 'kenya' wiki.txt | grep title
But the trick stops working if the text file looks like this...
<title> south asia </title>
India is country that is part of south asia.
<title> africa </title>
List of countries:
kenya is a country that is part of africa.
If I do not know the correct value for the "before" parameter, I get extra (wrong) titles.
# grep -B5 'kenya' wiki.txt | grep title
<title> south asia </title>
<title> africa </title>
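Is it possible to select only the last "title" of each group while using the -B parameter?
Expected:
The line <title> africa </title> should be returned for the word "kenya", even if I do not know how many lines the article uses.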
Here is a tac + awk solution, written and tested with your shown attempts and samples.
tac Input_file |
awk -F'>[[:space:]]+|[[:space:]]*<' '
  /kenya/{
    found=1
    next
  }
  found && /\<title\>/{
    print
    found=""
  }
'
Explanation: a detailed, line-by-line explanation of the code above.
tac Input_file |                          ##Print the file from bottom to top and pipe it into the awk program.
awk -F'>[[:space:]]+|[[:space:]]*<' '     ##Start the awk program; the field separator is >[[:space:]]+ OR [[:space:]]*< (the rules below do not use fields).
  /kenya/{                                ##Check whether the line contains the word kenya.
    found=1                               ##Set found to 1.
    next                                  ##Skip the remaining rules for this line.
  }
  found && /\<title\>/{                   ##Check that found is set and the line contains title.
    print                                 ##Print the current line.
    found=""                              ##Reset found so earlier titles are not printed.
  }
'
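To try the bottom-up idea end to end, here is a minimal sketch that recreates the second sample file from the question in a temporary file. It assumes GNU tac is available; the variable names sample, w and result are chosen only for this illustration, and the unused field separator is left out.

```shell
#!/usr/bin/env bash
# Recreate the second sample file from the question in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<title> south asia </title>
India is country that is part of south asia.
<title> africa </title>
List of countries:
kenya is a country that is part of africa.
EOF

# Read the file bottom-up: once a line matching the word has been seen,
# the next <title> line is the closest title above it in the original file.
result=$(tac "$sample" | awk -v w='kenya' '
  $0 ~ w             { found = 1; next }
  found && /<title>/ { print; found = 0 }
')

echo "$result"
rm -f "$sample"
```

Passing the word in with -v keeps the awk program itself unchanged when searching for something other than kenya.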
You can take the last line of the output:
grep -B5 'kenya' wiki.txt | grep title | tail -n 1
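The tail approach still depends on the -B value being large enough to reach the title. One possible way around guessing it, sketched below under the assumption that only the first occurrence of the word matters, is to ask grep for the line number of the match and search everything above it. The filler lines and the variable names (sample, line, title) are hypothetical and exist only to place the match more than five lines below its title.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
{
  printf '%s\n' '<title> south asia </title>' 'India is country that is part of south asia.'
  printf '%s\n' '<title> africa </title>' 'List of countries:'
  # Hypothetical filler so the match sits more than 5 lines below its title.
  for i in 1 2 3 4 5 6; do printf 'filler line %s\n' "$i"; done
  printf '%s\n' 'kenya is a country that is part of africa.'
} > "$sample"

# Line number of the first line containing the word (empty if there is none).
line=$(grep -n -m1 'kenya' "$sample" | cut -d: -f1)

# Keep everything up to that line, then take the last title seen.
title=""
if [ -n "$line" ]; then
  title=$(head -n "$line" "$sample" | grep '<title>' | tail -n 1)
fi

echo "$title"
rm -f "$sample"
```

With this sample, grep -B5 alone would not reach the africa title, while the line-number variant does not depend on the distance.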
With awk, simply store each title record in a variable (t) and print it when a line matches the search word (variable w):
$ awk -v w='kenya' '/<title>/ {t=$0} $0~w {print t}' wiki.txt
<title> africa </title>
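If the word occurs on several lines of the same article, the one-liner above prints that title once per matching line. A small variation, sketched here as an assumption about what "last title per group" should mean, clears t after printing so each title appears at most once. The second kenya line in the sample is a hypothetical addition for the demonstration.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
cat > "$sample" <<'EOF'
<title> south asia </title>
India is country that is part of south asia.
<title> africa </title>
List of countries:
kenya is a country that is part of africa.
kenya borders tanzania.
EOF

# Remember the most recent title; print it on the first matching line only,
# then clear it so further matches in the same article print nothing.
result=$(awk -v w='kenya' '
  /<title>/           { t = $0 }
  t != "" && $0 ~ w   { print t; t = "" }
' "$sample")

echo "$result"
rm -f "$sample"
```

Because the file is read top to bottom in a single pass, this variant needs neither tac nor a -B value.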