How to extract the (first match) text between two words

Question

I have a file with the following structure:

destination list

move from station d-435-435 to point place1
move from station d-435-435 to point place2
move from mainpoint

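I want to extract the word "d-435-435" between the words "from station" and "to point" (only the first match; it will not always be the same value).

How can I achieve this?

What have I tried so far?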

id=$(sed 's/.*from station \(.*\) to.*/\1/' input.txt)

But this returns the following value: destination list d-435-435 move from mainpoint

Answer 1

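1st solution: With your shown samples, please try the following GNU awk code. It uses awk's match function with the regex from station\s+\S+\s+to point to find the text the OP asked for, then removes from station\s+ and \s+to point from the matched text and prints the remaining value.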

awk '
match($0,/from station\s+\S+\s+to point/){
  val=substr($0,RSTART,RLENGTH)
  gsub(/from station\s+|\s+to point/,"",val)
  print val
  exit
}
' Input_file

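2nd solution: With GNU grep, please try the following. The -o option prints only the matched part and -P enables PCRE regular expressions. The pattern matches the string from station followed by whitespace; \K then discards everything matched so far (because we do not need it in the output). Next it matches \S+ (the non-whitespace value), followed by whitespace and the string to point inside a positive lookahead, which only checks that the text is present without printing it. The -m1 option stops after the first matching line.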

grep -oP -m1 'from station\s+\K\S+(?=\s+to point)' Input_file
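The \s and \S shorthands in the awk code above are GNU awk extensions. For other awk implementations, the same match()/substr() idea can be sketched with plain character classes. This is only a sketch: it assumes the words are separated by ordinary spaces, and the file name sample.txt and the function name first_station are made up for illustration.

```shell
# Recreate the sample input from the question (file name is an assumption).
cat > sample.txt <<'EOF'
destination list

move from station d-435-435 to point place1
move from station d-435-435 to point place2
move from mainpoint
EOF

# Hypothetical helper: same approach as the 1st solution, POSIX awk only.
first_station() {
  awk '
  match($0, /from station +[^ ]+ +to point/) {
    val = substr($0, RSTART, RLENGTH)   # keep only the matched text
    sub(/^from station +/, "", val)     # strip the leading marker
    sub(/ +to point$/, "", val)         # strip the trailing marker
    print val
    exit                                # stop after the first match
  }' "$1"
}

first_station sample.txt
```

On the sample input this prints d-435-435. If tabs can also occur between the words, the space in the patterns would need to be widened accordingly.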

Answer 2

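With awk you can write conditions on the fields before and after field $4, which is where d-435-435 is, then print only that field for the first match and exit right after the print statement: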

awk '$2=="from" && $3=="station" && $5=="to" && $6=="point" {print $4; exit}' file
d-435-435

Or use GNU awk with a third argument to match():

awk 'match($0,/from station\s+(.*)\s+to point/,a){print a[1];exit}' file
d-435-435

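The regex contains parentheses, so the integer-indexed array element a[1] holds the part of the string between from station followed by whitespace (\s+) and whitespace (\s+) followed by to point.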

Answer 3

If GNU sed is available, how about:

id=$(sed -nE '0,/from station.*to/ s/.*from station (.*) to.*/\1/p' input.txt)

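The -n option suppresses automatic printing, so a line is printed only when the substitution succeeds (through the p flag).
The condition 0,/pattern/ is a range (flip-flop) operator that becomes false once the pattern has matched. The 0 address is a GNU sed extension that lets the pattern be tested against the first line as well.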
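The difference between the 0 and 1 start addresses only shows up when the very first line already matches. The following is a small sketch of that case, assuming GNU sed; the file two.txt and its station names are invented for the demonstration.

```shell
# Hypothetical two-line input where line 1 already contains a match.
printf '%s\n' \
  'move from station a-1 to point p1' \
  'move from station b-2 to point p2' > two.txt

# 0,/re/: the range may end on line 1, so only the first line is substituted.
with_zero=$(sed -nE '0,/from station.*to/ s/.*from station (.*) to.*/\1/p' two.txt)

# 1,/re/: the end pattern is first tested on line 2, so the range covers both lines.
with_one=$(sed -nE '1,/from station.*to/ s/.*from station (.*) to.*/\1/p' two.txt)

printf 'with 0: %s\n' "$with_zero"
printf 'with 1: %s\n' "$with_one"
```

With the 0 address only a-1 is extracted, while the 1 address yields both a-1 and b-2, which is why the answer relies on the GNU-specific form.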

Answer 4

This might work for you (GNU sed):

sed -nE '/.*station (\S+) to point.*/{s//\1/;H;x;/\n(\S+)\n.*/{s/\n\S+$//;x;d};x;p}' file

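Turn off implicit printing and turn on extended regular expressions with the command-line options -nE.

If a line meets the required criteria, extract the required string, append a copy to the hold space, and check whether a match has already been seen; if not, print it. If a match has already been seen, remove the new copy from the hold space.

Otherwise, print nothing.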

Answer 5

This should work with any sed:

sed -e '/.*from station \([^ ]*\) to .*/!d' -e 's//\1/' -e q file
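To show how the three expressions cooperate, and how the result can be stored in a variable as the question intended with id=$(...), here is a minimal sketch. The file name input.txt is taken from the question; the sample content is recreated here, and the station value is assumed to contain no spaces.

```shell
# Recreate the sample input from the question.
cat > input.txt <<'EOF'
destination list

move from station d-435-435 to point place1
move from station d-435-435 to point place2
move from mainpoint
EOF

# 1st expression: delete every line that does not match the pattern.
# 2nd expression: an empty regex reuses the last pattern; \1 keeps the captured value.
# 3rd expression: q prints the current line and quits, so later matches are never read.
id=$(sed -e '/.*from station \([^ ]*\) to .*/!d' -e 's//\1/' -e q input.txt)

printf '%s\n' "$id"
```

For the sample input, id ends up holding d-435-435.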
