How do I extract two different keywords from two different lines in a file in bash shell?

Question

I have a file named data.txt. When I read the file, the contents look like this.

$ cat data.txt 
name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
              .isilon
              .ifsvar
              .lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: https://server1.example.com:30002: /mnt/tube

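How can I extract the keyword Linux from the third line and the keyword server1.example.com from the last line, and then print them on the same line separated by a space? The output should look like this: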

Linux server1.example.com

I tried something like this, but I am not sure how to extract server1.example.com:

cat data.txt | egrep "type|mounts" | awk '{print $NF}' | tr "\n" " "

The output is:

Linux /mnt/tube

My expected output:

Linux server1.example.com

A solution using AWK or SED would work for me.

Thanks!

Answer 1

If the file contains only two such lines, one with type: and one with mounts:, and they always appear in that order, you can use

awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? $2 : a " " $2) } END{print a}' file

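If a line contains type: or mounts:, the http:// or https:// prefix and everything from the next : onward are removed from field 2. The resulting value is then either assigned to a (if a is still empty) or appended to a after a space. Once the end of the file is reached, the value of a is printed.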

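Details:

/type:|mounts:/ - matches lines containing type: or mounts:
gsub(/https?:\/\/|:.*/, "", $2) - removes http:// or https://, and the first remaining : together with the rest of the string, from the field 2 value
a = (length(a)==0 ? $2 : a " " $2) - if a is empty, assigns the field 2 value to a; otherwise assigns a + space + the field 2 value to a
END{print a} - prints the value of a once the whole file has been processed.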
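To make the substitution step easier to follow, here is a minimal sketch that applies it to the mounts line alone and shows field 2 before and after. The sample line is copied from the question; the variable names line, before and after are only illustrative.

```shell
#!/bin/sh
# Sample mounts line taken from the question's data.txt.
line='mounts: https://server1.example.com:30002: /mnt/tube'

# Field 2 as awk sees it with the default whitespace field splitting.
before=$(printf '%s\n' "$line" | awk '{ print $2 }')

# Field 2 after removing the scheme and everything from the next colon onward.
after=$(printf '%s\n' "$line" | awk '{ gsub(/https?:\/\/|:.*/, "", $2); print $2 }')

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

On the type line the same substitution finds nothing to remove, so field 2 (Linux) passes through unchanged.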

See the online demo:

#!/bin/bash
s='name: linuxVol
id: 6
type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
              .isilon
              .ifsvar
              .lustre
inode: 915720
free_space: 35.6TiB (auto)
total_capacity: 95.0TiB (auto)
number_of_files: 5,789,643
number_of_dirs: 520,710
mounts: https://server1.example.com:30002: /mnt/tube'

awk '/type:|mounts:/{gsub(/https?:\/\/|:.*/, "", $2); a = (length(a)==0 ? $2 : a " " $2) } END{print a}' <<< "$s"

Output:

Linux server1.example.com

Answer 2

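Whenever your input contains tag-value pairs, I find it best to create an array (f[] below) to hold those mappings. You can then access the values by indexing that array with the tags: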

$ cat tst.awk
{
    tag = val = $0
    sub(/:.*/,"",tag)
    sub(/[^:]+: */,"",val)
    f[tag] = val
}
END {
    sub("[^:]+://","",f["mounts"])
    sub(/:.*/,"",f["mounts"])
    print f["type"], f["mounts"]
}

$ awk -f tst.awk data.txt
Linux server1.example.com

If you also want to handle the multi-line value of the dir excludes tag, the above needs a slight tweak, but as written it lets you read/test/print all of the other values by their tags.
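One possible shape for that tweak is sketched below. It is not from the original answer: it assumes that any line starting with a space or tab is a continuation of the most recent tag, and it joins the continuation values with single spaces. The shortened sample in the data variable is a hypothetical subset of the question's file.

```shell
#!/bin/sh
# Shortened sample based on the question's data.txt (hypothetical subset).
data='type: Linux
dir excludes: .snapshot*
              ~snapshot*
              .zfs
inode: 915720
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk '
/^[ \t]/ {                        # assumed: leading blank means continuation line
    val = $0
    sub(/^[ \t]+/, "", val)       # strip the indentation
    f[tag] = f[tag] " " val       # append to the value of the most recent tag
    next
}
{
    tag = val = $0
    sub(/:.*/, "", tag)           # text before the first colon is the tag
    sub(/[^:]+: */, "", val)      # text after the first colon is the value
    f[tag] = val
}
END {
    print f["dir excludes"]
}')

echo "$result"
```

With this sample the script prints the three exclude patterns on one line, and f["type"] and f["mounts"] are populated exactly as before.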

Answer 3

Using gnu-awk, you can also set the field separator to : followed by 1 or more whitespace characters.

Then check whether the first field is type or mounts; for mounts, use a capture group to get the part that follows https://

Given that the lines always appear in the same order and both keywords are present, you can concatenate the values.

awk -F ":[[:space:]]+" '
$1 == "type" {s = $2}
$1 == "mounts" && match($2, /https?:\/\/([^[:space:]:]+)/, a) {s = s " " a[1]}
END {print s}
' data.txt

Output

Linux server1.example.com
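The three-argument form of match() is a gnu-awk extension. As a rough sketch for other awk implementations, the same idea can be written with two sub() calls instead of a capture group. This variant is not part of the original answer; the simplified separator ': +' (a colon followed by spaces), the shortened sample in data, and the helper variable host are assumptions made for illustration.

```shell
#!/bin/sh
# Shortened sample based on the question's data.txt (hypothetical subset).
data='name: linuxVol
type: Linux
inode: 915720
mounts: https://server1.example.com:30002: /mnt/tube'

result=$(printf '%s\n' "$data" | awk -F ': +' '
$1 == "type"   { s = $2 }
$1 == "mounts" {
    host = $2
    sub(/^https?:\/\//, "", host)   # drop the scheme
    sub(/:.*/, "", host)            # drop the port, if there is one
    s = s " " host
}
END { print s }')

echo "$result"
```

Like the gnu-awk version, this relies on the type line appearing before the mounts line.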

Answer 4

I figured a sed solution was needed as well

$ sed -n '/type:/{s/[^ ]* \(.*\)/\1/;h;d};/mounts:/{s|[^/]*//\([^:]*\).*|\1|;x;G;s/\n/ /p}' input_file
Linux server1.example.com

Or as a script

$ cat script.sed
/type:/ {                     #Match line with `type:`
    s/[^ ]* \(.*\)/\1/        #Extract the word `Linux`
    h                         #Retain it in the hold space
    d                         #Delete the line
}
/mounts:/ {                   #Match line with `mounts:`
    s|[^/]*//\([^:]*\).*|\1|  #Extract the host name from the URL
    x                         #Swap pattern space and hold space (pattern space is now `Linux`)
    G                         #Append the hold space (the host name) after a newline
    s/\n/ /p                  #Replace the newline with a space and print
}

$ sed -nf script.sed input_file
Linux server1.example.com
