Extract field from xml file

Question

XML file:

<head>
  <head2>
    <dict type="abc" file="/path/to/file1"></dict>
    <dict type="xyz" file="/path/to/file2"></dict>
  </head2>
</head>

I need to extract the list of files from it, so the output would be

/path/to/file1
/path/to/file2

This is what I have done so far.

grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'

Answer 1

Quick and dirty, based on your sample rather than on everything XML allows

# sed, a bit safer: only look between <head> and </head>
sed -e '/<head>/,/<\/head>/!d' -e '/.*[[:blank:]]file="\([^"]*\)".*/!d' -e 's//\1/' YourFile

# sed, brute force
sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p' YourFile



# awk, quick and unsafe, based on your sample
awk -F 'file="|">' '/<head>/{h=1} /\/head>/{h=0} h && /[[:blank:]]file/ { print $2 }' YourFile

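That said, I would not recommend this kind of extraction on XML unless you really know what your source looks like in terms of format and content. Extra fields, escaped quotes, string content that looks like tags, and so on are major causes of failures and unexpected results. Use it only when no more suitable tool is available.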
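To illustrate that warning, here is a small sketch. The one-line input is made up for this demo and does not come from the question, and `extract_last` is a hypothetical helper name. It follows the brute-force sed idea above. When two dict elements share a line, the greedy `.*` swallows the first attribute, so only the last path is printed:

```shell
# Hypothetical input: both <dict> elements on a single line.
one_line='<head2><dict type="abc" file="/path/to/file1"></dict><dict type="xyz" file="/path/to/file2"></dict></head2>'

# Brute-force extraction: keep only the captured attribute value.
extract_last() {
  sed -n 's/.*[[:blank:]]file="\([^"]*\)".*/\1/p'
}

# The leading .* is greedy, so it runs up to the LAST ' file="' on the line.
result=$(printf '%s\n' "$one_line" | extract_last)
printf '%s\n' "$result"
```

The first path is silently lost here, which is the kind of unexpected result a real XML parser avoids.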

Now, using your own script

#grep "<dict*file=" /path/to/xml.file | awk '{print $3}' | awk -F= '{print $NF}'
awk '! /<dict.*file=/ {next} {$0=$3;FS="\"";$0=$0;print $2;FS=OFS}' YourFile

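There is no need for grep plus awk; use a leading pattern filter /<dict.*file/ instead.
The second awk, which uses a different separator (FS), can be folded into the same script by changing FS. Because a new FS only takes effect on the next evaluation (by default, the next line), you can force the current content to be re-evaluated by assigning to $0, in this case $0=$0.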
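The re-evaluation trick can be seen in isolation with a minimal sketch. The sample record below is an assumption, shaped like the third whitespace-separated field of a dict line, and the variable names are only for illustration. In a POSIX-conforming awk such as gawk, changing FS alone leaves the current record split on the old separator, while `$0 = $0` re-splits it:

```shell
# Hypothetical one-line record, shaped like the third field of a <dict> line.
record='file="/path/to/file1"></dict>'

# Changing FS alone: the current record keeps its old whitespace split,
# so in a conforming awk $2 is still empty here.
without_resplit=$(printf '%s\n' "$record" | awk '{ FS = "\""; print $2 }')

# Assigning $0 to itself re-splits the current record with the new FS.
with_resplit=$(printf '%s\n' "$record" | awk '{ FS = "\""; $0 = $0; print $2 }')

printf 'without: [%s]\nwith: [%s]\n' "$without_resplit" "$with_resplit"
```

Resetting FS at the end of the action, as the one-liner above does, keeps the next line split on whitespace again.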

Answer 2

A solution using xmllint, with --xpath set to //head/head2/dict/@file

xmllint --xpath "//head/head2/dict/@file" input-xml | awk 'BEGIN{FS="file="}{printf "%s\n%s\n", gensub(/"/,"","g",$2), gensub(/"/,"","g",$3)}'
/path/to/file1
/path/to/file2

Unfortunately I could not provide a pure xmllint solution. I thought that applying

xmllint --xpath "string(//head/head2/dict/@file)" input-xml

would return the file attribute from both nodes, but it returns only the first instance.

So I added my own logic with GNU Awk to extract the needed values. Running

xmllint --xpath "//head/head2/dict/@file" input-xml

returns the value

file="/path/to/file1" file="/path/to/file2"

On the output above, setting the field delimiter to file= and removing the double quotes with the gensub() function meets the requirement.
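The printf with two %s placeholders handles exactly two attributes. As a hedged sketch of the same idea for any number of attributes, the loop below uses only POSIX awk features (gsub instead of gensub). The helper name `paths_from_attrs` is hypothetical, and the sample text is fed in with printf so the demo does not need xmllint:

```shell
# Hypothetical helper: reads text like  file="a" file="b" ...  on stdin
# and prints one attribute value per line.
paths_from_attrs() {
  awk 'BEGIN { FS = "file=" }
  {
    # $1 is whatever precedes the first file=, so start at field 2.
    for (i = 2; i <= NF; i++) {
      value = $i
      gsub(/"/, "", value)                # drop the double quotes
      gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim surrounding blanks
      if (value != "") print value
    }
  }'
}

# Sample taken from the xmllint output shown above.
attrs=' file="/path/to/file1" file="/path/to/file2"'
paths=$(printf '%s\n' "$attrs" | paths_from_attrs)
printf '%s\n' "$paths"
```

In real use the function would sit at the end of the pipeline in place of the awk command. Because it loops per line, it also copes with attributes that arrive one per line.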

Answer 3

There is also the PE [perl everywhere :)] solution:

perl -MXML::LibXML -E 'say $_->to_literal for XML::LibXML->load_xml(location=>q{file.xml})->findnodes(q{/head/head2/dict/@file})'

It prints

/path/to/file1
/path/to/file2

For the above you need to have the XML::LibXML module installed.

Answer 4

With xmlstarlet it would be:

xmlstarlet sel -t -v "//head/head2/dict/@file" -n input.xml

Answer 5

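This command (it counts fields from the end of the line, so indentation does not matter):

awk -F'[=" ">]' '/file=/{print $(NF-3)}' file

will produce: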

/path/to/file1
/path/to/file2

