How to extract specific strings from an XML file that share the same tag as other strings?

In the following XML snippet...

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>serverclients</title>
  <id>https://splfwdprw2:8089/servicesNS/nobody/search/deployment/server/clients</id>
  <updated>2017-04-04T16:14:04-04:00</updated>
  <generator build="f3e41e4b37b2" version="6.3.1"/>
  <author>
    <name>Splunk</name>
  </author>
  <link href="/servicesNS/nobody/search/deployment/server/clients/_acl" rel="_acl"/>
  <link href="/servicesNS/nobody/search/deployment/server/clients/countClients_by_machineType" rel="countClients_by_machineType"/>
  <link href="/servicesNS/nobody/search/deployment/server/clients/countRecentDownloads" rel="countRecentDownloads"/>
  <link href="/servicesNS/nobody/search/deployment/server/clients/getMatchingAppsForClient_dryRun" rel="getMatchingAppsForClient_dryRun"/>
  <link href="/servicesNS/nobody/search/deployment/server/clients/preview" rel="preview"/>
  <opensearch:totalResults>1</opensearch:totalResults>
  <opensearch:itemsPerPage>18446744073709551615</opensearch:itemsPerPage>
  <opensearch:startIndex>0</opensearch:startIndex>
  <s:messages/>
  <entry>
    <title>00031e8f6c883544b8079037c5eba9ec</title>
    <id>https://splfwdprw2:8089/servicesNS/nobody/search/deployment/server/clients/00031e8f6c883544b8079037c5eba9ec</id>
    <updated>2017-04-04T16:14:04-04:00</updated>
    <link href="/servicesNS/nobody/search/deployment/server/clients/00031e8f6c883544b8079037c5eba9ec" rel="alternate"/>
    <author>
      <name>system</name>
    </author>
    <link href="/servicesNS/nobody/search/deployment/server/clients/00031e8f6c883544b8079037c5eba9ec" rel="list"/>
    <link href="/servicesNS/nobody/search/deployment/server/clients/00031e8f6c883544b8079037c5eba9ec" rel="remove"/>
    <content type="text/xml">
      <s:dict>
        <s:key name="applications">
          <s:dict>
            <s:key name="all_deploymentclient">
              <s:dict>
                <s:key name="action">Phonehome</s:key>
                <s:key name="archive">/opt/splunk/var/run/tmp/all_deploymentclient/all_deploymentclient-1491320471.bundle</s:key>
                <s:key name="checksum">0</s:key>
                <s:key name="excludeFromUpdate"></s:key>
                <s:key name="failedReason"></s:key>
                <s:key name="issueReload">0</s:key>
                <s:key name="restartSplunkWeb">0</s:key>
                <s:key name="restartSplunkd">1</s:key>
                <s:key name="result">Ok</s:key>
                <s:key name="serverclasses">
                  <s:list>
                    <s:item>all_deploymentclient</s:item>
                  </s:list>
                </s:key>
                <s:key name="size">10240</s:key>
                <s:key name="stateOnClient">enabled</s:key>
                <s:key name="timestamp">Tue Apr  4 11:42:54 2017</s:key>
              </s:dict>
            </s:key>
            <s:key name="all_fwd_outputs_18indexers">
              <s:dict>
                <s:key name="action">Phonehome</s:key>
                <s:key name="archive">/opt/splunk/var/run/tmp/all_fwd/all_fwd_outputs_18indexers-1491320471.bundle</s:key>
                <s:key name="checksum">0</s:key>
                <s:key name="excludeFromUpdate"></s:key>
                <s:key name="failedReason"></s:key>
                <s:key name="issueReload">0</s:key>
                <s:key name="restartSplunkWeb">0</s:key>
                <s:key name="restartSplunkd">1</s:key>
                <s:key name="result">Ok</s:key>
                <s:key name="serverclasses">
                  <s:list>
                    <s:item>all_fwd</s:item>
                  </s:list>
                </s:key>
                <s:key name="size">10240</s:key>
                <s:key name="stateOnClient">enabled</s:key>
                <s:key name="timestamp">Tue Apr  4 11:42:54 2017</s:key>
              </s:dict>
            </s:key>

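...I am trying to extract the name values of any <s:key name="..."> elements that appear at the first level of indentation after the <s:key name="applications"> tag. In this example, the strings I want to extract are "all_deploymentclient" and "all_fwd_outputs_18indexers". If other strings appear at the same level, I want to capture them too.

I am using xml_grep, but I am not sure how to define the parameters to get the result I want (there are multiple instances of the <s:key name="..."> tag, some of which act as headings while others have values assigned to them).

So, when all is said and done, the extracted output for this example should be: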

all_deploymentclient
all_fwd_outputs_18indexers

How can I do this? Do I need a different utility (such as xpath)?

Maybe try this first (a quick and dirty way to strip the s: namespace prefix):

cat /var/tmp/content.xml | sed 's/s://g' > /var/tmp/content2.xml

Then try

xmllint  --xpath "//key[@name='all_deploymentclient' or @name='all_fwd_outputs_18indexers']/@name" /var/tmp/content2.xml \
| sed "s| name|\nname|g; s/name=\"//; s/\"$//"
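Before running the blanket substitution on a real file, it may help to see what it does to text outside the element names. The following is only an illustrative Python sketch (the variable names are made up, and the feed tag is shortened from the one in the question); it mirrors sed 's/s://g' with a plain string replace:

```python
# Mirror sed 's/s://g' on two strings taken from the sample file.
# feed_tag is a shortened copy of the <feed> start tag (assumption: the
# xmlns:opensearch declaration is left out for brevity).
feed_tag = ('<feed xmlns="http://www.w3.org/2005/Atom" '
            'xmlns:s="http://dev.splunk.com/ns/rest">')
id_tag = ("<id>https://splfwdprw2:8089/servicesNS/nobody/search/"
          "deployment/server/clients</id>")

# Remove every literal "s:", exactly as the sed expression would.
stripped_feed = feed_tag.replace("s:", "")
stripped_id = id_tag.replace("s:", "")

print(stripped_feed)
print(stripped_id)
```

In this sketch the xmlns:s declaration turns into a second xmlns attribute and the "https://" URL loses its "s:". Duplicate attributes are not well-formed XML, so a query that matches on local-name() and leaves the file untouched may be the safer route.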

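When you do any kind of search on any kind of data source, it is not enough to know what the actual data is (if you knew that, you would not need to search it): you need to know in what ways it might differ from the sample shown.

So we have to look at your description of the problem, the "s:key name=" strings that appear in the indent after the <s:key name="applications"> tag, and try to understand what you mean.

  • By "indent", do you literally mean the layout on the page, or is that your way of talking about the tree structure of the XML data model?

  • When you say "after", should we interpret this (based on your example) as meaning the "first descendant" elements, that is, the first matching descendants we encounter in a walk down the tree?

Can we assume that these "first descendants" are always two levels below the original node (that is, grandchildren)? If so, the XPath solution is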

//s:key[@name="applications"]/*/*/@name
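As a hedged illustration of that grandchild selection, here is a minimal Python sketch using the standard library's ElementTree. The SAMPLE string is a trimmed, closed stand-in for the feed in the question, and the names SAMPLE, NS, and names are made up for this example. ElementTree's XPath subset cannot return attributes directly, so the sketch selects the grandchild elements and reads their name attribute afterwards:

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the feed in the question (assumption:
# the real file is longer and carries more keys under each application).
SAMPLE = """\
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
  <entry>
    <content type="text/xml">
      <s:dict>
        <s:key name="applications">
          <s:dict>
            <s:key name="all_deploymentclient">
              <s:dict>
                <s:key name="action">Phonehome</s:key>
              </s:dict>
            </s:key>
            <s:key name="all_fwd_outputs_18indexers">
              <s:dict>
                <s:key name="action">Phonehome</s:key>
              </s:dict>
            </s:key>
          </s:dict>
        </s:key>
      </s:dict>
    </content>
  </entry>
</feed>
"""

# Bind the s: prefix to the namespace URI declared on <feed>.
NS = {"s": "http://dev.splunk.com/ns/rest"}

root = ET.fromstring(SAMPLE)
# Select the grandchildren of the "applications" key, then read @name.
grandchildren = root.findall(".//s:key[@name='applications']/*/*", NS)
names = [el.get("name") for el in grandchildren if el.get("name") is not None]
print("\n".join(names))
```

The nested "action" keys are not reported because they sit four levels below the "applications" key, not two.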

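But if the "first descendants" might be at different depths, then it gets harder, and the solution may also depend on which version of XPath you are using. So we need more information.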

I don't know what xml_grep is capable of.
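For the variable-depth case mentioned above, one possible reading of "first descendants" is: walk down from the "applications" key and stop at the first s:key found on each branch. The sketch below assumes that reading. The sample is hypothetical (the second application is wrapped in an extra s:dict purely to show differing depths), and nearest_keys is a made-up helper, not part of any library:

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of s:key as ElementTree represents it.
KEY = "{http://dev.splunk.com/ns/rest}key"

# Hypothetical sample: the second application sits one level deeper.
SAMPLE = """\
<s:dict xmlns:s="http://dev.splunk.com/ns/rest">
  <s:key name="applications">
    <s:dict>
      <s:key name="all_deploymentclient">
        <s:dict><s:key name="action">Phonehome</s:key></s:dict>
      </s:key>
      <s:dict>
        <s:key name="all_fwd_outputs_18indexers">
          <s:dict><s:key name="action">Phonehome</s:key></s:dict>
        </s:key>
      </s:dict>
    </s:dict>
  </s:key>
</s:dict>
"""

def nearest_keys(node):
    """Collect the closest s:key descendants of node, in document order.

    The walk does not descend into a key once it has been found, so keys
    nested inside a match (such as "action") are skipped.
    """
    found = []
    for child in node:
        if child.tag == KEY:
            found.append(child)
        else:
            found.extend(nearest_keys(child))
    return found

root = ET.fromstring(SAMPLE)
applications = next(el for el in root.iter(KEY)
                    if el.get("name") == "applications")
names = [el.get("name") for el in nearest_keys(applications)]
print("\n".join(names))
```

Whether this matches the intended meaning depends on the answers to the questions above; it is a sketch of one interpretation only.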

After considering the information provided by @MichaelKay and @knb, I settled on a solution. I ended up using xmlstarlet to get the information I needed, as follows:

xmlstarlet sel -T -t -m "//*[local-name()='key'][@name='applications']/*/*/@name" -v . -n <XML filename>

This produced the following output:

all_deploymentclient
all_fwd_outputs_18indexers
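For anyone without xmlstarlet at hand, the same idea (match on the local name so the s: prefix never has to be declared or stripped) can be sketched in Python. This is not how xmlstarlet works internally; it only mirrors the expression above. The compact SAMPLE and the local_name helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Compact stand-in for the feed (assumption: structure as in the question).
SAMPLE = """\
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
  <s:key name="applications">
    <s:dict>
      <s:key name="all_deploymentclient"><s:dict/></s:key>
      <s:key name="all_fwd_outputs_18indexers"><s:dict/></s:key>
    </s:dict>
  </s:key>
</feed>
"""

def local_name(tag):
    """Return the tag without its '{namespace}' part, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]

root = ET.fromstring(SAMPLE)
names = []
for el in root.iter():
    # Equivalent of //*[local-name()='key'][@name='applications']
    if local_name(el.tag) == "key" and el.get("name") == "applications":
        # Equivalent of /*/*/@name: name attributes of the grandchildren.
        for child in el:
            for grandchild in child:
                if grandchild.get("name") is not None:
                    names.append(grandchild.get("name"))
print("\n".join(names))
```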

Thanks, everyone, for your contributions!