Copy a particular XML document element to another XML document using shell commands

Question

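I am installing Kylin on AWS EMR through a shell script. I have an XML file with the content below, and I need to copy a particular document element from it into another XML file. This is currently a manual step that I need to automate with shell commands inside the installation shell script.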

/etc/hbase/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-nn-nn-nn-nn.ec2.internal</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ip-nn-nn-nn-nn.ec2.internal:xxxx/user/hbase</value>
  </property>

  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>xxxx</value>
  </property>
</configuration>

I need to copy the hbase.zookeeper.quorum property from /etc/hbase/conf/hbase-site.xml into $KYLIN_HOME/conf/kylin_job_conf.xml, like this:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-nn-nn-nn-nn.ec2.internal</value>
</property>

Note: $KYLIN_HOME/conf/kylin_job_conf.xml already contains other properties.

The output needs to be written into the target file itself.

The target file "$KYLIN_HOME/conf/kylin_job_conf.xml" looks like this:

<configuration>

    <property>
        <name>mapreduce.job.split.metainfo.maxsize</name>
        <value>-1</value>
        <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
    </property>

    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
        <description>Compress map outputs</description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress</name>
        <value>true</value>
        <description>Compress the output of a MapReduce job</description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>The compression codec to use for job outputs
        </description>
    </property>

    <property>
        <name>mapreduce.output.fileoutputformat.compress.type</name>
        <value>BLOCK</value>
        <description>The compression type to use for job outputs</description>
    </property>

    <property>
        <name>mapreduce.job.max.split.locations</name>
        <value>xxxx</value>
        <description>No description</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>xxx</value>
        <description>Block replication</description>
    </property>

    <property>
        <name>mapreduce.task.timeout</name>
        <value>xxxx</value>
        <description>Set task timeout to 1 hour</description>
    </property>

</configuration>

Expected output:

<configuration>

    <property>
        <name>mapreduce.job.split.metainfo.maxsize</name>
        <value>-1</value>
        <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        ---------
        ---------
        ---------
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ip-nn-nn-nn-nn.ec2.internal</value>
    </property>

</configuration>

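Is there a shell command that can take a particular document element from the XML file above and automatically copy it into another XML file?

I tried the following command: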

awk 'NR == FNR { if(FNR >= 30 && FNR <= 33) { patch = patch $0 ORS }; next } FNR == 88 { $0 = patch $0 } 1' /etc/hbase/conf/hbase-site.xml $KYLIN_HOME/conf/kylin_job_conf.xml > $KYLIN_HOME/conf/kylin_job_conf.xml

The command above does not work for me. Can someone help me solve this?
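For reference, the attempt above has two problems. First, it redirects its output to $KYLIN_HOME/conf/kylin_job_conf.xml, which is also one of its inputs: the shell truncates that file before awk starts, so awk reads an empty target and the file ends up empty. Second, the hard-coded line numbers (30-33 and 88) only fit one exact file layout. The sketch below is still text-based, so it assumes that `<property>`, `<name>...</name>` and `</property>` each sit on their own lines, as in the files shown. It matches the block by its `<name>` instead of by line number, inserts it just before `</configuration>`, and writes through a temporary file. `copy_property` is a hypothetical helper name. As Answer 1 points out, a real XML parser is the more robust choice.

```shell
#!/bin/sh
# copy_property SRC DST NAME   (hypothetical helper)
# Copy the <property> block whose <name> equals NAME from SRC into DST,
# just before </configuration>. Line-oriented: assumes <property>,
# <name>...</name> and </property> each sit on their own lines.
# The copied block keeps SRC's indentation.
copy_property() {
  src=$1 dst=$2 name=$3
  tmp=$(mktemp) || return 1
  awk -v name="$name" '
    NR == FNR {                         # pass 1 (SRC): remember the block
      if ($0 ~ /<property>/) { buf = ""; inprop = 1 }
      if (inprop) buf = buf $0 ORS
      if (inprop && index($0, "<name>" name "</name>")) want = 1
      if ($0 ~ /<\/property>/) {
        if (want) block = buf
        inprop = 0; want = 0
      }
      next
    }
    /<\/configuration>/ { printf "%s", block }   # pass 2 (DST): insert before the close tag
    { print }
  ' "$src" "$dst" > "$tmp" &&
    cat "$tmp" > "$dst"                 # overwrite in place, keeping DST permissions
  status=$?
  rm -f "$tmp"
  return $status
}

# In the install script (paths from the question):
# copy_property /etc/hbase/conf/hbase-site.xml \
#   "$KYLIN_HOME/conf/kylin_job_conf.xml" hbase.zookeeper.quorum
```

Writing to a temporary file first means the target is never both read and truncated at the same time. Copying the result back with `cat` instead of `mv` leaves the original file's owner and permissions untouched.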

Answer 1

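Querying XML files with regular expressions is rarely a good idea.
Always prefer an XML parser!

So you can use xmlstarlet for this task. It is a single program that extracts the data you want from your input ("input.xml") with one command: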

xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']" input.xml

Its output is:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-nn-nn-nn-nn.ec2.internal</value>
</property>

If xmlstarlet is not installed on your machine, run

sudo apt-get -y install xmlstarlet

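The command-line options are:

sel : select data from or query an XML document (XPath, etc.)
-t : template mode: interpret the following commands as a template
-c : print a copy of the nodes matched by the following XPath expression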

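Now, as a second step, copy the resulting XML into the target file. This can be done with the method described in …

Applied to your example, the following command line does what you want: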

xmlstarlet ed -a "/configuration/property[last()]" -t elem -n property \
-v "$(xmlstarlet sel -t -c "/configuration/property[name='hbase.zookeeper.quorum']/*" input.xml)" \
target.xml | xmlstarlet unesc | xmlstarlet fo > new_target.xml

The result in new_target.xml is:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.job.split.metainfo.maxsize</name>
    <value>-1</value>
    <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
    <description>Compress map outputs</description>
  </property>

  ...

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-nn-nn-nn-nn.ec2.internal</value>
  </property>
</configuration>

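However, this approach has a drawback: the xmlstarlet unesc step unescapes every entity in the target file, so an entity such as &amp; is turned into a bare &, which may break things.

If that is a problem, consider a solution that uses a full XSLT processor and a stylesheet.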
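A minimal sketch of that XSLT route, assuming `xsltproc` (from libxslt) is installed, follows. An identity transform copies the target unchanged, so entities such as `&amp;` stay escaped. The template for `/configuration` then appends the matching `<property>` from the source file, read with the XSLT `document()` function, unless the target already has it. The helper name `add_property_xslt` and the stylesheet parameters `src` and `name` are made up for this example. Because `document()` resolves a relative path against the stylesheet's location, the source path should be absolute.

```shell
#!/bin/sh
# add_property_xslt SRC DST NAME   (hypothetical helper)
# Print DST with SRC's <property> named NAME appended inside <configuration>.
# SRC must be an absolute path (document() resolves relative paths against
# the temporary stylesheet's location, not the current directory).
add_property_xslt() {
  xsl=$(mktemp) || return 1
  cat > "$xsl" <<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="src"/>   <!-- absolute path of the source config -->
  <xsl:param name="name"/>  <!-- property name to copy -->

  <!-- Identity transform: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- Append the matching property from $src unless it is already present. -->
  <xsl:template match="/configuration">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:if test="not(property[name = $name])">
        <xsl:copy-of select="document($src)/configuration/property[name = $name]"/>
      </xsl:if>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL
  xsltproc --stringparam src "$1" --stringparam name "$3" "$xsl" "$2"
  status=$?
  rm -f "$xsl"
  return $status
}

# In the install script (paths from the question):
# add_property_xslt /etc/hbase/conf/hbase-site.xml \
#   "$KYLIN_HOME/conf/kylin_job_conf.xml" hbase.zookeeper.quorum > new_target.xml
```

The appended block is copied with the source file's whitespace, so its indentation is not adjusted to match the target. The rest of the target file passes through unchanged apart from the XML declaration that xsltproc adds.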
