Getting elements from XML and storing them in an array using a shell script

Question

My XML looks like this:

<URLS xmlns:"http://www.example.com">
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
            .
            .
            .
       </forwardUrl>
    </Service>
</URLS>

I want to store all of the forward URLs in an array.

This is what I tried:

let urlcount=$(sed -e "s/xmlns/ignore/" /tmp/in.xml | xmllint --xpath "count(//forwardUrl/value)"  -)
declare -a urls=()

for((i=1; i <= $urlcount; i++)); do
    echo $i
    urls[$i]=$(sed -e "s/xmlns/ignore/" /tmp/in.xml | xmllint --xpath '//forwardUrl/value["$i"]/text()' -)
done

But when I run echo ${urls[7]}, it prints all of the values.

I want to store each URL at a different index. Please help me fix this.

Answer 1

Try this:

$ urls=($(sed -nr 's_<value>(.*)</value>_\1_p' file)); echo ${urls[1]}
http://www.example2.com:80
$ echo ${urls[0]}
http://www.example1.com:80

This obviously ignores the XML structure, so it assumes that "value" tags are used only for URLs.
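Filling the array through unquoted command substitution relies on word splitting, which also subjects each value to glob expansion. A minimal sketch of the same idea that reads one URL per line instead is shown here. The sample file, the `xml_file` variable, and the trimmed-down XML are assumptions made for illustration, and the sed expression is written in basic regex syntax so it does not depend on `-r`.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; a temp file stands in for the real XML file.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<URLS>
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
        </forwardUrl>
    </Service>
</URLS>
EOF

urls=()
# sed prints only the text between <value> and </value>, one per line.
# Reading line by line puts each value in its own array element
# without word splitting or glob expansion.
while IFS= read -r url; do
    urls+=("$url")
done < <(sed -n 's_.*<value>\(.*\)</value>.*_\1_p' "$xml_file")

rm -f "$xml_file"

printf '%s\n' "${urls[@]}"
```

Like the one-liner above, this still assumes one "value" element per line and no other use of the "value" tag.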

Update: if the context matters, awk to the rescue!

$ awk -F'[<>]' -v RS="</?forwardUrl>" 'NR==2{for(i=3;i<=NF;i+=4) print $i}' file

http://www.example1.com:80
http://www.example2.com:80

The rest is the same:

$ urls=($(awk ... ))

Note that using a regular expression as RS is gawk-specific and may not be supported in other awks.
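If gawk is not available, the same context check can probably be done without a regex RS by tracking whether the current line is inside the forwardUrl element. The following is a sketch under the assumption that each "value" element sits on its own line; the `inside` flag, the `forward_urls` array, and the extra `otherUrl` element in the sample are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample input with a <value> outside <forwardUrl>
# to show that only the forwardUrl section is picked up.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<URLS>
    <Service>
        <otherUrl>
            <value>http://www.ignored.example:80</value>
        </otherUrl>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
        </forwardUrl>
    </Service>
</URLS>
EOF

forward_urls=()
# Fields are split on < and >, so on a value line
# $2 is the tag name and $3 is the text inside it.
while IFS= read -r url; do
    forward_urls+=("$url")
done < <(awk -F'[<>]' '
    /<forwardUrl>/   { inside = 1 }   # entering the section
    /<\/forwardUrl>/ { inside = 0 }   # leaving the section
    inside && $2 == "value" { print $3 }
' "$xml_file")

rm -f "$xml_file"

printf '%s\n' "${forward_urls[@]}"
```

This only uses a bracket expression as the field separator, which standard awk supports, so it should not need gawk.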

Answer 2

Using only sed:

How about something like this:

$ cat file1
<URLS xmlns:"http://www.example.com">
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
            <value>http://www.example3.com:80</value>
            <value>http://www.example4.com:80</value>
       </forwardUrl>
    </Service>
</URLS>
$ declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))
$ echo "${array[0]}"
http://www.example1.com:80
$ echo "${array[1]}"
http://www.example2.com:80
$ echo "${array[2]}"
http://www.example3.com:80
$ echo "${array[3]}"
http://www.example4.com:80
$ echo "${array[@]}"
http://www.example1.com:80 http://www.example2.com:80 http://www.example3.com:80 http://www.example4.com:80
$

Breakdown of the expression:

declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))

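sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 prints the lines from the one matching <forwardUrl> through the one matching </forwardUrl>, inclusive.
sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g' then cleans them up: the first expression removes all tags, the second deletes lines that are empty or contain only whitespace, and the last removes all remaining whitespace.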
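The two stages can also be folded into a single sed invocation by applying the cleanup commands only inside the address range. This is a sketch rather than a drop-in replacement: it swaps `\s` (a GNU extension) for the POSIX class `[[:space:]]`, and the sample file and `xml_file` variable are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample input mirroring file1 above (namespace attribute omitted).
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<URLS>
    <Service>
        <forwardUrl>
            <value>http://www.example1.com:80</value>
            <value>http://www.example2.com:80</value>
            <value>http://www.example3.com:80</value>
            <value>http://www.example4.com:80</value>
        </forwardUrl>
    </Service>
</URLS>
EOF

# Within the <forwardUrl>...</forwardUrl> range:
#   1. strip every tag
#   2. strip all whitespace
#   3. drop lines that are now empty (the forwardUrl tag lines)
#   4. print what is left
declare -a array=($(sed -n '/<forwardUrl>/,/<\/forwardUrl>/{
    s/<[^>]*>//g
    s/[[:space:]]//g
    /^$/d
    p
}' "$xml_file"))

rm -f "$xml_file"

echo "${array[@]}"
```

Stripping whitespace before testing for empty lines lets one pattern (`/^$/`) cover both blank and whitespace-only lines.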

Edit 1:

$ cat file1
<URLS xmlns:"http://www.example.com">
    <Service>
        <forwardUrl>
            <value>http://www.sun.com:80</value>
            <value>http://www.example2.com:80</value>
            <value>http://www.example3.com:80</value>
            <value>http://www.example4.com:80</value>
       </forwardUrl>
    </Service>
</URLS>
$ declare -a array=($(sed -n '/\s*<forwardUrl>/,/<\/forwardUrl>/p' file1 | sed -e 's/<[^>]*>//g' -e '/^\s*$/d' -e 's/\s*//g'))
$ echo "${array[0]}"
http://www.sun.com:80
$ echo "${array[@]}"
http://www.sun.com:80 http://www.example2.com:80 http://www.example3.com:80 http://www.example4.com:80
$
