How do I remove a certain node from the text?
https://www.iana.org/domains/arpa
With xidel I can use the XPath '//table[@id="arpa-table"]/tbody/tr/join((td[1], normalize-space(td[2])), x:cps(9))'. But I would like to put things like RFC 3172 in a third column and /go/rfc3172 in a fourth column. Could someone tell me how to do that?
arpa▸ Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board RFC 3172¬
as112.arpa▸ For sinking DNS traffic for reverse IP address lookups and other applications RFC 7535¬
e164.arpa▸ For mapping E.164 numbers to Internet URIs RFC 6116¬
home.arpa▸ For non-unique use in residential home networks RFC 8375¬
in-addr-servers.arpa▸ For hosting authoritative name servers for the in-addr.arpa domain RFC 5855¬
in-addr.arpa▸ For mapping IPv4 addresses to Internet domain names RFC 1035¬
ip6-servers.arpa▸ For hosting authoritative name servers for the ip6.arpa domain RFC 5855¬
ip6.arpa▸ For mapping IPv6 addresses to Internet domain names RFC 3152¬
ipv4only.arpa▸ For detecting the presence of DNS64 and for learning the IPv6 prefix used for protocol translation RFC 7050¬
iris.arpa▸ For locating Internet Registry Information Services RFC 4698¬
uri.arpa▸ For resolving Uniform Resource Identifiers according to the Dynamic Delegation Discovery System RFC 3405 RFC 8958¬
urn.arpa▸ For resolving Uniform Resource Names according to the Dynamic Delegation Discovery System RFC 3405¬
The first line should look like this:
arpa▸ Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board▸ RFC 3172¬
By default xidel prints a node/element as its string value (string()), which is "the concatenation of the string-values of all its descendant text nodes", as E. Lenz puts it:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2] ! (position(),.)
'
#or
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]/string() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
RFC 3172
As you can see, that's 1 item/node, which is why normalize-space(td[2]) returns "Reserved exclusively [...] RFC 3172".
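To make the string-value rule concrete outside xidel, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The cell markup is a hypothetical, simplified stand-in modeled on the output above (the live page's HTML may differ), and normalize_space is a hand-written approximation of XPath's normalize-space(), not a library function.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for the first row's second cell (an assumption
# modeled on the xidel output above, not copied from the live page).
TD_HTML = (
    "<td>Reserved exclusively to support operationally-critical infrastructural "
    "identifier spaces as advised by the Internet Architecture Board<br/>\n"
    '<a href="/go/rfc3172">RFC 3172</a>\n'
    "</td>"
)


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): collapse runs of space, tab, CR and LF
    into a single space and trim them from both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


td = ET.fromstring(TD_HTML)

# String value: every descendant text node joined in document order, which is
# why the link text "RFC 3172" ends up in it.
string_value = "".join(td.itertext())
print(normalize_space(string_value))
```

Because the description and the link text share one string value, normalize-space() cannot separate them on its own.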
With text(), on the other hand, you get the node/element's direct child text nodes:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]/text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2
3
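The same distinction can be sketched in Python with the standard library's xml.etree.ElementTree. It has no text() step, but an element's direct text nodes correspond to its .text plus the .tail of each child element. The cell markup is a hypothetical simplification (an assumption), not the live page's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified cell markup (an assumption modeled on the output above).
TD_HTML = (
    "<td>Reserved exclusively to support operationally-critical infrastructural "
    "identifier spaces as advised by the Internet Architecture Board<br/>\n"
    '<a href="/go/rfc3172">RFC 3172</a>\n'
    "</td>"
)

td = ET.fromstring(TD_HTML)

# Emulate td/text(): the element's own leading text plus the text following each
# child element (its tail). Empty strings are skipped, since XPath has no empty
# text nodes.
direct_text_nodes = [t for t in [td.text, *(child.tail for child in td)] if t]

for position, node in enumerate(direct_text_nodes, start=1):
    print(position, repr(node))
```

In this sketch positions 2 and 3 are whitespace-only, and "RFC 3172" is absent because it belongs to the <a> child rather than to the cell itself.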
With //text() you get all of its descendant text nodes instead:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]//text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2
3
RFC 3172
4
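As you can see, these are 3 and 4 separate items/nodes respectively; the positions without a visible value are whitespace-only text nodes.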
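For comparison, a minimal Python sketch of //text() using the standard library's ElementTree: itertext() walks an element and its descendants in document order. The cell markup is a hypothetical simplification (an assumption), not the live page's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified cell markup (an assumption modeled on the output above).
TD_HTML = (
    "<td>Reserved exclusively to support operationally-critical infrastructural "
    "identifier spaces as advised by the Internet Architecture Board<br/>\n"
    '<a href="/go/rfc3172">RFC 3172</a>\n'
    "</td>"
)

td = ET.fromstring(TD_HTML)

# Emulate td//text(): itertext() yields each element's text and each child's
# tail, descending into children, in document order.
descendant_text_nodes = list(td.itertext())

for position, node in enumerate(descendant_text_nodes, start=1):
    print(position, repr(node))
```

The link text now shows up as item 3, coming from the <a> descendant.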
To get the 1st text node, td[2]/text()[1] alone would suffice, but normalize-space(td[2]/text()) or even normalize-space(td[2]//text()) works as well.
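Putting the pieces together for the question's four columns, here is a minimal Python sketch (standard library only) that mirrors what td[1], td[2]/text()[1], td[2]//a and td[2]//a/@href would select. The row markup is hypothetical, modeled on the output above. Joining several links with a space (uri.arpa lists two RFCs) is an assumption of this sketch, not something the question specifies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for the first table row (an assumption modeled
# on the output above; the live page's HTML may differ).
TR_HTML = (
    "<tr><td>arpa</td>"
    "<td>Reserved exclusively to support operationally-critical infrastructural "
    "identifier spaces as advised by the Internet Architecture Board<br/>\n"
    '<a href="/go/rfc3172">RFC 3172</a>\n'
    "</td></tr>"
)


def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space()."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def row_to_columns(tr: ET.Element) -> list[str]:
    domain_td, desc_td = tr.findall("td")[:2]
    # Like td[2]/text(): the cell's own text plus the tail of each child element.
    direct = [t for t in [desc_td.text, *(c.tail for c in desc_td)] if t]
    links = desc_td.findall(".//a")  # like td[2]//a
    return [
        normalize_space("".join(domain_td.itertext())),  # like normalize-space(td[1])
        normalize_space(direct[0]) if direct else "",    # like td[2]/text()[1]
        " ".join(normalize_space("".join(a.itertext())) for a in links),
        " ".join(a.get("href", "") for a in links),      # like td[2]//a/@href
    ]


columns = row_to_columns(ET.fromstring(TR_HTML))
line = "\t".join(columns)
print(line)
```

Applied to each tr and printed one line per row, this yields the tab-separated "arpa▸ Reserved [...] Board▸ RFC 3172" shape from the question, with /go/rfc3172 as a fourth column.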