How do I remove a certain node from text?

https://www.iana.org/domains/arpa

I can use the XPath '//table[@id="arpa-table"]/tbody/tr/join((td[1], normalize-space(td[2])), x:cps(9))' with xidel. But I want to put things like RFC 3172 in a third column and /go/rfc3172 in a fourth column. Can someone tell me how to do that?

arpa▸   Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board RFC 3172¬
as112.arpa▸ For sinking DNS traffic for reverse IP address lookups and other applications RFC 7535¬
e164.arpa▸  For mapping E.164 numbers to Internet URIs RFC 6116¬
home.arpa▸  For non-unique use in residential home networks RFC 8375¬
in-addr-servers.arpa▸   For hosting authoritative name servers for the in-addr.arpa domain RFC 5855¬
in-addr.arpa▸   For mapping IPv4 addresses to Internet domain names RFC 1035¬
ip6-servers.arpa▸   For hosting authoritative name servers for the ip6.arpa domain RFC 5855¬
ip6.arpa▸   For mapping IPv6 addresses to Internet domain names RFC 3152¬
ipv4only.arpa▸  For detecting the presence of DNS64 and for learning the IPv6 prefix used for protocol translation RFC 7050¬
iris.arpa▸  For locating Internet Registry Information Services RFC 4698¬
uri.arpa▸   For resolving Uniform Resource Identifiers according to the Dynamic Delegation Discovery System RFC 3405 RFC 8958¬
urn.arpa▸   For resolving Uniform Resource Names according to the Dynamic Delegation Discovery System RFC 3405¬

The first line should look like this:

arpa▸   Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board▸ RFC 3172¬

By default xidel prints the string value (string()) of a node/element. That is "the concatenation of the string-values of all text node descendants", as E. Lenz puts it:

$ xidel -s https://www.iana.org/domains/arpa -e '
  //table[@id="arpa-table"]/tbody/tr[1]/td[2] ! (position(),.)
'
#or
$ xidel -s https://www.iana.org/domains/arpa -e '
  //table[@id="arpa-table"]/tbody/tr[1]/td[2]/string() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board

                RFC 3172


As you can see, that is 1 item/node.
That's why normalize-space(td[2]) returns Reserved exclusively [...] RFC 3172.
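
To see the same effect outside xidel, here is a minimal Python sketch using only the standard library. The CELL fragment is a hypothetical, shortened stand-in for the real table cell (its exact markup is an assumption), and the whitespace collapsing only approximates what normalize-space() does.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for td[2] of the first table row.
CELL = """<td>Reserved exclusively for infrastructure
    <br/>
    <a href="/go/rfc3172">RFC 3172</a>
</td>"""

td = ET.fromstring(CELL)

# String value of the element: every descendant text node, concatenated
# in document order (this includes the text inside <a>).
string_value = "".join(td.itertext())

# Rough equivalent of normalize-space(): trim the ends and collapse each
# run of whitespace into a single space.
normalized = " ".join(string_value.split())

print(repr(string_value))
print(normalized)
```

Because the link text is part of the string value, it ends up glued to the description after normalization.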

With text(), on the other hand, you get the direct text nodes of a node/element:

$ xidel -s https://www.iana.org/domains/arpa -e '
  //table[@id="arpa-table"]/tbody/tr[1]/td[2]/text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2



3



Or all of its descendant text nodes:

$ xidel -s https://www.iana.org/domains/arpa -e '
  //table[@id="arpa-table"]/tbody/tr[1]/td[2]//text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2



3
RFC 3172
4



As you can see, 3 and 4 distinct items/nodes, respectively.
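
The difference between direct and descendant text nodes can also be reproduced with a small Python sketch. It assumes a hypothetical cell with one empty element followed by one link, which is consistent with the counts shown above but is not taken from the page source. ElementTree has no text-node objects, so the element's text and each child's tail are used as the counterpart of td/text().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for td[2] of the first table row.
CELL = """<td>Reserved exclusively for infrastructure
    <br/>
    <a href="/go/rfc3172">RFC 3172</a>
</td>"""

td = ET.fromstring(CELL)

# Counterpart of td/text(): the element's own leading text plus the text
# that follows each child element (ElementTree calls that the "tail").
direct = [t for t in [td.text] + [child.tail for child in td] if t is not None]

# Counterpart of td//text(): every text node below the element,
# including the text inside <a>.
descendant = list(td.itertext())

print(len(direct), len(descendant))
for position, text in enumerate(descendant, start=1):
    print(position, repr(text))
```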

To get the 1st text node, td[2]/text()[1] is all you need, but normalize-space(td[2]/text()) or even normalize-space(td[2]//text()) works as well.
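
Putting it together, the four-column line from the question could be built roughly as follows. This is only a sketch in Python: the ROW fragment is a hypothetical, shortened version of one table row, and joining multiple RFC links with a space (for rows such as uri.arpa) is an assumption about the desired output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one <tr> of the table.
ROW = """<tr>
  <td>arpa</td>
  <td>Reserved exclusively for infrastructure
    <br/>
    <a href="/go/rfc3172">RFC 3172</a>
  </td>
</tr>"""

tr = ET.fromstring(ROW)
name_cell, desc_cell = tr.findall("td")

# Counterpart of normalize-space(td[2]/text()[1]): only the first direct
# text node, so the link text is left out of the description.
description = " ".join((desc_cell.text or "").split())

# Link text goes to column 3 and the href to column 4.
links = desc_cell.findall("a")
rfc_names = " ".join((a.text or "").strip() for a in links)
rfc_paths = " ".join(a.get("href", "") for a in links)

line = "\t".join([(name_cell.text or "").strip(), description, rfc_names, rfc_paths])
print(line)
```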