xml 在 R 中,删除段落但保留 xml class
xml in R, remove paragraphs but keep xml class
我正在尝试从 R 中的 XML 文档中删除一些段落,但我想保留 XML structure/class。这是一些示例文本和我失败的尝试:
library(xml2)
text = read_xml("<paper> <caption><p>The main title</p> <p>A sub title</p></caption> <p>The opening paragraph.</p> </paper>")
xml_find_all(text, './/caption//p') %>% xml_remove() # deletes text
xml_find_all(text, './/caption//p') %>% xml_text() # removes paragraphs but also XML structure
以下是我想要的结尾(只是删除了标题中的段落):
ideal_text = read_xml("<paper> <caption>The main title A sub title</caption> <p>The opening paragraph.</p> </paper>")
ideal_text
看起来这需要多个步骤。找到节点,复制文本,删除节点的内容,然后更新。
library(xml2)
library(magrittr)
text = read_xml("<paper> <caption><p>The main title</p> <p>A sub title</p></caption> <p>The opening paragraph.</p> </paper>")
# find the caption
caption <- xml_find_all(text, './/caption')
#store existing text
replacemement<- caption %>% xml_find_all( './/p') %>% xml_text() %>% paste(collapse = " ")
#remove the desired text
caption %>% xml_find_all( './/p') %>% xml_remove()
#replace the caption
xml_text(caption) <- replacemement
text #test
{xml_document}
<paper>
[1] <caption>The main title A sub title</caption>
[2] <p>The opening paragraph.</p>
很可能您需要获取 vector/list 的字幕节点,然后通过循环逐一遍历它们。
我正在尝试从 R 中的 XML 文档中删除一些段落,但我想保留 XML structure/class。这是一些示例文本和我失败的尝试:
library(xml2)
text = read_xml("<paper> <caption><p>The main title</p> <p>A sub title</p></caption> <p>The opening paragraph.</p> </paper>")
xml_find_all(text, './/caption//p') %>% xml_remove() # deletes text
xml_find_all(text, './/caption//p') %>% xml_text() # removes paragraphs but also XML structure
以下是我想要的结尾(只是删除了标题中的段落):
ideal_text = read_xml("<paper> <caption>The main title A sub title</caption> <p>The opening paragraph.</p> </paper>")
ideal_text
看起来这需要多个步骤。找到节点,复制文本,删除节点的内容,然后更新。
library(xml2)
library(magrittr)
text = read_xml("<paper> <caption><p>The main title</p> <p>A sub title</p></caption> <p>The opening paragraph.</p> </paper>")
# find the caption
caption <- xml_find_all(text, './/caption')
#store existing text
replacemement<- caption %>% xml_find_all( './/p') %>% xml_text() %>% paste(collapse = " ")
#remove the desired text
caption %>% xml_find_all( './/p') %>% xml_remove()
#replace the caption
xml_text(caption) <- replacemement
text #test
{xml_document}
<paper>
[1] <caption>The main title A sub title</caption>
[2] <p>The opening paragraph.</p>
很可能您需要获取 vector/list 的字幕节点,然后通过循环逐一遍历它们。