Scraping an item that varies in position from an XML subpage

Question

So far I have not managed to scrape the table "Die Verlustursache" from this page:

http://www.ubootarchiv.de/ubootwiki/index.php/U_205

I am using the libraries XML, rvest and readr.

I can address any table on the site with a single line of code such as

table <-readHTMLTable("http://www.ubootarchiv.de/ubootwiki/index.php/U_203") %>% .[1]

But the table's index is different on all the other pages. See for example: http://www.ubootarchiv.de/ubootwiki/index.php/U_27

I just realized that the table I need is always the fourth from last (meaning: the last table minus 4).

In another scraping project, I once used this line to scrape only the last item of a list page:

html_nodes(xpath="/html/body/div/div[3]/div[2]/div[1]/div[2]/div/table/tbody/tr[last()]")

However, I was not able to find a solution for something like "last - 4".

Please advise, and thanks in advance.

Answer 1

If it is always the fourth from last, you can index the list of tables by its length:

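# readHTMLTable() returns a list with one entry per table on the page
table <- readHTMLTable("http://www.ubootarchiv.de/ubootwiki/index.php/U_203")

# take the entry four positions before the last one
table[length(table) - 4]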
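
Since the question asks for a "last - 4" in XPath, here is a minimal sketch of how that could look with rvest. XPath allows arithmetic on last() inside a predicate, so no fixed index is needed. The inline HTML below is a made-up stand-in with six small tables, not the real ubootarchiv page, and the variable names are only for illustration. Note that "last minus 4" is strictly the fifth table counted from the end; if the table you want is literally the fourth from last, the offset would be 3 instead.

```r
library(rvest)

# Hypothetical stand-in page with six one-cell tables (t1 .. t6);
# on the real site you would call read_html() on the page URL instead
html <- paste0(
  "<html><body>",
  paste0("<table><tr><td>t", 1:6, "</td></tr></table>", collapse = ""),
  "</body></html>"
)
page <- read_html(html)

# XPath arithmetic: last() is the number of matched tables,
# so this selects the table four positions before the last one
node <- html_node(page, xpath = "(//table)[last() - 4]")
picked <- html_text(node)

# Same pick by position in the node list, mirroring table[length(table) - 4]
all_tables <- html_nodes(page, xpath = "//table")
same <- html_text(all_tables[[length(all_tables) - 4]])
```

With six tables on the stand-in page, both approaches land on the second table. On a real page, passing the selected node to html_table() should give you the data frame. Also keep in mind that single brackets, as in table[length(table) - 4], return a list of length one, while double brackets return the table itself.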
