Using rvest to scrape reviews from a particular HTML page in R
I am scraping the Tata Safari description page to get the review and the user comments. I am using SelectorGadget to find the CSS selectors. This is what I have done so far:
teambhp <- read_html("http://www.team-bhp.com/forum/official-new-car-reviews/171841-tata-safari-storme-varicor-400-official-review.html")
titles <- teambhp %>% html_node("hr+ div , i ,strong u , #posts ") %>% html_text()
However, this saves only one title in the titles variable and gives the following warning:
Warning message:
In node_find_one(x$node, x$doc, xpath = xpath, nsMap = ns) :
23 matches for .//hr/following-sibling::*[name() = 'div' and (position() = 1)] | .//i | .//strong/descendant-or-self::*/u | .//*[@id = 'posts']:
using first
I want all 23 matches saved in a list. How can I do that?
See help("html_node"):
html_node vs html_nodes
html_node is like [[ it always extracts exactly one element. When given a list of nodes, html_node will always return a list of the same length, the length of html_nodes might be longer or shorter.
You need to replace it with html_nodes() (note the s):
titles <- teambhp %>% html_nodes("hr+ div , i ,strong u , #posts ") %>% html_text()
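To see the difference without hitting the live site, here is a minimal sketch on a small inline HTML snippet. The markup and the variable names (page, first_title, all_titles) are made up for illustration and are not taken from the Team-BHP page. Note also that in rvest 1.0.0 and later the same functions are available as html_element() and html_elements().

```r
library(rvest)

# Hypothetical markup standing in for the forum page (an assumption,
# not the real structure of the Team-BHP thread)
page <- read_html('
  <html><body>
    <div id="posts">
      <strong><u>First title</u></strong>
      <strong><u>Second title</u></strong>
    </div>
  </body></html>')

# html_node() keeps only the first match for the selector
first_title <- page %>% html_node("strong u") %>% html_text()

# html_nodes() keeps every match, so html_text() returns a character vector
all_titles <- page %>% html_nodes("strong u") %>% html_text()

print(first_title)
print(all_titles)
```

If you need a list rather than a character vector, wrapping the result in as.list() should work.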