Atomic vector error when using R package `stringr`
I want to use the rvest package to scrape gas prices from a web page. However, I can't extract the values; they have to be selected by the HTML class `.sp_p`.
library(rvest)
# read_html() supersedes the older html() function in rvest
desmoines <- read_html("http://www.desmoinesgasprices.com/")
Pull the gas prices:
price <- desmoines %>%
  html_nodes(".sp_p")
head(price, 3)
Output:
[[1]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p5"></div>
</div>
[[2]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p6"></div>
</div>
[[3]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p7"></div>
</div>
attr(,"class")
[1] "XMLNodeSet"
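Now I want to use the stringr package to extract the numbers from the scraped result, but I can't use stringr because `price` is not an atomic vector. How can I solve this?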
Here is one possibility:
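library(stringr)
# Build one string per price node from its children's class names, e.g. "p2pdp5p5"
p_raw <- sapply(seq_along(price), function(i)
  paste(html_attr(html_children(price[[i]]), "class"), collapse = ""))
# "d" stands for the decimal point; the "p" prefixes are dropped
p_readable <- paste0("$", str_replace_all(p_raw, c("d" = ".", "p" = "")))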
#> p_readable
# [1] ".49" ".57" ".59" ".59" ".59" ".59" ".59" ".59" ".61" ".64" ".67" ".68" ".68"
#[14] ".68" ".08" ".99" ".98" ".98" ".98" ".98" ".98" ".98" ".98" ".98" ".98" ".98"
#[27] ".98" ".98" ".98"
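Since the live page may change or disappear, the same idea can be tried on an inline HTML snippet. The following is only a minimal sketch: the two-price snippet and the names `snippet`, `page` and `p_num` are made up for illustration, and it assumes a current rvest where `read_html()`, `html_children()` and `html_attr()` are available. It also shows why stringr complained: the node set is a list, not an atomic vector, whereas the pasted class names form a plain character vector.

```r
library(rvest)
library(stringr)

# Hypothetical stand-in for the live page: two prices encoded as class names
snippet <- paste0(
  '<div class="sp_p"><div class="p2"></div><div class="pd"></div>',
  '<div class="p4"></div><div class="p9"></div></div>',
  '<div class="sp_p"><div class="p3"></div><div class="pd"></div>',
  '<div class="p0"></div><div class="p8"></div></div>'
)
page  <- read_html(snippet)
price <- html_nodes(page, ".sp_p")

# One string per price node: the class names of its children pasted together
p_raw <- vapply(
  seq_along(price),
  function(i) paste(html_attr(html_children(price[[i]]), "class"), collapse = ""),
  character(1)
)

is.atomic(price)  # FALSE: a node set is a list of nodes
is.atomic(p_raw)  # TRUE: a character vector that stringr can work on

# "d" marks the decimal point, "p" is only a prefix
p_num <- as.numeric(str_replace_all(p_raw, c("d" = ".", "p" = "")))
p_num
# [1] 2.49 3.08
```

Using `vapply()` with `character(1)` rather than `sapply()` makes the result type explicit, so a node without children fails loudly instead of silently changing the shape of the output.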