Add source link URL to web scraped data in R
I have a set of links that I need to scrape.
How can I add each link's address to the scraped data as a variable, or simply append it?
links
for (i in links) {
  url <- i
  SC <- read_html(url) %>% html_nodes("NODES") %>% html_text()
  Data <- rbind(SC)
}
The data I get looks like this (for example):
1. "name"
2. "price"
3. "date"
How can I add the link URL as a 4th attribute:
1. "name"
2. "price"
3. "date"
4. "source link address"
Or put it in a separate column?
Thanks
Try this:
library(rvest)
library(magrittr)

links <- c("http://www.mothercare.com/christmas/gifts-for-babies/?q=christmas%27",
           "http://www.mothercare.com/christmas/christmas-clothing/?q=christmas%27")

# Scrape each page and return one data frame per link
Data <- lapply(links, function(x) {
  h <- read_html(x)
  items <- h %>% html_nodes(".m-title_link") %>% html_text() %>% gsub("\n", "", .)
  price <- h %>% html_nodes(".m-sales_price") %>% html_text() %>% gsub("\n", "", .)
  # Product URLs, read with rvest's html_attr() so xml2 does not need to be attached
  urls <- h %>% html_nodes(".m-title_link") %>% html_attr("href") %>%
    paste0("http://www.mothercare.com", .)
  # Link = x is recycled, so every row records the page it was scraped from
  data.frame(Name = items, Price = price, Link = x, Urls = urls)
})

# Stack the per-page data frames into a single data frame
Data <- do.call(rbind, Data)
View(Data)
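
If you would rather keep the for loop from the question, the same idea can be sketched roughly as below. This is only an illustration under some assumptions: the pages vector, the example.com URLs and the ".item" selector are made-up stand-ins (inline HTML strings, so it runs without network access), and the column names Value and Source are arbitrary. The point is to collect one data frame per link in a list, store the URL in its own column, and bind everything once at the end instead of overwriting Data on every iteration.

```r
library(rvest)

# Hypothetical stand-ins for fetched pages: names are the source URLs,
# values are the HTML that each URL would return
pages <- c(
  "http://example.com/a" = "<ul><li class='item'>Rattle</li><li class='item'>Bib</li></ul>",
  "http://example.com/b" = "<ul><li class='item'>Hat</li></ul>"
)

results <- vector("list", length(pages))
for (i in seq_along(pages)) {
  url <- names(pages)[i]
  doc <- read_html(pages[[i]])  # with real links this would be read_html(url)
  txt <- doc %>% html_nodes(".item") %>% html_text()
  # rep() keeps the column lengths equal even when a page matches nothing
  results[[i]] <- data.frame(Value = txt,
                             Source = rep(url, length(txt)),
                             stringsAsFactors = FALSE)
}

# Bind once after the loop so earlier pages are not lost
Data <- do.call(rbind, results)
print(Data)
```

If the URL is wanted as a 4th element of each scraped vector rather than as a column, c(txt, url) inside the loop appends it to the end of the character vector.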