Extract particular information from the HTML code of a webpage?
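I have a webpage, "http://www.jabong.com/playdate-Off-White-Casual-Top-1342500.html?pos=1", and I can get its HTML code, but I need to extract specific information from it. From the page above I need the following:
Type: Casual Tops, Fabric: Cotton, Sleeves: Half Sleeves, Neck: Round neck, Fit: Regular, Wash Care: Hand Wash, Use Mild Detergents, Remove Belts / Broaches Before Wash, Color: Off White, Fabric Details: 95/5 Cotton Lycra, Style: Graphic, SKU: PL527KA99JYQINDFAS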
You need an HTML scraper/parser such as rvest:
library(rvest)
url <- 'http://www.jabong.com/playdate-Off-White-Casual-Top-1342500.html?pos=1'
# get HTML, select list node with the information
page <- url %>% read_html() %>% html_node('.prod-main-wrapper')
# select the nodes within the list of each type, and get the text inside
variable <- page %>% html_nodes('label') %>% html_text()
value <- page %>% html_nodes('span') %>% html_text()
# put the text in a nice data.frame
data.frame(variable, value)
#          variable value
#  1           Type Casual Tops
#  2         Fabric Cotton
#  3        Sleeves Half Sleeves
#  4           Neck Round neck
#  5            Fit Regular
#  6      Wash Care Hand Wash, Use Mild Detergents, Remove Belts / Broaches Before Wash
#  7          Color Off White
#  8 Fabric Details 95/5 Cotton Lycra
#  9          Style Graphic
# 10            SKU PL527KA99JYQINDFAS
# 11  Authorization Playdate authorized online sales partner. View Certificate
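
One caveat: selecting every label and every span separately assumes the two sets line up one-to-one, so data.frame() will fail or mispair rows if an item lacks a value. Below is a minimal sketch of a more defensive variant. It assumes each label/value pair sits in its own li under .prod-main-wrapper; that markup is a guess, not something confirmed from the page, and the inline HTML is hypothetical stand-in data.

```r
library(rvest)

# Hypothetical markup standing in for the product-details list;
# the real page's structure is an assumption here.
html <- '<ul class="prod-main-wrapper">
  <li><label>Type</label><span>Casual Tops</span></li>
  <li><label>Fabric</label><span>Cotton</span></li>
  <li><label>SKU</label></li>
</ul>'

# select one node per list item
items <- read_html(html) %>% html_nodes('.prod-main-wrapper li')

# html_node() (singular) returns exactly one result per item,
# and NA when the item has no matching child
variable <- items %>% html_node('label') %>% html_text(trim = TRUE)
value <- items %>% html_node('span') %>% html_text(trim = TRUE)

details <- data.frame(variable, value, stringsAsFactors = FALSE)
```

Because both vectors are derived from the same set of items, they always have the same length, and a missing span shows up as NA in that row rather than shifting the later values.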