Could you please help me with web scraping using rvest?
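I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives
I have been relying on the CSS SelectorGadget to find XPaths for web scraping. However, I cannot use the gadget on this website, so I have to use Inspect Source to find what I need. I have been trying to locate the relevant CSS and XPath by scrolling through the source, but with my limited skills I have not managed to.
Could you help me find the XPath or CSS for:
- Title
- Author
- Date
I'm sorry if this is a laundry list... but I am really stuck. I would appreciate any help you can give me!
Thank you very much.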
For each element you want to extract, find the relevant tag together with its class using SelectorGadget, and you will be able to get what you want.
library(rvest)

url <- 'https://chicago.suntimes.com/crime/archives'
# Download and parse the archive page
webpage <- url %>% read_html()

# Select each field by its tag and class, then extract the text
title <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
date <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()

# Combine into one data frame; this assumes all three vectors have the same length
result <- data.frame(title, author, date)
result
#  title                                                                 author          date
#1 Belmont Cragin man charged with carjacking in Little Village: police  Sun-Times Wire  February 17
#2 Gas station robbed, man carjacked in Horner Park                      Jermaine Nolen  February 17
#3 8 shot, 2 fatally, Tuesday in Chicago                                 Sun-Times Wire  February 17
#4 Businesses robbed at gunpoint on the Northwest Side: police           Sun-Times Wire  February 17
#5 Man charged with carjacking in Aurora                                 Sun-Times Wire  February 16
#6 Woman fatally stabbed in Park Manor apartment                         Sun-Times Wire  February 16
#7 Woman critically hurt by gunfire in Woodlawn                          David Struett   February 16
#8 Teen boy, 17, charged with attempted carjacking in Back of the Yards  Sun-Times Wire  February 16
#...
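One caveat: data.frame(title, author, date) only works when all three vectors have the same length, so it would fail if some article had no author or date. A possible workaround, sketched below, is to select one container per article first and then pull each field from inside it with html_element(), which yields NA where a field is missing. This assumes rvest 1.0 or later (for html_elements(), html_element() and minimal_html()). The inline HTML and the container class 'c-entry-box--compact' are hypothetical stand-ins, not verified against the live page; only the three inner selectors come from the code above.

```r
library(rvest)

# Hypothetical markup standing in for the archive page. The container class
# 'c-entry-box--compact' is an assumption; inspect the real page to confirm it.
html <- '
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">First headline</h2>
  <span class="c-byline__author-name">Jane Doe</span>
  <time class="c-byline__item"> February 17 </time>
</div>
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">Second headline</h2>
  <time class="c-byline__item"> February 16 </time>
</div>'
page <- minimal_html(html)

# One node per article
entries <- page %>% html_elements('div.c-entry-box--compact')

# html_element() (singular) returns exactly one result per entry,
# so a missing field becomes NA instead of shortening the vector
result <- data.frame(
  title  = entries %>% html_element('h2.c-entry-box--compact__title') %>% html_text() %>% trimws(),
  author = entries %>% html_element('span.c-byline__author-name') %>% html_text() %>% trimws(),
  date   = entries %>% html_element('time.c-byline__item') %>% html_text() %>% trimws(),
  stringsAsFactors = FALSE
)
result
```

To run this against the real site, replace minimal_html(html) with read_html(url) and adjust the container selector to whatever wraps each article in the page source.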