Could you please help me with web scraping using rvest?

I am currently trying to scrape the following website: https://chicago.suntimes.com/crime/archives

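I have been relying on the SelectorGadget CSS selector tool to find the XPath for web scraping. However, the gadget does not work on this website, so I have to use Inspect Source to find what I need. I have been scrolling through the source trying to find the relevant CSS and XPath, but with my limited skills I have not managed it.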

Could you help me find the XPath or CSS selectors for the elements I need?

I am sorry if this is a laundry list, but I am really stuck. I would greatly appreciate any help you can give me!

Thank you very much.

For each element you want to extract, if you find the relevant tag with its class using SelectorGadget, you will be able to get what you want.

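library(rvest)

# Archive page to scrape
url <- 'https://chicago.suntimes.com/crime/archives'

# Download and parse the page
webpage <- url %>% read_html()

# Select each field by tag and class, then extract its text
title <- webpage %>% html_nodes('h2.c-entry-box--compact__title') %>% html_text()
author <- webpage %>% html_nodes('span.c-byline__author-name') %>% html_text()
date <- webpage %>% html_nodes('time.c-byline__item') %>% html_text() %>% trimws()

# Combine the fields; data.frame() needs all three vectors to have the same length
result <- data.frame(title, author, date)
result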

#                                                                                               title              author        date
#1                               Belmont Cragin man charged with carjacking in Little Village: police       Sun-Times Wire February 17
#2                                                   Gas station robbed, man carjacked in Horner Park       Jermaine Nolen February 17
#3                                                              8 shot, 2 fatally, Tuesday in Chicago       Sun-Times Wire February 17
#4                                        Businesses robbed at gunpoint on the Northwest Side: police       Sun-Times Wire February 17
#5                                                              Man charged with carjacking in Aurora       Sun-Times Wire February 16
#6                                                       Woman fatally stabbed in Park Manor apartment      Sun-Times Wire February 16
#7                                                        Woman critically hurt by gunfire in Woodlawn       David Struett February 16
#8                                Teen boy, 17, charged with attempted carjacking in Back of the Yards      Sun-Times Wire February 16
#...
#...
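
One thing to watch for with the approach above: if an article has no author, the three vectors end up with different lengths and data.frame() fails. A minimal sketch of one way around that is to select each article's container first and then pull one value per container with html_element(), which gives NA where a field is missing. The container class 'c-entry-box--compact' and the inline HTML below are assumptions made for illustration, not taken from the live page, so check the actual markup with Inspect Source. It also assumes rvest 1.0.0 or later for html_elements(), html_element() and minimal_html().

```r
library(rvest)

# Inline HTML standing in for the live page (hypothetical content).
# The second article deliberately has no author.
page <- minimal_html('
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">First story</h2>
  <span class="c-byline__author-name">Reporter A</span>
  <time class="c-byline__item"> February 17 </time>
</div>
<div class="c-entry-box--compact">
  <h2 class="c-entry-box--compact__title">Second story</h2>
  <time class="c-byline__item"> February 16 </time>
</div>')

# One node per article; the container selector is an assumption
cards <- page %>% html_elements('div.c-entry-box--compact')

# html_element() (singular) returns exactly one result per card,
# and html_text() turns a missing match into NA
result <- data.frame(
  title  = cards %>% html_element('h2.c-entry-box--compact__title') %>% html_text(trim = TRUE),
  author = cards %>% html_element('span.c-byline__author-name') %>% html_text(trim = TRUE),
  date   = cards %>% html_element('time.c-byline__item') %>% html_text(trim = TRUE)
)
result
```

To try this on the real site, replace the minimal_html() call with read_html(url) and adjust the container selector to whatever wraps each article there.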