rvest: using CSS selectors to pull data from a different tab of a page

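I am very new to scraping and am trying to extract data from one section of this website: https://projects.fivethirtyeight.com/soccer-predictions/premier-league/. The data I want is in the second tab, "Matches", in the section titled "Upcoming matches".

I have tried to do this with SelectorGadget and rvest, as follows: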

library(rvest)
url <- "https://projects.fivethirtyeight.com/soccer-predictions/premier-league/"
url %>%
   read_html() %>%                      # download and parse the page first
   html_nodes(".prob, .name") %>%
   html_text()

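This returns values, but they correspond to the first tab on the page, "Standings". How do I reference the correct section that I am trying to extract?

First: I don't know R, only Python.

When you click Matches, the page uses JavaScript to generate the matches, and it loads JSON data from the following URLs: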

https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_forecast.json

https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json

https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_clinches.json

I checked only one of them, 2021_premier-league_matches.json, and I can see that it contains the data for Completed Matches.
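
The other two files can be checked the same way. As a rough sketch, something like the following could print the top-level shape of each file, so you can see which one holds the data you need. The `summarize` and `report` helpers are names made up for this illustration and are not part of any library.

```python
BASE = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts/'
FILES = [
    '2021_premier-league_forecast.json',
    '2021_premier-league_matches.json',
    '2021_premier-league_clinches.json',
]


def summarize(data):
    """Describe the top-level shape of a decoded JSON value (hypothetical helper)."""
    if isinstance(data, list):
        first = data[0] if data else None
        keys = sorted(first) if isinstance(first, dict) else []
        return f'list of {len(data)} items; first item keys: {keys}'
    if isinstance(data, dict):
        return f'dict with keys: {sorted(data)}'
    return type(data).__name__


def report():
    """Download each file and print its summary (needs network access)."""
    import requests  # imported lazily so summarize() works without it
    for name in FILES:
        response = requests.get(BASE + name, timeout=30)
        response.raise_for_status()
        print(name, '->', summarize(response.json()))


# Offline demonstration on a tiny hand-written value
print(summarize([{'team1': 'Arsenal', 'team2': 'Liverpool'}]))
```

Calling `report()` runs the same summary against the three URLs listed above.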

Here is an example I made in Python, using the matches file:

import requests

url = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json'

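response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()       # a list with one dict per match

for item in data:
    # keep only the matches whose datetime starts with this date
    if item['datetime'].startswith('2022-03-16'):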

        print('team1:', item['team1_code'], '|', item['team1'])
        print('prob1:', item['prob1'])
        print('score1:', item['score1'])
        print('adj_score1:', item['adj_score1'])
        print('chances1:', item['chances1'])
        print('moves1:', item['moves1'])
        print('---')

        print('team2:', item['team2_code'], '|', item['team2'])
        print('prob2:', item['prob2'])
        print('score2:', item['score2'])
        print('adj_score2:', item['adj_score2'])
        print('chances2:', item['chances2'])
        print('moves2:', item['moves2'])

        print('----------------------------------------')

Result:

team1: BHA | Brighton and Hove Albion
prob1: 0.30435
score1: 0
adj_score1: 0.0
chances1: 1.244
moves1: 1.682
---
team2: TOT | Tottenham Hotspur
prob2: 0.43627
score2: 2
adj_score2: 2.1
chances2: 1.924
moves2: 1.056
----------------------------------------
team1: ARS | Arsenal
prob1: 0.22114
score1: 0
adj_score1: 0.0
chances1: 0.569
moves1: 0.514
---
team2: LIV | Liverpool
prob2: 0.55306
score2: 2
adj_score2: 2.1
chances2: 1.243
moves2: 0.813
----------------------------------------
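
Since the question was about upcoming matches rather than one fixed date, the same list could be filtered against today's date instead. This is a minimal sketch, assuming every `datetime` value starts with a `YYYY-MM-DD` date, as the `startswith` check above suggests. The `upcoming` function and the sample rows are illustrative only: the time part and the 'Team A' / 'Team B' fixture are placeholders, not real data.

```python
from datetime import date


def upcoming(matches, today=None):
    """Return matches dated `today` or later, earliest first (hypothetical helper)."""
    cutoff = (today or date.today()).isoformat()  # e.g. '2022-03-17'
    # ISO dates sort correctly as plain strings, so compare the 10-char prefix
    later = [m for m in matches if m['datetime'][:10] >= cutoff]
    return sorted(later, key=lambda m: m['datetime'])


# Stand-in for `data = response.json()`; placeholder values only
sample = [
    {'datetime': '2022-03-20T00:00:00Z', 'team1': 'Team A', 'team2': 'Team B'},
    {'datetime': '2022-03-16T00:00:00Z', 'team1': 'Arsenal', 'team2': 'Liverpool'},
]

for match in upcoming(sample, today=date(2022, 3, 17)):
    print(match['datetime'][:10], match['team1'], 'v', match['team2'])
```

With the sample rows, only the 2022-03-20 fixture is printed. With real data you would pass the list returned by `response.json()`; for a season that has already finished, the result would simply be empty.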