Rvest: CSS selector pulls data from a different tab of the URL
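I am very new to scraping and am trying to pull data from one section of this website: https://projects.fivethirtyeight.com/soccer-predictions/premier-league/. The data I want is in the second tab, "Matches", in the section titled "Upcoming matches".
I have tried to do this with SelectorGadget and rvest, as follows: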
library(rvest)
url <- "https://projects.fivethirtyeight.com/soccer-predictions/premier-league/"
url %>%
  read_html() %>%
  html_nodes(".prob, .name") %>%
  html_text()
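This returns values, but they correspond to the first tab on the page, "Standings". How do I reference the correct section that I am trying to pull?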
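First: I don't know R, only Python.
When you click Matches, the page uses JavaScript to generate the matches, so they are not in the HTML that rvest downloads (rvest does not run JavaScript). The page loads the JSON data from the following locations: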
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_forecast.json
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json
https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_clinches.json
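The three file names share one pattern, `<season>_<league>_<kind>.json`. A minimal sketch that builds them, assuming other seasons and leagues follow the same pattern (this is not verified here); `forecast_urls` is a hypothetical helper name, not part of any library:

```python
BASE_URL = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts/'

# the three kinds of file seen in the URLs listed above
KINDS = ('forecast', 'matches', 'clinches')


def forecast_urls(season, league):
    """Build the JSON URLs for one season and league slug.

    Assumption: every season/league uses the same naming pattern as
    the 2021 Premier League files.
    """
    return {kind: f'{BASE_URL}{season}_{league}_{kind}.json' for kind in KINDS}
```

For example, `forecast_urls(2021, 'premier-league')['matches']` gives the second URL in the list above.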
I checked only one of them, 2021_premier-league_matches.json, and I see that it has the data for Completed Matches.
I made an example in Python:
import requests
url = 'https://projects.fivethirtyeight.com/soccer-predictions/forecasts/2021_premier-league_matches.json'
response = requests.get(url)
data = response.json()
for item in data:
    # keep only the matches played on the given date
    if item['datetime'].startswith('2022-03-16'):
        print('team1:', item['team1_code'], '|', item['team1'])
        print('prob1:', item['prob1'])
        print('score1:', item['score1'])
        print('adj_score1:', item['adj_score1'])
        print('chances1:', item['chances1'])
        print('moves1:', item['moves1'])
        print('---')
        print('team2:', item['team2_code'], '|', item['team2'])
        print('prob2:', item['prob2'])
        print('score2:', item['score2'])
        print('adj_score2:', item['adj_score2'])
        print('chances2:', item['chances2'])
        print('moves2:', item['moves2'])
        print('----------------------------------------')
Result:
team1: BHA | Brighton and Hove Albion
prob1: 0.30435
score1: 0
adj_score1: 0.0
chances1: 1.244
moves1: 1.682
---
team2: TOT | Tottenham Hotspur
prob2: 0.43627
score2: 2
adj_score2: 2.1
chances2: 1.924
moves2: 1.056
----------------------------------------
team1: ARS | Arsenal
prob1: 0.22114
score1: 0
adj_score1: 0.0
chances1: 0.569
moves1: 0.514
---
team2: LIV | Liverpool
prob2: 0.55306
score2: 2
adj_score2: 2.1
chances2: 1.243
moves2: 0.813
----------------------------------------
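Since the question asks for the Upcoming matches, the same records could probably be filtered by kickoff time instead of a fixed date. A minimal sketch, assuming each record's `datetime` is an ISO 8601 string in UTC (only the `2022-03-16` prefix is confirmed above) and that matches not yet played are present in the same file; `parse_kickoff` and `upcoming_matches` are hypothetical helper names:

```python
from datetime import datetime, timezone


def parse_kickoff(value):
    """Parse a timestamp such as '2022-03-16T19:30:00Z' (format assumed)."""
    # older Python versions reject a trailing 'Z' in fromisoformat()
    parsed = datetime.fromisoformat(value.replace('Z', '+00:00'))
    # assumption: a timestamp without an offset is in UTC
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def upcoming_matches(matches, now=None):
    """Return the records whose kickoff is at or after `now`, earliest first."""
    if now is None:
        now = datetime.now(timezone.utc)
    future = [m for m in matches if parse_kickoff(m['datetime']) >= now]
    return sorted(future, key=lambda m: parse_kickoff(m['datetime']))
```

With `data` loaded as in the example above, `upcoming_matches(data)` would return the matches still to be played, and the same `print` calls can then be used on each record. For those records, fields such as `score1` may well be empty or missing, so reading them with `item.get('score1')` is the safer choice.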