How to pull a specific "data-stat" value? (Python)
So far, the code pulls a page from https://www.basketball-reference.com and uses the data-stat class(???) to get any data in tr_body.
I need a way to pull a specific data-stat value, for example the output data-stat="pos" = PG from https://www.basketball-reference.com/players/l/lowryky01.html.
from bs4 import BeautifulSoup
import requests

first = ()
first_slice = ()
last = ()

def askname():
    global first
    first = input(str("First Name of Player?"))
    global last
    last = input(str("Last Name of Player?"))
    print("Confirmed, loading up " + first + " " + last)

# asks user for player name
askname()
first_slice_result = (first[:2])
last_slice_result = (last[:5])
print(first_slice_result)
print(last_slice_result)
# slices player's name so it can match the format bref uses
first_slice_resultA = str(first_slice_result)
last_slice_resultA = str(last_slice_result)
first_last_slice = last_slice_resultA + first_slice_resultA
lower = first_last_slice.lower() + "01"
start_letter = (last[:1])
lower_letter = (start_letter.lower())
# grabs the letter bref uses for organization
print(lower)
source = requests.get('https://www.basketball-reference.com/players/' + lower_letter + '/' + lower + '.html').text
soup = BeautifulSoup(source, 'lxml')
tbody = soup.find('tbody')
pergame = tbody.find(class_="full_table")
classrite = pergame.find(class_="right")
tr_body = tbody.find_all('tr')
print(pergame)
# separates data-stat, apparently you can use .get to get obscure classes
for trb in tr_body:
    print(trb.get('id'))
    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))
    for td in trb.find_all('td'):
        print(td.get_text())
        print(td.get('data-stat'))
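I started this project about 4 months ago, and I'm having a hard time remembering how to split up and pull out specific data-stat values.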
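Well, as far as I can tell, you have basically already done what you want.
From this point, just organize the information you extracted into a dictionary keyed by each cell's data-stat, and then you can pull values out by their keys.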
for trb in tr_body:
    print(trb.get('id'))
    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))
    row = {}
    for td in trb.find_all('td'):
        row[td.get('data-stat')] = td.get_text()
    print(row['pos'], row['team_id'], row['fg_pct'])
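If only one or two values are needed, another option is to ask Beautiful Soup for the cell directly by its data-stat attribute instead of building the whole dictionary. The sketch below is a minimal illustration, not a tested scraper for the live site: SAMPLE_HTML is a made-up stand-in that mirrors the row structure the loop above assumes (the values are placeholders, not real statistics), and get_stat is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page. The real markup may differ,
# and the values below are placeholders, not real statistics.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr id="row_1" class="full_table">
      <th data-stat="season">Season 1</th>
      <td data-stat="team_id">XYZ</td>
      <td data-stat="pos">PG</td>
      <td data-stat="fg_pct">.500</td>
    </tr>
    <tr id="row_2" class="full_table">
      <th data-stat="season">Season 2</th>
      <td data-stat="team_id">XYZ</td>
      <td data-stat="pos">SG</td>
      <td data-stat="fg_pct">.450</td>
    </tr>
  </tbody>
</table>
"""


def get_stat(row, name):
    """Return the text of the th/td in `row` whose data-stat equals `name`.

    Returns None when the row has no such cell, so a missing stat
    does not raise the way row['pos'] would on a plain dict.
    """
    cell = row.find(["th", "td"], attrs={"data-stat": name})
    return cell.get_text(strip=True) if cell is not None else None


# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find("tbody").find_all("tr")

# One value from one row.
first_pos = get_stat(rows[0], "pos")

# The same stat from every row.
positions = [get_stat(row, "pos") for row in rows]

# Every cell on the page with a given data-stat, regardless of row.
all_fg = [td.get_text(strip=True)
          for td in soup.find_all("td", attrs={"data-stat": "fg_pct"})]

print(first_pos, positions, all_fg)
```

Passing the attribute through attrs is needed because data-stat contains a hyphen and so cannot be written as a Python keyword argument. With the real page, rows would be the tr_body list from the question rather than the sample markup.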
Hope that helps.