Why am I getting an empty DataFrame when I try to scrape HTML?
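I'm trying to scrape salary data from Basketball-Reference using Python 3.x and pandas. I don't get any error message, but there is no output either. I want the second and fourth columns of the table: 'Player' and the '2019-20' salary. What am I doing wrong?
Here is what I have so far: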
import requests
import pandas as pd
from bs4 import BeautifulSoup

# URL of the page we will be scraping
salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
# this is the HTML from the given URL
soup = BeautifulSoup(html)
# Collect the salary data as a list of lists, one inner list per player
salaries = []
for x in soup.find_all('tr')[2:]:
    tds_salaries = x.find_all('td')
    name_s = tds_salaries[0].text
    salary = tds_salaries[2].text
    salaries.append([name_s, salary[1:]])
# create a salary pandas DataFrame
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
salaries_df.head()
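It works fine for me. All I did was add a try/except inside the for loop to skip the table header rows, which contain no td cells and therefore raise an IndexError. Note that the code below also passes page, the text of the response, to BeautifulSoup rather than html.
Code: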
import requests
import pandas as pd
from bs4 import BeautifulSoup

salaries_url = 'https://www.basketball-reference.com/contracts/players.html'
salaries_response = requests.get(salaries_url)
page = salaries_response.text
# Parse the downloaded page; naming the parser avoids a bs4 warning
soup = BeautifulSoup(page, 'html.parser')
salaries = []
for x in soup.find_all('tr')[2:]:
    try:
        tds_salaries = x.find_all('td')
        name_s = tds_salaries[0].text
        salary = tds_salaries[2].text
        # salary[1:] drops the leading '$'
        salaries.append([name_s, salary[1:]])
    except IndexError:
        # Header rows have no td cells, so indexing into them fails
        print('This is a header!')
salaries_df = pd.DataFrame(salaries, columns=['name', 'salary'])
print(salaries_df)
Output:
                  name      salary
0        Stephen Curry  40,231,758
1    Russell Westbrook  38,506,482
2           Chris Paul  38,506,482
3            John Wall  38,199,000
4         James Harden  38,199,000
..                 ...         ...
570    Hollis Thompson      50,000
571         Tyler Ulis      50,000
572  Demetrius Jackson      18,312
573    Jordan Caroline       6,000
574    Anthony Bennett       6,000

[575 rows x 2 columns]
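As a variation on the try/except approach, the header rows can also be skipped with an explicit check on the number of td cells. The sketch below is only an illustration: SAMPLE_HTML is a made-up fragment that imitates the page layout (header rows made of th cells only), and parse_salaries is a hypothetical helper name, not something from the original code. It also converts the salary text to an integer, which is an addition of mine and may not be what you need.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup imitating the page layout: header rows made
# only of <th> cells, and data rows whose rank cell is also a <th>.
SAMPLE_HTML = """
<table>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>1</th><td>Stephen Curry</td><td>GSW</td><td>$40,231,758</td></tr>
  <tr><th>2</th><td>Chris Paul</td><td>OKC</td><td>$38,506,482</td></tr>
  <tr><th>Rk</th><th>Player</th><th>Tm</th><th>2019-20</th></tr>
  <tr><th>3</th><td>John Wall</td><td>WAS</td><td>$38,199,000</td></tr>
</table>
"""


def parse_salaries(html):
    """Return [name, salary] pairs, skipping rows without enough <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) < 3:
            # Header rows contain only <th> cells, so there is nothing to read.
            continue
        name = tds[0].get_text(strip=True)
        # Drop the leading "$" and the thousands separators before converting.
        digits = tds[2].get_text(strip=True).lstrip("$").replace(",", "")
        salary = int(digits) if digits else None  # empty cell -> None
        rows.append([name, salary])
    return rows


salaries = parse_salaries(SAMPLE_HTML)
print(salaries)
```

The resulting list has the same shape as salaries in the code above, so it can be passed to pd.DataFrame(salaries, columns=['name', 'salary']) in the same way.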