How to scrape the pitcher's name and team?

Question

I'm new to scraping and coding, and would appreciate some help if possible.

  from bs4 import BeautifulSoup
  import requests
  import pandas as pd

  page_link ='https://www.baseball-reference.com/previews/index.shtml'
  page_response = requests.get(page_link, timeout=5)
  soup = BeautifulSoup(page_response.content, "html.parser")

I need help finding the right way to extract each pitcher's name and team.

(Example only:)

  player_name = [i.text for i in soup.find_all('td', {'href': 'example-name'})]

  team = [i.text for i in soup.find_all('td', {'href': 'example-team'})]

This is where I export to Excel:

  my_dict = dict(zip(player_name, team))

  df = pd.DataFrame(pd.Series(my_dict))

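  # Using ExcelWriter as a context manager saves the file on exit
  # (writer.save() was removed in pandas 2.0).
  with pd.ExcelWriter('pitching_webscrape.xlsx') as writer:
      df.to_excel(writer, sheet_name='Sheet1')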

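I want to get the pitchers' names and teams into Excel. Thanks in advance for your help! Please let me know if I can improve my question or add more detail.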

Here is my code so far:

  from bs4 import BeautifulSoup
  import requests
  import pandas as pd
  page_link ='https://www.baseball-reference.com/previews/index.shtml'
  page_response = requests.get(page_link, timeout=5)
  soup = BeautifulSoup(page_response.content, "html.parser")

My first attempt:

  t = soup.find_all('td')
  print(t)

My first output:

[Blue Jays (60-70) , , Preview , Orioles (37-94) , 7:05PM , TOR, Sam Gaviglio
(#43, 28, RHP, 3-6, 4.94), BAL, David Hess
(#41, 24, RHP, 2-8, 5.50), White Sox (51-79),

My second attempt:

  t = soup.find_all('td')
  for a in t:
      print(a.text)

My second output:

Blue Jays (60-70)

Preview

Orioles (37-94)

7:05PM

TOR Sam Gaviglio (#43, 28, RHP, 3-6, 4.94) BAL David Hess (#41, 24, RHP, 2-8, 5.50) White Sox (51-79)

I'm getting closer, but I only want the player's name and the team abbreviation (i.e. TOR, Sam Gaviglio). I'd also like to export the result to Excel. Thanks! =)

Answer 1

If you just want a list of the players and teams, this should be enough:

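  import re

  players_and_teams = []

  # Walk every table cell and collect the text of each link inside it,
  # skipping the "Preview" links that point to the game preview pages.
  for cell in soup.find_all('td'):
      for link in cell.find_all('a'):
          if not re.search(r'Preview', link.text):
              players_and_teams.append(link.text)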
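
To get from that flat list to the team/pitcher pairs you asked for, something like the following might work. It is only a sketch under some assumptions I have not verified against the live page: that each team abbreviation (e.g. TOR) is a link of two or three uppercase letters, and that the pitcher's name is the link right after it. The helper names `pair_teams_and_pitchers` and `export_to_excel` are made up for illustration.

```python
import re

# Assumption: team abbreviations on the page look like "TOR" or "BAL".
TEAM_ABBR = re.compile(r"^[A-Z]{2,3}$")


def pair_teams_and_pitchers(link_texts):
    """Turn a flat list of link texts into (team, pitcher) tuples.

    An entry counts as a team abbreviation when it is 2-3 uppercase
    letters; the entry right after it is taken as the pitcher's name.
    Anything else (full team names, stray links) is ignored.
    """
    items = [text.strip() for text in link_texts]
    rows = []
    for current, following in zip(items, items[1:]):
        if (TEAM_ABBR.match(current)
                and following
                and not TEAM_ABBR.match(following)):
            rows.append((current, following))
    return rows


def export_to_excel(rows, path="pitching_webscrape.xlsx"):
    """Write the pairs to an Excel sheet with one row per pitcher."""
    # Imported here so the pairing helper works without pandas installed.
    # Writing .xlsx also needs an Excel engine such as openpyxl.
    import pandas as pd

    df = pd.DataFrame(rows, columns=["team", "pitcher"])
    df.to_excel(path, sheet_name="Sheet1", index=False)


# Sample input modelled on the output shown in the question.
sample = ["Blue Jays", "Orioles", "TOR", "Sam Gaviglio",
          "BAL", "David Hess", "White Sox"]
print(pair_teams_and_pitchers(sample))
# [('TOR', 'Sam Gaviglio'), ('BAL', 'David Hess')]
```

With the scraped list you would call `export_to_excel(pair_teams_and_pitchers(players_and_teams))`. Keeping the rows as tuples rather than a `dict(zip(...))` keyed by one column avoids silently dropping entries when a key repeats.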

