How do you loop the URL output for BeautifulSoup?
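I want to extract each block of data with a blank line between them. I have the loop set up, but when I print, it only pulls the second player profile. Any idea how I can fix this?
If it worked correctly, the output would be: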
Greg Oden C #20
Born: Jan 22, 1988 (33 years old)
Birthplace/Hometown: Buffalo, New York
Nationality: United States
Height: 7-0 (213cm) Weight: 273 (124kg)
Website: http://www.gregoden52.com/
Current NBA Status: Unrestricted Free Agent
Agent: Bill Duffy
Draft Entry: 2007 NBA Draft
Early Entry Info: 2007 Early Entrant
Drafted: Round 1, Pick 1, Portland Trail Blazers
Pre-Draft Team: Ohio State (Fr)
High School: Lawrence North High School [Indianapolis, Indiana]
AAU Team: Spiece Indy Heat
Carl Landry F
Current Team: N/A
Born: Sep 19, 1983 (37 years old)
Birthplace/Hometown: Milwaukee, Wisconsin
Nationality: United States
Height: 6-9 (206cm) Weight: 248 (112kg)
Hand: Right
Website: https://carllandry.com/
@CarlLandry
Current NBA Status: Unrestricted Free Agent
Agent: Mark Bartelstein, Reggie Brown
Draft Entry: 2007 NBA Draft
Drafted: Round 2, Pick 1, Seattle SuperSonics
Draft Rights Trade: SEA to HOU, Jun 28, 2007
Pre-Draft Team: Purdue (Sr)
High School: Vincent High School [Milwaukee, Wisconsin]
The code is as follows:
import csv
import re
import requests
from bs4 import BeautifulSoup

url_list = ['https://basketball.realgm.com/player/player/Summary/1',
            'https://basketball.realgm.com/player/player/Summary/2']

for url in url_list:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')

player = soup.find_all('div', class_='wrapper clearfix container')[0]
playerprofile = re.sub(r'\n\s*\n', r'\n', player.get_text().strip(), flags=re.M)
print(playerprofile)
# Corrected version: parsing and printing happen inside the loop.
import csv
import re
import requests
from bs4 import BeautifulSoup

url_list = ['https://basketball.realgm.com/player/player/Summary/1',
            'https://basketball.realgm.com/player/player/Summary/2']

for url in url_list:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Take the first matching container div on this page.
    player = soup.find_all('div', class_='wrapper clearfix container')[0]
    # Collapse runs of blank lines into single newlines.
    playerprofile = re.sub(
        r'\n\s*\n', r'\n', player.get_text().strip(), flags=re.M)
    # The trailing newline leaves a blank line between profiles.
    print(playerprofile + "\n")
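This code produces the output you expect. In your code, the parsing and printing of the player appear to happen after the loop has finished, so only the last page fetched (the second player) is processed. That work should be done on every iteration of the loop, so indent those lines into the loop body.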
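The effect of the indentation can be reproduced without any network access. The following is only a minimal sketch: `fake_pages`, `clean`, `process_after_loop`, and `process_inside_loop` are hypothetical names made up for illustration, and the page text is placeholder data rather than anything from the real site. Only the `re.sub` call is taken from the code above.

```python
import re

# Hypothetical stand-ins for the text of two downloaded pages (placeholder data).
fake_pages = {
    "url-1": "Player One\n\n\nBorn: 1988\n",
    "url-2": "Player Two\n\n   \nBorn: 1983\n",
}


def clean(text):
    """Collapse runs of blank lines, using the same re.sub call as above."""
    return re.sub(r'\n\s*\n', r'\n', text.strip(), flags=re.M)


def process_after_loop(urls):
    """Buggy shape: the processing sits outside the loop body."""
    results = []
    for url in urls:
        text = fake_pages[url]
    # Runs once, with `text` left over from the final iteration.
    results.append(clean(text))
    return results


def process_inside_loop(urls):
    """Fixed shape: the processing runs on every iteration."""
    results = []
    for url in urls:
        text = fake_pages[url]
        results.append(clean(text))
    return results


urls = ["url-1", "url-2"]
print(process_after_loop(urls))   # one entry: the last page only
print(process_inside_loop(urls))  # one entry per page
```

Collecting the cleaned profiles in a list, as this sketch does, also makes it possible to join them with `"\n\n".join(...)` later if a single string with blank lines between profiles is preferred over printing inside the loop.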