Beautiful Soup 4 issue when fetching multiple elements; it is confusing me
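When I fetch a single element, it works fine, as shown in the code below. But whenever I try to find all the data in similar tags (for example, {'class': 'doctor-name'}), the output is None.
Single-tag output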
from bs4 import BeautifulSoup
s = """
<a class="doctor-name" itemprop="name" href="/doctors/gastroenterologists/dr-isaac-raijman-md-1689679557">Dr. Isaac Raijman, MD</a>
"""
soup = BeautifulSoup(s, 'html.parser')
# The tag name must be exactly 'a'; a trailing space ('a ') matches nothing.
print(soup.find('a', {'class': 'doctor-name'}).text)
print(soup.find('a', {'itemprop': 'name'}).text)
Output -
Dr. Isaac Raijman, MD
Dr. Isaac Raijman, MD
Finding all elements with similar tags, but the output shows None -
import requests, bs4
from bs4 import BeautifulSoup
url = "https://soandso.org/doctors/gastroenterologists"
page = requests.get(url)
page
page.status_code
page.content
soup = BeautifulSoup(page.content, 'html.parser')
soup
print(soup.prettify())
lists = soup.find_all('section', attrs={'class': 'search-page find-a-doctor'})
for section in lists:
    doctor = section.find('a', attrs={'class': 'doctor-name'})#.text
    info = [doctor]
    print(info)
Output - none
Please help me solve this problem. Feel free to share your understanding as code; explanations in # comments are fine too.
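One way to tell a selector problem from missing markup is to run the same find_all call against a snippet that is known to contain the anchors and against one that is not. This is only a minimal sketch: both HTML samples and the doctor_names helper are hypothetical and stand in for whatever the real server returns.

```python
from bs4 import BeautifulSoup

# Hypothetical samples: one page with the anchors present in the HTML,
# one "shell" page whose content would be filled in later by JavaScript.
STATIC_HTML = """
<section class="search-page find-a-doctor">
  <a class="doctor-name" itemprop="name" href="/doctors/a">Dr. Isaac Raijman, MD</a>
  <a class="doctor-name" itemprop="name" href="/doctors/b">Dr. Gabriel Lee, MD</a>
</section>
"""
SHELL_HTML = """
<section class="search-page find-a-doctor">
  <div id="results"></div>
  <script src="/app.js"></script>
</section>
"""


def doctor_names(html):
    """Return the text of every <a class="doctor-name"> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns an empty list (never None) when nothing matches,
    # so the result can be iterated without a None check.
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="doctor-name")]


print(doctor_names(STATIC_HTML))
print(doctor_names(SHELL_HTML))
```

If the second call mirrors what happens with the live page, the selector is probably fine and the anchors are simply absent from the downloaded HTML.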
The information is built up by the browser and is not returned in the HTML. An easier approach is to request it from the JSON API, as follows:
import requests
headers = {'Authorization' : 'eyJhbGciOiJodHRwOi8vd3d3LnczLm9yZy8yMDAxLzA0L3htbGRzaWctbW9yZSNobWFjLXNoYTI1NiIsInR5cCI6IkpXVCJ9.eyJodHRwOi8vc2NoZW1hcy54bWxzb2FwLm9yZy93cy8yMDA1LzA1L2lkZW50aXR5L2NsYWltcy9uYW1lIjoiYWRtaW4iLCJleHAiOjIxMjcwNDQ1MTcsImlzcyI6Imh0dHBzOi8vZGV2ZWxvcGVyLmhlYWx0aHBvc3QuY29tIiwiYXVkIjoiaHR0cHM6Ly9kZXZlbG9wZXIuaGVhbHRocG9zdC5jb20ifQ.zNvR3WpI17CCMC7rIrHQCrnJg_6qGM21BvTP_ed_Hj8'}
json_post = {"query":"","start":0,"rows":10,"selectedFilters":{"availability":[],"clinicalInterest":[],"distance":[20],"gender":["Both"],"hasOnlineScheduling":False,"insurance":[],"isMHMG":False,"language":[],"locationType":[],"lonlat":[-95.36,29.76],"onlineScheduling":["Any"],"specialty":["Gastroenterology"]}}
req = requests.post("https://api.memorialhermann.org/api/doctorsearch", json=json_post, headers=headers)
data = req.json()
for doctor in data['docs']:
    print(f"{doctor['Name']:30} {doctor['PrimarySpecialty']:20} {doctor['PrimaryFacility']}")
Giving you:
Dr. Isaac Raijman, MD          Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Gabriel Lee, MD            Gastroenterology     Memorial Hermann Southeast Hospital
Dr. Dang Nguyen, MD            Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Harshinie Amaratunge, MD   Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Tanima Jana, MD            Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Tugrul Purnak, MD          Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Dimpal Bhakta, MD          Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Dharmendra Verma, MD       Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Jennifer Shroff, MD        Gastroenterology     Memorial Hermann Texas Medical Center
Dr. Brooks Cash, MD            Gastroenterology     Memorial Hermann Texas Medical Center
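The loop in the answer assumes every record carries all three keys. A slightly more defensive sketch is shown here, using a hand-written sample in place of the live response. The sample records and the format_doctor helper are hypothetical; only the key names (Name, PrimarySpecialty, PrimaryFacility, docs) come from the answer's code, and the real response may be shaped differently.

```python
def format_doctor(doctor):
    """Format one record as fixed-width columns, tolerating missing keys."""
    name = doctor.get("Name", "")
    specialty = doctor.get("PrimarySpecialty", "")
    facility = doctor.get("PrimaryFacility", "")
    # Same layout as the answer: name padded to 30, specialty padded to 20.
    return f"{name:30} {specialty:20} {facility}"


# Hypothetical stand-in for req.json(); not real API output.
sample = {
    "docs": [
        {
            "Name": "Dr. Isaac Raijman, MD",
            "PrimarySpecialty": "Gastroenterology",
            "PrimaryFacility": "Memorial Hermann Texas Medical Center",
        },
        {"Name": "Dr. Gabriel Lee, MD"},  # record with missing fields
    ]
}

# .get("docs", []) avoids a KeyError if the response has no "docs" key.
rows = [format_doctor(d) for d in sample.get("docs", [])]
for row in rows:
    print(row)
```

Using dict.get with a default keeps one incomplete record from stopping the whole listing, at the cost of printing blank columns for it.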