Get PDFs in iframe id="swGoogleDrive"

How can I get the PDFs found in the iframe at this URL?

(1) The code below raises an error.

import requests, re
from bs4 import BeautifulSoup

url = r'https://www.d88a.org/domain/102'
headers = {'User-Agent': 'C19SchoolsWebscrape'}

s = requests.Session()
r = s.get(url, headers=headers)

soup = BeautifulSoup(r.content, "lxml")
iframe_src = soup.select_one("swGoogleDrive").attrs["src"]
r = s.get(f"https:{iframe_src}")
print(r)
error: 'NoneType' object has no attribute 'attrs'

(2) This also raises an error.

response = requests.get(url, headers=headers)
t = re.search(b'(?<=artist":")(.*?)(?=")', response.content).group(0).decode("utf-8")
print(t)
error: 'NoneType' object has no attribute 'group'

I have consulted earlier threads.

To get all the links to the PDFs, you can use this example:

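import requests
from bs4 import BeautifulSoup

url = 'https://www.d88a.org/domain/102'

# Load the page, then load the document that its first <iframe> points to.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
soup = BeautifulSoup(requests.get(soup.iframe['src']).content, 'html.parser')

# Print the target of every link inside the iframe document.
for a in soup.select('a'):
    print(a['href'])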

Prints:

https://drive.google.com/file/d/1bCXyoE7FWWI9RIcDWosHrohYQY7Ryb13/view?usp=drive_web
https://drive.google.com/file/d/1SlR-71M-jCMF-AO4ChdSbywolIF9yL1h/view?usp=drive_web
https://drive.google.com/file/d/1zbrt5Mnt0fZxjeD7DRYvfP6cskYKig27/view?usp=drive_web
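
As for why the two attempts in the question fail: select_one("swGoogleDrive") is a tag-name selector, so it looks for a <swGoogleDrive> element and returns None; selecting by id needs a leading "#". Likewise, re.search returns None whenever the pattern does not match, so calling .group on its result fails. Below is a minimal sketch of the id-based lookup, run against a made-up HTML snippet rather than the live page. SAMPLE_PAGE, find_iframe_src and drive_file_id are hypothetical names introduced here for illustration, and the real page markup may differ.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer page; the real markup may differ.
SAMPLE_PAGE = """
<html><body>
<iframe id="swGoogleDrive" src="//example.com/embedded/folder"></iframe>
</body></html>
"""


def find_iframe_src(html, page_url, iframe_id="swGoogleDrive"):
    """Return the absolute src of the iframe with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # "iframe#<id>" selects by id; a bare "swGoogleDrive" would be read
    # as a tag name and match nothing.
    iframe = soup.select_one(f"iframe#{iframe_id}")
    if iframe is None or not iframe.get("src"):
        return None
    # urljoin resolves protocol-relative ("//host/...") and relative URLs
    # against the URL of the page that contains the iframe.
    return urljoin(page_url, iframe["src"])


def drive_file_id(link):
    """Return the file id from a .../file/d/<id>/... link, or None."""
    match = re.search(r"/file/d/([^/]+)", link)
    # Check for None before calling .group, unlike attempt (2).
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_iframe_src(SAMPLE_PAGE, "https://www.d88a.org/domain/102"))
    print(drive_file_id(
        "https://drive.google.com/file/d/1bCXyoE7FWWI9RIcDWosHrohYQY7Ryb13/view?usp=drive_web"
    ))
```

Returning None instead of raising lets the caller decide what to do when the iframe or the id is missing, which is the situation both error messages in the question point to.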