Python scraping go to next page using BeautifulSoup

Question

Here is my scraping code:

import requests
from bs4 import BeautifulSoup as soup

def get_emails(_links:list):
    for i in range(len(_links)):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']

start=20
while True:
    d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/?bundesland=&start=20').text, 'html.parser')

    results = [i['href'] for i in d.find_all('a')][52:-9]
    results = [link for link in results if link.startswith('http://')]
    print(list(get_emails(results)))

    next_page=soup.find('div', {'class': 'paging'}, 'weiter')

    if next_page:
        d=next_page.get('href')
        start+=20
    else:
        break

This is the error I get: AttributeError: 'str' object has no attribute 'find_all'

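When you press the "weiter" (next page) button, the URL ending changes from "...start=20" to "...start=40". It advances in steps of 20 because each page holds 20 results. Does anyone know the cause of the error?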
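To make the URL pattern concrete, here is a minimal sketch using only the standard library. The names BASE_URL and page_url are made up for this illustration; the query parameters are copied from the URL in the code above, and nothing here contacts the site.

```python
from urllib.parse import urlencode

# Listing URL taken from the code above, without its query string.
BASE_URL = 'http://www.schulliste.eu/type/gymnasien/'

def page_url(start):
    """Build the listing URL for the page that begins at result offset `start`.

    Hypothetical helper: 'bundesland' stays empty, exactly as in the original URL.
    """
    return BASE_URL + '?' + urlencode({'bundesland': '', 'start': start})

# Offsets 20, 40, 60: one URL per page, 20 results apart.
urls = [page_url(start) for start in range(20, 80, 20)]
for url in urls:
    print(url)
```

Building the URL from the start counter, rather than hardcoding start=20, is what lets each pass of the loop request a different page.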

Answer 1

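You stored the parsed page in a variable named 'd'. The name 'soup' is still only the alias for the BeautifulSoup class, so soup.find(...) is called on the class rather than on the parsed document. The string 'div' then ends up in the place of the instance, which is why the error complains that a 'str' object has no attribute 'find_all'.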

So replace the following line:

next_page=soup.find('div', {'class': 'paging'}, 'weiter')

with this:

next_page = d.find('div', {'class': 'paging'}, 'weiter')
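Note that this only removes the AttributeError. As far as I can tell, the third positional argument of find() is recursive, so 'weiter' is not used to match the link text, and a div normally has no href of its own. Below is a rough sketch of one way to pick out the next-page link. It runs against made-up sample markup, because the real page structure is not shown in the question; SAMPLE_HTML and find_next_href are hypothetical names, and the assumption is that the paging div contains an a tag whose text is exactly "weiter".

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="paging">
  <a href="?bundesland=&amp;start=0">zurück</a>
  <a href="?bundesland=&amp;start=40">weiter</a>
</div>
"""

def find_next_href(html):
    """Return the href of the 'weiter' link inside div.paging, or None if absent."""
    page = BeautifulSoup(html, 'html.parser')  # a parsed document, not the class
    paging = page.find('div', {'class': 'paging'})
    if paging is None:
        return None
    # string= matches on the link text (assumes the text is exactly 'weiter')
    link = paging.find('a', string='weiter')
    return link.get('href') if link else None

next_href = find_next_href(SAMPLE_HTML)                       # link present
last_href = find_next_href('<div class="paging"></div>')      # no link: last page
print(next_href, last_href)
```

In the loop from the question, a None result would be the signal to break; otherwise the returned href (or the incremented start value) would feed the next request.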
