Why can't I extract the other pages of the same website using beautifulsoup?
I wrote this code to extract multiple pages of data from this website (base URL - "https://www.goodreads.com/shelf/show/fiction").
import requests
from bs4 import BeautifulSoup
import pandas as pd
page = 1
book_title = []
while page != 5:
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'
    response = requests.get(url)
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.text)
    page = page + 1
But it only shows data for the first 50 books. How can I extract all the fiction book titles from all the pages using beautifulsoup?
You can paginate the fiction category of books via search instead of your base shelf URL: type the fiction keyword into the search box and hit the search button, and you will get this URL: https://www.goodreads.com/search?q=fiction&qid=ydDLZMCwDJ. From there you can collect the data and move through the next pages. (Also note that in your code the page number is never substituted into the URL: the string contains a literal {page} because it is neither an f-string nor passed through .format(), so every request fetches the same first page.)
import requests
from bs4 import BeautifulSoup
import pandas as pd
book_title = []
url = 'https://www.goodreads.com/search?page={page}&q=fiction&qid=ydDLZMCwDJ&tab=books'

for page in range(1, 11):
    # Substitute the page number into the search URL and fetch the page.
    response = requests.get(url.format(page=page))
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    # Each result title is an <a class="bookTitle"> element.
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.get_text(strip=True))

df = pd.DataFrame(book_title, columns=['Title'])
print(df)
Output:
Title
0 Trigger Warning: Short Fictions and Disturbances
1 You Are Not So Smart: Why You Have Too Many Fr...
2 Smoke and Mirrors: Short Fiction and Illusions
3 Fragile Things: Short Fictions and Wonders
4 Collected Fictions
.. ...
195 The Science Fiction Hall of Fame, Volume One, ...
196 The Art of Fiction: Notes on Craft for Young W...
197 Invisible Planets: Contemporary Chinese Scienc...
198 How Fiction Works
199 Monster, She Wrote: The Women Who Pioneered Ho...
[200 rows x 1 columns]
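For completeness, the shelf URL from the question can also be paginated once the page number is actually substituted into the string. This is only a minimal sketch: Goodreads may require signing in for shelf pages beyond the first, and the browser-like User-Agent header below is an arbitrary example, so the search endpoint above remains the more reliable route.

import requests
from bs4 import BeautifulSoup

book_title = []
# A browser-like User-Agent can help avoid trivial request blocking (value is arbitrary).
headers = {'User-Agent': 'Mozilla/5.0'}

for page in range(1, 5):
    # .format() (or an f-string) inserts the page number into the URL;
    # without it every request fetches the same first page.
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'.format(page=page)
    response = requests.get(url, headers=headers)
    doc = BeautifulSoup(response.text, 'html.parser')
    for tag in doc.find_all('a', {'class': 'bookTitle'}):
        book_title.append(tag.get_text(strip=True))

print(len(book_title))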