Why can't I extract the other pages of the same website using beautifulsoup?
I wrote this code to extract multiple pages of data from this website (base URL - "https://www.goodreads.com/shelf/show/fiction").
import requests
from bs4 import BeautifulSoup
import pandas as pd
page = 1
book_title = []
while page != 5:
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'
    response = requests.get(url)
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.text)
    page = page + 1
But it only shows data for the first 50 books. How can I extract all the fiction book titles from all the pages using beautifulsoup?
You can paginate the fiction category of books via search instead of your base shelf URL: type the fiction keyword into the search box and hit the search button, and you will get this URL: https://www.goodreads.com/search?q=fiction&qid=ydDLZMCwDJ. From there you can collect the data and move through the next pages. (Also note that in your code the page number is never substituted into the URL: the string contains a literal {page} because it is neither an f-string nor passed through .format(), so every request fetches the same first page.)
import requests
from bs4 import BeautifulSoup
import pandas as pd
book_title = []
url = 'https://www.goodreads.com/search?page={page}&q=fiction&qid=ydDLZMCwDJ&tab=books'

for page in range(1, 11):
    # Substitute the page number into the search URL and fetch the page.
    response = requests.get(url.format(page=page))
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    # Each result title is an <a class="bookTitle"> element.
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.get_text(strip=True))

df = pd.DataFrame(book_title, columns=['Title'])
print(df)
Output:
Title
0 Trigger Warning: Short Fictions and Disturbances
1 You Are Not So Smart: Why You Have Too Many Fr...
2 Smoke and Mirrors: Short Fiction and Illusions
3 Fragile Things: Short Fictions and Wonders
4 Collected Fictions
.. ...
195 The Science Fiction Hall of Fame, Volume One, ...
196 The Art of Fiction: Notes on Craft for Young W...
197 Invisible Planets: Contemporary Chinese Scienc...
198 How Fiction Works
199 Monster, She Wrote: The Women Who Pioneered Ho...
[200 rows x 1 columns]
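For completeness, the shelf URL from the question can also be paginated once the page number is actually substituted into the string. This is only a minimal sketch: Goodreads may require signing in for shelf pages beyond the first, and the browser-like User-Agent header below is an arbitrary example, so the search endpoint above remains the more reliable route.

import requests
from bs4 import BeautifulSoup

book_title = []
# A browser-like User-Agent can help avoid trivial request blocking (value is arbitrary).
headers = {'User-Agent': 'Mozilla/5.0'}

for page in range(1, 5):
    # .format() (or an f-string) inserts the page number into the URL;
    # without it every request fetches the same first page.
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'.format(page=page)
    response = requests.get(url, headers=headers)
    doc = BeautifulSoup(response.text, 'html.parser')
    for tag in doc.find_all('a', {'class': 'bookTitle'}):
        book_title.append(tag.get_text(strip=True))

print(len(book_title))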