Stripping a list returns blank value

Hey, I'm trying to strip the newline characters from the items in a list, but the output I get is blank. What am I doing wrong? I'm running it in Jupyter.

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'
paragraphs = []
titles = []
scraped_content = []
scraped_titles = []
scraped_list = []

response = requests.get(url, time.sleep(2)) 
soup2 = BeautifulSoup(response.content, "html.parser")
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))
               
for paragraph in paragraphs:
    paragraphs = [paragraph.text]
    paragraphs = paragraph.get_text()
    scraped_content.append(paragraphs)
        
for title in titles:
    titles = [title.text]
    titles = title.get_text()
    scraped_titles.append(titles)
                       
scraped_content = list(map(str.strip, scraped_content))
scraped_content

Apart from the arguments to requests.get, your code looks like it should work.
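  • Remove the second argument to requests.get: it is unnecessary and may cause problems. time.sleep(2) is evaluated before the request is sent and returns None, so it does nothing except delay the call. If you meant to set a 2-second timeout, use timeout=2 instead, as documented.
  • Make sure you run all of the code in a single cell, so that none of the data can be left in a corrupted state from an earlier run.
  • If the Jupyter server is running behind a proxy or firewall, that may affect the result of the request.
  • There is no need to declare variables such as paragraphs and titles and then immediately reassign them. You can get the result more directly with list comprehensions.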
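To illustrate the first point: Python evaluates time.sleep(2) before requests.get is called, and time.sleep returns None, so the call pauses and then passes None as the second positional parameter (params). The sketch below shows this with a hypothetical stand-in, fake_get, which only mimics the leading parameters of requests.get and makes no network request. The example URL and the short sleep duration are also assumptions, chosen to keep the demo fast.

```python
import time


def fake_get(url, params=None, timeout=None):
    """Hypothetical stand-in for requests.get: records what it receives."""
    return {"url": url, "params": params, "timeout": timeout}


# The argument expression runs first: Python sleeps, then passes the
# return value of time.sleep (None) positionally, where it lands in `params`.
start = time.monotonic()
accidental = fake_get("https://example.org", time.sleep(0.05))
elapsed = time.monotonic() - start

# What was probably intended: a keyword argument that sets a timeout.
intended = fake_get("https://example.org", timeout=2)

print(accidental)
print(intended)
print(f"paused for about {elapsed:.2f}s before the call")
```

In the accidental call no timeout is set at all, which is why the keyword form is the one to use.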
from bs4 import BeautifulSoup
import requests
import re

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'

# Fetch the page, without the stray second argument
response = requests.get(url)
soup2 = BeautifulSoup(response.content, "html.parser")

# Collect the paragraph, list, and heading (h1 to h4) elements
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))

# Extract the text of each element directly with list comprehensions
scraped_content = [paragraph.get_text() for paragraph in paragraphs]
scraped_titles = [title.get_text() for title in titles]

# Strip leading and trailing whitespace, including newlines
trimmed_content = [content.strip() for content in scraped_content]
trimmed_content

Output (truncated, showing only the first entry):

['3D Viewer (formerly Mixed Reality Viewer and before that, View 3D)[2][3][4] is a 3D object viewer and Augmented Reality application that was first included in Windows 10 1703. It supports the .fbx, .3mf, .obj, and .stl  and many more file formats[5] listed in features section.',
...
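
If some entries still come out as empty strings, one possible cause is that certain <p> elements contain only whitespace, so strip() reduces them to ''. The sketch below uses made-up sample strings rather than the real page content, and shows one way to strip the entries and drop the empty ones in a single comprehension.

```python
# Hypothetical sample standing in for the text of the scraped <p> elements.
scraped_content = [
    "\n",
    "3D Viewer is a 3D object viewer.\n",
    "  \n",
    "It supports several file formats. ",
]

# Strip each entry and keep only those that still contain text.
trimmed_content = [text.strip() for text in scraped_content if text.strip()]

print(trimmed_content)
# ['3D Viewer is a 3D object viewer.', 'It supports several file formats.']
```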