Scrape issue in Beautiful Soup: 'NoneType' object has no attribute 'find_all'

I am trying to run this code to scrape the specific websites/RSS feeds mentioned below, but I keep getting:

Traceback (most recent call last):
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <module>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 28, in <listcomp>
    transcripts = [url_to_transcript(u) for u in urls]
  File "C:\Users\Jeanne\Desktop\PYPDIT\pyscape.py", line 17, in url_to_transcript
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
AttributeError: 'NoneType' object has no attribute 'find_all'

Please advise.

import requests
from bs4 import BeautifulSoup
import pickle

def url_to_transcript(url):

    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    text = [p.text for p in soup.find(class_="itemcontent").find_all('p')]
    print(url)
    return text

# Transcript URLs in scope

urls = ['http://feeds.nos.nl/nosnieuwstech',
        'http://feeds.nos.nl/nosnieuwsalgemeen']

transcripts = [url_to_transcript(u) for u in urls]
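
soup.find(class_="itemcontent") returns None because these feed URLs serve an RSS/XML document rather than the rendered page, so no element with class "itemcontent" exists in what comes back; calling .find_all('p') on that None result is what raises the AttributeError. As a minimal sketch of a guard against the missing element (same imports and function as above; returning an empty list as a hypothetical fallback):

def url_to_transcript(url):
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    container = soup.find(class_="itemcontent")
    if container is None:
        # No element with class "itemcontent" in this document,
        # so return an empty transcript instead of crashing.
        return []
    print(url)
    return [p.text for p in container.find_all('p')]

The guard stops the crash, but it still yields no text for these feeds; extracting the content needs a different selector, as below.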

The HTML that is returned is not the same as what you see on the page. You could use the following:

import requests
from bs4 import BeautifulSoup
# import pickle

urls = ['http://feeds.nos.nl/nosnieuwstech','http://feeds.nos.nl/nosnieuwsalgemeen']

with requests.Session() as s:
    for url in urls:
        page = s.get(url).text
        soup = BeautifulSoup(page, "lxml")
        print(url)
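        # Skip the first <description> (in a standard RSS feed that is the
        # channel-level summary) and collect the <p> text from each item.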
        print([[i.text for i in desc.select('p')] for desc in soup.select('description')[1:]])
        print('--'*100)
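
Each printed value is a list of lists: one inner list of paragraph strings per feed item. requests.Session is used so that both feed requests reuse the same underlying connection.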