AttributeError: 'ResultSet' object has no attribute 'find_all'

Question

What is wrong here? I want to extract the article text without the tags.

from bs4 import BeautifulSoup
import re
import urllib.request
f = urllib.request.urlopen("http://www.championat.com/football/news-2442480-orlov-zenit-obespokoen---pole-na-novom-stadione-mozhet-byt-nekachestvennym.html")
soup = BeautifulSoup(f, 'html.parser')
soup=soup.find_all('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.replaceWithChildren()
soup = ''.join(map(str, soup.contents))
print(soup)

Error:

Traceback (most recent call last):
  File "1.py", line 9, in <module>
    for match in soup.find_all(tag):
AttributeError: 'ResultSet' object has no attribute 'find_all'

Answer 1

soup=soup.find_all('div', class_="text-decor article__contain")

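On this line, soup becomes a ResultSet instance, which is basically a list of Tag instances. That is why you get 'ResultSet' object has no attribute 'find_all': a ResultSet has no find_all() method. For reference, this exact problem is described in the troubleshooting section of the documentation: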

AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings–a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all().
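To make the "iterate over the list" advice concrete, here is a minimal sketch. It assumes bs4 is installed and uses a small hypothetical HTML string in place of the championat.com page; the markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical markup standing in for the real page
html = """
<div class="text-decor article__contain"><p>First <b>bold</b> paragraph</p></div>
<div class="other"><p>Second <i>italic</i> paragraph</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

divs = soup.find_all("div")       # a ResultSet, i.e. a list subclass holding Tags
print(type(divs).__name__)        # ResultSet
print(isinstance(divs, list))     # True

# divs.find_all("p") would raise AttributeError, so search each Tag instead
paragraphs = []
for div in divs:
    paragraphs.extend(div.find_all("p"))
print(len(paragraphs))                              # 2
print(all(isinstance(p, Tag) for p in paragraphs))  # True
```

The loop is the "look at the .foo of each one" pattern from the quote, with find_all standing in for .foo.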

And you really do want just one result, since there is only one article on the page:

soup = soup.find('div', class_="text-decor article__contain")
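A minimal sketch of how find() differs from find_all(), again assuming bs4 is installed and using hypothetical markup: find() returns the first matching Tag, which does have find_all(), or None when nothing matches, so on a live page it is worth checking for None before using the result.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup standing in for the article container
html = '<div class="text-decor article__contain"><p>Article body</p></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag rather than a ResultSet
article = soup.find("div", class_="text-decor article__contain")
print(type(article).__name__)        # Tag

# Unlike a ResultSet, a Tag has find_all()
paragraphs = article.find_all("p")
print(paragraphs[0].get_text())      # Article body

# When nothing matches, find() returns None instead of raising
missing = soup.find("div", class_="no-such-class")
print(missing)                       # None
```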

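Also note that you do not need to search for the tags one at a time: you can pass a list of tag names directly to find_all(). BeautifulSoup is quite flexible in how it locates elements: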

article = soup.find('div', class_="text-decor article__contain")

invalid_tags = ['b', 'i', 'u', 'br', 'a']
for match in article.find_all(invalid_tags):
    match.unwrap()  # bs4 name for replaceWithChildren()
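A self-contained version of the snippet above, as a sketch: it assumes bs4 is installed, and the HTML string is a hypothetical stand-in for the real article container, so the output will differ on the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article container on the real page
html = (
    '<div class="text-decor article__contain">'
    '<p>Orlov: <b>Zenit</b> is <i>worried</i> about the '
    '<a href="#">pitch</a>.<br/> More text.</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
article = soup.find("div", class_="text-decor article__contain")

invalid_tags = ["b", "i", "u", "br", "a"]
# A single find_all() call with a list matches any of the listed tag names
for match in article.find_all(invalid_tags):
    match.unwrap()  # replace the tag with its children; an empty <br/> just disappears

print(article.get_text())  # Orlov: Zenit is worried about the pitch. More text.
```

If only the plain text is needed and the surrounding markup does not matter, calling article.get_text() directly, without unwrapping anything first, may be simpler.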

