__new__() missing 1 required positional argument depending on the URL scraped

Question

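I have a strange error, so I'll try to simplify my problem as much as possible. I have a simple function that scrapes a URL with BeautifulSoup and returns a list. Then I pickle the list to a file, and because of that I call setrecursionlimit(10000) to avoid a RecursionError. Up to that point, everything works fine.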

But when I try to unpickle my list, I get this error:

Traceback (most recent call last):
  File ".\scrap_index.py", line 86, in <module>
    data_file = pickle.load(data)
TypeError: __new__() missing 1 required positional argument: 'name'

Here is my function:

import urllib.request
from bs4 import BeautifulSoup

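def scrap_function(url):
    # Download the page and parse it with the html5lib parser
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, "html5lib")

    # Return the parsed soup wrapped in a list
    return [soup]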

For testing, I tried different URLs. With this URL, everything works fine:

url_ok = 'https://www.boursorama.com/bourse/'

But with this one, I get the TypeError:

url_not_ok = 'https://www.boursorama.com/bourse/actions'

And the test code:

import pickle
import sys

# Pickling a deeply nested BeautifulSoup tree can exceed the default recursion limit
sys.setrecursionlimit(10000)

scrap_list = scrap_function(url_not_ok)

with open('test_saving.pkl', 'wb') as data:
    pickle.dump(scrap_list, data, protocol=2)

with open('test_saving.pkl', 'rb') as data:
    data_file = pickle.load(data)

print(data_file)

Answer 1

This states that if a class's __new__ constructor takes extra required arguments, pickle can fail to rebuild its instances.
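A minimal reproduction of that failure, using a hypothetical Labeled class (not part of BeautifulSoup) that subclasses str and requires an extra name argument in __new__. With protocol 2, pickle records only the plain string value as the constructor argument (str supplies it through __getnewargs__), so dumping succeeds but loading calls __new__ without name and raises the same kind of TypeError as in the question.

```python
import pickle


class Labeled(str):
    """Hypothetical str subclass whose __new__ needs a second, required argument."""

    def __new__(cls, value, name):
        obj = super().__new__(cls, value)
        obj.name = name
        return obj


original = Labeled("text", name="label")

# Dumping works: pickle stores (Labeled, "text") plus the instance __dict__
data = pickle.dumps(original, protocol=2)

error_message = None
try:
    # Loading calls Labeled.__new__(Labeled, "text"), i.e. without 'name'
    pickle.loads(data)
except TypeError as exc:
    error_message = str(exc)

# The message ends with: missing 1 required positional argument: 'name'
print(error_message)
```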

This may cause a problem in BeautifulSoup, here:

class NavigableString(unicode, PageElement):
    def __new__(cls, value):
        ...  # rest of the constructor omitted

This answer says the same.

As a solution, don't store the whole object; store only the page's source code instead, as mentioned here.
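A minimal sketch of that workaround, assuming all you need later is the HTML itself: download the page source as a plain str, pickle that, and re-parse it after loading. The helper names fetch_source, save_sources and load_sources are hypothetical, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
import pickle
import urllib.request


def fetch_source(url):
    """Download a page and return its HTML as a plain str (safe to pickle)."""
    with urllib.request.urlopen(url) as page:
        # Assumption: fall back to UTF-8 if the response declares no charset
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset, errors="replace")


def save_sources(sources, path):
    """Pickle a list of HTML strings instead of BeautifulSoup objects."""
    with open(path, "wb") as f:
        pickle.dump(sources, f, protocol=2)


def load_sources(path):
    """Load the list of HTML strings back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

After loading, rebuild the soup only when you need it, e.g. soup = BeautifulSoup(load_sources('test_saving.pkl')[0], "html5lib"). Because only flat strings are pickled, raising the recursion limit is no longer needed.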
