Duplicating class with deep-copy causes infinite recursion somehow
I'm simply trying to make an independent copy of my URL class in Python so that I can modify the copy without affecting the original. Below is a stripped-down, executable version of my problematic code:
from bs4 import BeautifulSoup
from copy import deepcopy
from urllib import request

url_dict = {}

class URL:
    def __init__(self, url, depth, log_entry=None, soup=None):
        self.url = url
        self.depth = depth  # Current, not total, depth level
        self.log_entry = log_entry
        self.soup = soup
        self.indent = ' ' * (5 - self.depth)
        self.log_url = 'test.com'
        # Blank squad
        self.parsed_list = []

    def get_log_output(self):
        return self.indent + self.log_url

    def get_print_output(self):
        if self.log_entry is not None:
            return self.indent + self.log_url + ' | ' + self.log_entry
        return self.indent + self.log_url

    def set_soup(self):
        if self.soup is None:
            code = ''
            try:  # Read and store code for parsing
                code = request.urlopen(self.url).read()
            except Exception as exception:
                print(str(exception))
            self.soup = BeautifulSoup(code, features='lxml')

def crawl(current_url, current_depth):
    current_check_link = current_url
    has_crawled = current_check_link in url_dict
    if current_depth > 0 and not has_crawled:
        current_crawl_job = URL(current_url, current_depth)
        current_crawl_job.set_soup()
        url_dict[current_check_link] = deepcopy(current_crawl_job)

for link in ['http://xts.site.nfoservers.com']:  # Crawl for each URL the user inputs
    crawl(link, 3)
The resulting exception:
Traceback (most recent call last):
File "/home/[CENSORED]/.vscode-oss/extensions/ms-python.python-2020.10.332292344/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_trace_dispatch_regular.py", line 374, in __call__
if cache_skips.get(frame_cache_key) == 1:
RecursionError: maximum recursion depth exceeded in comparison
Fatal Python error: _Py_CheckRecursiveCall: Cannot recover from stack overflow.
Python runtime state: initialized
I can't tell where this particular infinite recursion is occurring. I've read questions such as RecursionError when python copy.deepcopy, but I'm not even sure it applies to my use case. If it does apply, then my brain just can't seem to understand it, as I'm under the impression that deepcopy() should simply take each self variable's value and duplicate it into the new class instance. If that's not the case, then I would love some enlightenment. All the articles in my search results are similar to this and aren't very helpful for my situation.
Note that I'm not simply looking for a modified snippet of my code that solves the problem. I mainly want to understand what exactly is going on here, so that I can fix it now and also avoid it in the future.
Edit: This appears to be a conflict between deepcopy and the set_soup() method. If I replace

url_dict[current_check_link] = deepcopy(current_crawl_job)

with

url_dict[current_check_link] = current_crawl_job

the snippet above runs without errors. Likewise, if I remove current_crawl_job.set_soup() entirely, I don't get any errors either. I just can't have both.
Edit2: If I remove either

try:  # Read and store code for parsing
    code = request.urlopen(self.url).read()
except Exception as exception:
    print(str(exception))

or

self.soup = BeautifulSoup(code, features='lxml')

the error disappears again and the program runs fine.
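For reference, the two ingredients can be reproduced without any crawling or network access at all (a sketch reusing the URL class from the snippet above; the synthetic HTML just stands in for a real page, and the exact element count needed depends on the recursion limit and bs4 version):

from copy import deepcopy
from bs4 import BeautifulSoup

job = URL('http://example.com', 3)                            # hypothetical URL
job.soup = BeautifulSoup('<p>x</p>' * 2000, features='lxml')  # skip the fetch
clone = deepcopy(job)  # can raise RecursionError once a soup is attached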
This article states:
Deep copy is a process in which the copying process occurs recursively. It means first constructing a new collection object and then recursively populating it with copies of the child objects found in the original.
So my understanding is:
A = [1, 2, [3, 4], 5]
B = deepcopy(A)  # This makes a recursive call 1 level deep to copy the inner list

C = [1, [2, [3, [4, [5, [6]]]]]]
D = deepcopy(C)  # This makes recursive calls 5 levels deep (recursively copying the inner lists)
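The same recursion applies to object attributes, not just containers (a minimal sketch; the Node class is purely illustrative):

from copy import deepcopy

class Node:
    def __init__(self, child=None):
        self.child = child

a = Node(Node(Node()))         # three nested objects
b = deepcopy(a)                # one recursive call per level of nesting
assert b.child is not a.child  # every nested object becomes a new instance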
My best guess

Python has a maximum recursion depth limit to prevent stack overflows. You can find the current limit with:

import sys
print(sys.getrecursionlimit())

In your case, you're trying to deep-copy a class object, and the recursive calls made for that object must be exceeding the maximum recursion limit.
Possible solution

You can tell Python to use a higher maximum recursion limit with:

limit = 2000
sys.setrecursionlimit(limit)

Or you could raise the limit gradually as the program progresses. For more information, visit this link.

I'm not 100% sure that raising the limit will do the job, but I'm fairly sure that some child object of your class object contains so many internal objects that deepcopy goes crazy!
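If you try this route, one way to keep the change contained is to raise the limit only around the copy itself (a sketch; deepcopy_with_limit is a hypothetical helper, and note that raising the limit trades a catchable RecursionError for a possible hard crash if the C stack actually overflows, as the "Fatal Python error" in the traceback above shows):

import sys
from copy import deepcopy

def deepcopy_with_limit(obj, limit=10000):
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return deepcopy(obj)
    finally:
        sys.setrecursionlimit(old_limit)  # restore the safer default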
Edit

I've been told the following line is the culprit:

self.soup = BeautifulSoup(code, features='lxml')

When you execute current_crawl_job.set_soup(), your class's None soup attribute is replaced with a complex BeautifulSoup object, and that is what causes trouble for the deepcopy method.
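To see why it's so much trouble (a small sketch; next_element is part of bs4's documented navigation API): every parsed element keeps references to its parent, its children, and the next and previous elements in the document, so the object graph deepcopy has to walk behaves like one long chain with a link per element, not just per nesting level:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>a</p><p>b</p>', features='lxml')
first = soup.find('p')
print(first.next_element)               # the text node 'a' inside the first <p>
print(first.next_element.next_element)  # the second <p> tag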
Suggestion

In the set_soup method, keep the self.soup attribute as the raw HTML string, and convert it to a BeautifulSoup object only when you want to work with it. That will solve your deep-copy problem.
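A minimal sketch of that suggestion, assuming the URL class from the question (only the relevant parts are shown, and get_soup is a hypothetical helper name):

from bs4 import BeautifulSoup
from urllib import request

class URL:
    def __init__(self, url, depth, log_entry=None, soup=None):
        self.url = url
        self.depth = depth
        self.log_entry = log_entry
        self.soup = soup  # will hold raw HTML (bytes/str), not a parsed tree

    def set_soup(self):
        if self.soup is None:
            code = ''
            try:  # Read and store the code for parsing later
                code = request.urlopen(self.url).read()
            except Exception as exception:
                print(str(exception))
            self.soup = code  # raw HTML is cheap for deepcopy to duplicate

    def get_soup(self):
        # Build the tree only at the point of use, so the object graph that
        # deepcopy walks never contains a BeautifulSoup parse tree.
        return BeautifulSoup(self.soup or '', features='lxml')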