Error when using transform method from Latent Dirichlet Allocation after unpickling
I have trained a Latent Dirichlet Allocation model with sklearn. After unpickling it, I transform a document with the CountVectorizer and then call the LDA model's transform on the result to get its topic distribution, but I get the following error:
AttributeError: module '__main__' has no attribute 'tokenize'
Here is my code:
from sklearn.externals import joblib

lda = joblib.load('lda_good.pkl')            # trained LDA model
tf_vect = joblib.load('tf_vectorizer_.pkl')  # fitted CountVectorizer
texts = readContent('doc_name.pdf')          # my own PDF-reading helper
new_doc = tf_vect.transform(texts)
print(new_doc)
print(lda.transform(new_doc))
The problem is that the unpickled CountVectorizer object works fine and I can call its .transform, but when I call the LDA model's .transform it seems to reference the tokenize function from the CountVectorizer...
The tokenize function is defined earlier in the code, but I can't see what tokenize has to do with Latent Dirichlet Allocation's transform method.
The strange part is that all of this code runs fine in a Jupyter notebook, but not when I run it as a script. All of the code lives in a single file: the model was trained from the Jupyter notebook, and now I am trying to use it from a script.
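For reference, the vectorizer was built with the custom tokenizer along these lines (a simplified sketch; the body of tokenize here is just a stand-in for my real function):

from sklearn.feature_extraction.text import CountVectorizer

def tokenize(text):
    # stand-in implementation; the real tokenizer does more work
    return text.lower().split()

tf_vect = CountVectorizer(tokenizer=tokenize)  # the vectorizer keeps a reference to tokenize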
Here is the traceback (the same error is raised in each spawned worker process, so it appears several times in the output):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'
The output actually keeps going, but I suppose this is enough, since it appears to enter some kind of loop.
Let me know if more information is needed.
Thanks in advance.
Looking at similar questions on SO points to a pickling and unpickling problem. The code you used to run joblib.dump is, I assume, in a different directory from this program. Could you instead place it in the same directory as this program and re-run the pickler and unpickler? The reference to __main__ is stored in the pickled file, and the unpickler tries to resolve it when it runs.
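To make that mechanism concrete, here is a minimal, self-contained sketch (illustrative only, not the asker's code) of how a pickle ends up holding a reference to __main__.tokenize and why loading fails when that attribute is missing:

import pickle
import sys

def tokenize(text):
    return text.split()

class Model:
    def __init__(self, tok):
        # pickle stores this attribute as a reference ('__main__', 'tokenize'),
        # not as the function's code
        self.tok = tok

blob = pickle.dumps(Model(tokenize))

# Loading replays that reference: pickle's find_class effectively does
# getattr(sys.modules['__main__'], 'tokenize'), the very lookup that fails
# in the traceback above when __main__ does not define tokenize.
restored = pickle.loads(blob)
print(restored.tok('hello world'))   # ['hello', 'world']

del sys.modules['__main__'].tokenize   # simulate a __main__ without tokenize
try:
    pickle.loads(blob)
except AttributeError as exc:
    print(exc)   # e.g. "module '__main__' has no attribute 'tokenize'" (wording varies by Python version)

joblib uses pickle underneath, so the same rule applies to joblib.load: the loading script, and with multiprocessing every spawned worker, must be able to resolve __main__.tokenize.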