
Error when using transform method from Latent Dirichlet Allocation after unpickling

I trained a Latent Dirichlet Allocation model with sklearn. After unpickling it, I transform a document with the unpickled CountVectorizer and then call the LDA model's transform on the result to get its topic distribution, but I get the following error:

AttributeError: module '__main__' has no attribute 'tokenize'

Here is my code:

from sklearn.externals import joblib

# trained LDA model
lda = joblib.load('lda_good.pkl')

# vectorizer (a CountVectorizer built with a custom tokenizer)
tf_vect = joblib.load('tf_vectorizer_.pkl')

# readContent is my own helper that extracts the text of a PDF
texts = readContent('doc_name.pdf')

new_doc = tf_vect.transform(texts)
print(new_doc)

print(lda.transform(new_doc))

The thing is, the unpickled CountVectorizer works fine on its own: I can call its .transform. But when I then call the LDA model's .transform, the error seems to refer to the tokenize function used by the CountVectorizer. The function tokenize is defined earlier in the code, and I can't see what tokenize has to do with Latent Dirichlet Allocation's transform method. Strangely, all of this code runs fine in a Jupyter notebook, but not when I run it as a script.

All of the code is in a single file. The model was trained in a Jupyter notebook, and now I am trying to use it from a script.

Here is the traceback:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    prepare(preparation_data)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 226, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\multiprocessing\spawn.py", line 278, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 254, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\eduard.bermejo\Documents\Machine learning\gestió documental\POC\program_POC.py", line 160, in <module>
    tf_vect = joblib.load('tf_vectorizer_.pkl')
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\eduard.bermejo\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'tokenize'

The output actually goes on: the same traceback is printed again and again, one copy per spawned worker process, with the copies interleaved (I have deduplicated them above). I think this is enough to show the loop it gets into.
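For what it's worth, the repetition matches Windows' "spawn" start method: every worker process re-imports the script, so any top-level joblib.load runs again in each child. A common pattern (a sketch, not from the post; tokenize is a hypothetical stand-in) is to keep that work under an entry-point guard:

```python
def tokenize(text):
    # hypothetical stand-in for the custom tokenizer from the post
    return text.split()

def main():
    # joblib.load(...) and the transform calls would go here, so that
    # spawned child processes re-importing this module do not re-run them
    print(tokenize('some document text'))

if __name__ == '__main__':
    # Under the spawn start method this block is skipped in child
    # processes, which import the module as "__mp_main__".
    main()
```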

Let me know if you need any more information.

Thanks in advance.

Looking at similar questions on SO suggests there is a pickling/unpickling problem. The code you used to run joblib.dump is, I assume, in a different directory from this script. Could you instead put it in the same directory as this program and re-run the pickler and the unpickler? The pickle stores a reference under __main__, which the unpickler tries to look up when it runs.
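One way to make that reference resolvable everywhere (a sketch, assuming the vectorizer was built with tokenizer=tokenize): define tokenize in its own importable module and import it from both the training and the inference script, so the pickle records something like tokenizers.tokenize instead of __main__.tokenize. The runnable sketch below fakes the helper module in-process; in practice `tokenizers` would be a real .py file sitting next to both scripts:

```python
import pickle
import sys
import types

# Hypothetical helper module; in practice this is a real file (tokenizers.py)
# imported by BOTH the training notebook and the inference script.
tokenizers = types.ModuleType('tokenizers')

def tokenize(text):
    return text.lower().split()

tokenize.__module__ = 'tokenizers'   # as if defined inside tokenizers.py
tokenizers.tokenize = tokenize
sys.modules['tokenizers'] = tokenizers

# Dumping now records the reference 'tokenizers.tokenize',
# not '__main__.tokenize'.
blob = pickle.dumps(tokenize)

# Unpickling succeeds in any interpreter that can `import tokenizers`,
# including spawned multiprocessing children.
restored = pickle.loads(blob)
print(restored('Hello World'))  # ['hello', 'world']
```

The same reasoning applies to the CountVectorizer pickle: once its tokenizer lives in an importable module, joblib.load should work from the script as well as from the notebook.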