ModuleNotFoundError when trying to unpickle a model in Flask app

Question

Python version: 3.6.9

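I have dumped a machine learning model to a file with pickle. When I try to run a prediction on it through Flask, it fails with ModuleNotFoundError: No module named 'predictors'. How can I fix this error so that my model is recognized whether I run a prediction through Flask or through a Python command (e.g. python predict_edu.py)?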

Here is my file structure:

 - video_discovery
   __init__.py
   - data_science
     - model
     - __init__.py
     - predict_edu.py
     - predictors.py
     - train_model.py

Here is my predict_edu.py file:

import pickle

with open('model', 'rb') as f:
        bow_model = pickle.load(f)

Here is my predictors.py file:

from sklearn.base import TransformerMixin

# Basic function to clean the text
def clean_text(text):
    # Removing spaces and converting text into lowercase
    return text.strip().lower()

# Custom transformer using spaCy
class predictor_transformer(TransformerMixin):
    def transform(self, X, **transform_params):
        # Cleaning Text
        return [clean_text(text) for text in X]

    def fit(self, X, y=None, **fit_params):
        return self

    def get_params(self, deep=True):
        return {}

Here is how I train my model:

python data_science/train_model.py

Here is my train_model.py file:

import pickle

from sklearn.pipeline import Pipeline

from predictors import predictor_transformer

# pipeline = Pipeline([("cleaner", predictor_transformer()), ('vectorizer', bow_vector), ('classifier', classifier_18p)])
pipeline = Pipeline([("cleaner", predictor_transformer())])

with open('model', 'wb') as f:
        pickle.dump(pipeline, f)

My Flask application lives in video_discovery/__init__.py.

Here is how I run my Flask application:

FLASK_ENV=development FLASK_APP=video_discovery flask run

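I suspect the problem arises because I train the model by running a Python script directly rather than through Flask, so there may be some namespace issue, but I'm not sure how to fix it. Training my model takes a while, so I can't exactly wait on an HTTP request.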

What am I missing that would fix this?

Answer 1

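It seems a bit odd that you get that error when executing predict_edu.py, since it sits in the same directory as predictors.py, so an absolute import such as from predictors import predictor_transformer (without the dot . operator) should normally work as expected. If the error persists, however, here are a few options you could try.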

Option 1

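Before attempting to import the module, you could add the parent directory of the predictors file to sys.path, Python's module search path. This should work fine for smaller projects.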

import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
from predictors import predictor_transformer

Option 2

Use relative imports, e.g. from .predictors import ..., and make sure you run the script from the parent directory of the package, as shown below. The -m option searches sys.path for the named module and executes its contents as the __main__ module, rather than running the file as a top-level script.

python -m video_discovery.data_science.predict_edu

However, the PEP 8 style guide recommends absolute imports in general:

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path)

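In some cases, however, absolute imports can get quite verbose, depending on the complexity of the directory structure, as shown below. On the other hand, "relative imports can be messy, particularly for shared projects where directory structure is likely to change". They are also "not as readable as absolute ones, and it is not easy to tell the location of the imported resources". Read more in Python Import and Absolute vs Relative Imports.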

from package1.subpackage2.subpackage3.subpackage4.module5 import function6
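
To see why Option 2 depends on the -m flag, here is a minimal sketch that builds a throwaway package and runs the same file both ways. The names demo_pkg, helper and main are hypothetical and only stand in for video_discovery.data_science and its modules.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package layout: demo_pkg/{__init__,helper,main}.py
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helper.py").write_text("VALUE = 42\n")
    (pkg / "main.py").write_text("from .helper import VALUE\nprint(VALUE)\n")

    def run(args):
        # Start a child interpreter from the package's parent directory
        return subprocess.run(
            [sys.executable] + args,
            cwd=tmp,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )

    as_module = run(["-m", "demo_pkg.main"])   # parent package is known
    as_script = run([str(pkg / "main.py")])    # no parent package

print("python -m demo_pkg.main ->", as_module.returncode, as_module.stdout.strip())
print("python demo_pkg/main.py ->", as_script.returncode)
print(as_script.stderr.strip().splitlines()[-1])
```

Run as a module, the relative import resolves; run as a plain script, the same import fails because the file has no parent package. The exact wording of that error differs between Python versions.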

Option 3

Include the directory containing your package directory in PYTHONPATH and use absolute imports instead. PYTHONPATH sets the path for user-defined modules so that they can be imported directly into a Python script. The variable is a string holding a list of directories that Python adds to the sys.path directory list. Its primary use is to let users import modules that have not yet been made into an installable Python package.

For example, say you have a package named video_discovery (under /Users/my_user/code/video_discovery) and want to add the directory /Users/my_user/code to PYTHONPATH:

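On Mac

Open Terminal.app.
Open the file ~/.bash_profile in your text editor, e.g. atom ~/.bash_profile
Add the following line to the end: export PYTHONPATH="/Users/my_user/code"
Save the file.
Close Terminal.app.
Start Terminal.app again so that it reads the new settings, then type echo $PYTHONPATH. It should show something like /Users/my_user/code.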

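On Linux

Open your favorite terminal program.
Open the file ~/.bashrc in your text editor, e.g. atom ~/.bashrc
Add the following line to the end: export PYTHONPATH=/home/my_user/code
Save the file.
Close your terminal application.
Start your terminal application again so that it reads the new settings, then type echo $PYTHONPATH. It should show something like /home/my_user/code.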

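On Windows

Open This PC (or Computer), right-click inside it and select Properties.
From the computer properties dialog, select Advanced system settings on the left.
From the advanced system settings dialog, choose the Environment variables button.
In the Environment variables dialog, click the New button in the top half of the dialog to create a new user variable:
Give the variable the name PYTHONPATH and, in the value, add the path to your module directory. Choose OK and OK again to save this variable.
Now open a cmd window and type echo %PYTHONPATH% to confirm that the environment variable is set correctly. Remember to open a new cmd window to run your Python program, so that it picks up the new setting in PYTHONPATH.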
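
Whichever platform you are on, you can also check from Python itself that a PYTHONPATH entry ends up in sys.path. The sketch below is only an illustration: it sets the variable for a child interpreter instead of editing a shell profile, and the temporary directory stands in for /Users/my_user/code.

```python
import os
import subprocess
import sys
import tempfile


def normalized(path):
    # Resolve symlinks and case differences so paths compare reliably
    return os.path.normcase(os.path.realpath(path))


with tempfile.TemporaryDirectory() as code_dir:
    # The child sees PYTHONPATH as if it had been exported in the shell
    env = dict(os.environ, PYTHONPATH=code_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    child_paths = [p for p in result.stdout.splitlines() if p]
    on_path = any(normalized(p) == normalized(code_dir) for p in child_paths)

print("PYTHONPATH entry found in sys.path:", on_path)
```

If the entry is missing when you try this against your real shell, the terminal was most likely not restarted after editing the profile.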

Option 4

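Another solution is to install the package in an editable state (all edits made to the .py files are automatically included in the installed package). However, the amount of work required to get this working might make Option 3 a better choice for you.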

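The contents of setup.py should be as shown below, and the command for installing the package should be pip install -e . (the -e flag stands for "editable" and . stands for "current directory").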

from setuptools import setup, find_packages
setup(name='myproject', version='1.0', packages=find_packages())

Answer 2

From https://docs.python.org/3/library/pickle.html:

pickle can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored.

When you run python data_science/train_model.py and import from predictors, Python imports predictors as a top-level module, and predictor_transformer lives in that module.

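However, when you run a prediction through Flask from the parent folder of video_discovery, predictor_transformer lives in the video_discovery.data_science.predictors module. The pickle still refers to the top-level name predictors, which cannot be imported there, hence the ModuleNotFoundError.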
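
The mismatch can be reproduced without Flask or scikit-learn. This is a minimal sketch, assuming a hypothetical top-level module name demo_predictors as a stand-in for predictors; the module is built in memory so that the example stays self-contained.

```python
import pickle
import sys
import types

# Stand-in for predictors.py, registered under a hypothetical top-level name
module = types.ModuleType("demo_predictors")
exec("class predictor_transformer:\n    pass\n", module.__dict__)
sys.modules["demo_predictors"] = module

payload = pickle.dumps(module.predictor_transformer())

# Only the module and class names travel in the pickle, not the class body
stored_by_reference = (
    b"demo_predictors" in payload and b"predictor_transformer" in payload
)

# Simulate a process in which that top-level name cannot be imported
del sys.modules["demo_predictors"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = str(exc)

print("names stored in pickle:", stored_by_reference)
print("error on load:", load_error)
```

Unpickling re-imports the class by the module name recorded at dump time, so the training run and the process that loads the model need to agree on that name.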

Pickle and unpickle from a consistent path

Use relative imports, and run everything with video_discovery as the top-level module.

train_model.py: use a relative import

# from predictors import predictor_transformer  # -
from .predictors import predictor_transformer   # +

Train the model: run train_model with video_discovery as the top-level module (from the parent folder of video_discovery)

# python data_science/train_model.py                # -
python -m video_discovery.data_science.train_model  # +

Run a prediction through a Python command: run predict_edu with video_discovery as the top-level module (from the parent folder of video_discovery)

# python predict_edu.py                             # -
python -m video_discovery.data_science.predict_edu  # +

Run a prediction through Flask: (no change, it already runs with video_discovery as the top-level module)

FLASK_ENV=development FLASK_APP=video_discovery flask run
