ModuleNotFoundError huggingface datasets in Jupyter notebook

I want to use the huggingface datasets library from within a Jupyter notebook.

This should be as simple as installing it (pip install datasets, in bash within a venv) and importing it (import datasets, in Python or a notebook).

Everything works when I test it in the standard Python interactive shell, but when I try it in a Jupyter notebook, it says:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-652e886d387f> in <module>
----> 1 import datasets

ModuleNotFoundError: No module named 'datasets'

At first I thought the notebook kernel might be using a different virtual environment, but I checked the installed packages from inside the notebook:

!pip install datasets

Requirement already satisfied: datasets in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (1.8.0)
Requirement already satisfied: numpy>=1.17 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (1.21.0)
Requirement already satisfied: xxhash in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2.0.2)
Requirement already satisfied: pyarrow<4.0.0,>=1.0.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (3.0.0)
Requirement already satisfied: pandas in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (1.2.5)
Requirement already satisfied: fsspec in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2021.6.1)
Requirement already satisfied: packaging in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (20.9)
Requirement already satisfied: dill in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.3.4)
Requirement already satisfied: requests>=2.19.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (2.25.1)
Requirement already satisfied: tqdm<4.50.0,>=4.27 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (4.49.0)
Requirement already satisfied: multiprocess in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.70.12.2)
Requirement already satisfied: huggingface-hub<0.1.0 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from datasets) (0.0.13)
Requirement already satisfied: pytz>=2017.3 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from pandas->datasets) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from pandas->datasets) (2.8.1)
Requirement already satisfied: pyparsing>=2.0.2 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from packaging->datasets) (2.4.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2021.5.30)
Requirement already satisfied: chardet<5,>=3.0.2 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (1.26.6)
Requirement already satisfied: idna<3,>=2.5 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from requests>=2.19.0->datasets) (2.10)
Requirement already satisfied: typing-extensions in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from huggingface-hub<0.1.0->datasets) (3.10.0.0)
Requirement already satisfied: filelock in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from huggingface-hub<0.1.0->datasets) (3.0.12)
Requirement already satisfied: six>=1.5 in /home/yoga/venvs/text_embeddings/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.16.0)

!pip freeze

certifi==2021.5.30
chardet==4.0.0
datasets==1.8.0
dill==0.3.4
filelock==3.0.12
fsspec==2021.6.1
huggingface-hub==0.0.13
idna==2.10
multiprocess==0.70.12.2
numpy==1.21.0
packaging==20.9
pandas==1.2.5
pyarrow==3.0.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
requests==2.25.1
six==1.16.0
tqdm==4.49.0
typing-extensions==3.10.0.0
urllib3==1.26.6
xxhash==2.0.2

Any ideas? Do I need to configure the notebook in some special way, or is something wrong with the datasets module? Thanks!


EDIT: Following the answer below, this makes the error go away:

datasets_dir=r"/home/yoga/venvs/text_embeddings/lib/python3.8/site-packages/datasets"

import sys
sys.path.append(datasets_dir)

import datasets

But is there a way to make this work without setting the path explicitly? (Or can someone explain why it is necessary?)

I ran into a similar problem with a different library, and this worked for me:

import sys
sys.path.append(r"path to datasets in python env")
import dataset_utils

Your path -> "/home/yoga/venvs/text_embeddings/lib/python3.8/site-packages/datasets"
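As a hedged variation on the snippet above: Python normally resolves `import datasets` by looking for a `datasets` package inside each directory listed in `sys.path`, so the entry to add is usually the directory that contains the package (the `site-packages` folder) rather than the package folder itself. The sketch below assumes that layout. The helper names `ensure_on_sys_path` and `is_importable` are made up for illustration, and the path is simply the parent of the one shown in the question.

```python
import importlib.util
import sys


def ensure_on_sys_path(directory):
    """Append directory to sys.path once; return True if it was added."""
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True


def is_importable(module_name):
    """Return True if Python can locate module_name on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: this is the site-packages directory of the venv from the question.
site_packages = "/home/yoga/venvs/text_embeddings/lib/python3.8/site-packages"

ensure_on_sys_path(site_packages)
print("datasets importable:", is_importable("datasets"))
```

Checking with `is_importable` before the real `import` makes it easier to tell a path problem apart from an error raised inside the package itself.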

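My guess is that the PYTHONPATH environment variable is not set correctly. PYTHONPATH is an environment variable whose entries are added to sys.path, which is where Python looks for modules. You can set it to whatever you like.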
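To check that guess, it may help to print what the notebook kernel actually sees. This is only a minimal diagnostic sketch: `describe_import_environment` is a hypothetical helper name, and it relies on nothing beyond the standard `os` and `sys` modules. Running it both in the notebook and in the plain Python shell, then comparing the output, should show whether the two use the same interpreter and the same search path.

```python
import os
import sys


def describe_import_environment():
    """Collect the interpreter path, PYTHONPATH, and sys.path for inspection."""
    return {
        "executable": sys.executable,
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "sys_path": list(sys.path),
    }


info = describe_import_environment()
print("Interpreter:", info["executable"])
print("PYTHONPATH:", info["pythonpath"] or "(not set)")
print("sys.path entries:")
for entry in info["sys_path"]:
    print("  ", entry)
```

If the interpreter printed in the notebook is not the one under /home/yoga/venvs/text_embeddings, the kernel is probably running outside that venv, which would explain why pip reports the package as installed while the import still fails.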

This should work!