AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas

Question

I am new to Kedro. After installing kedro in my conda environment, I get the following error when I try to list my catalog:

Command executed: kedro catalog list

Error:

kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet df_medinfo_raw: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.ParquetDataSet:

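I installed kedro through conda-forge: conda install -c conda-forge "kedro[pandas]". As far as I understand, installing kedro this way should also install the pandas dependencies.

I tried reading the kedro documentation on dependencies, but it is not clear to me how to solve this kind of problem.

My kedro version is 0.17.6.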

Answer 1

Try installing with pip:

pip install "kedro[pandas]"

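At the time of writing, conda does not support optional dependencies. A feature request for this has been filed at https://github.com/conda/conda/issues/7502

Also, the kedro documentation recommends using pip: https://kedro.readthedocs.io/en/stable/02_get_started/02_install.html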

It is also possible to install Kedro using conda, as follows, but we recommend using pip at this point to eliminate any potential dependency issues, as follows:

In addition, as @datajoely mentioned, you can be more specific about exactly which dataset modules you need, for example:

pip install "kedro[pandas.ParquetDataSet]"

You can read more about kedro dependencies here: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html?highlight=top-level#workflow-dependencies
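After reinstalling, it can help to see the underlying import failure rather than kedro's summary message. The sketch below is only an illustration: the helper name explain_import is made up, and the module path kedro.extras.datasets.pandas.parquet_dataset is an assumption about how kedro 0.17.x lays out its dataset modules.

```python
import importlib


def explain_import(module_name):
    """Try to import a module; return None on success, else the error text."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # dependency such as pyarrow is reported here by name.
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Assumed module path for kedro 0.17.x; adjust if your version differs.
    problem = explain_import("kedro.extras.datasets.pandas.parquet_dataset")
    print(problem or "ParquetDataSet module imported successfully")
```

If the printed message names a missing package, installing that package into the same environment is the likely fix.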

Answer 2

Kedro uses Pandas to load ParquetDataSet objects, and Pandas needs additional dependencies to do this (see "Installation: Other data sources"). That is, in addition to Pandas, either fastparquet or pyarrow must be installed.
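A quick way to check whether either engine is present in the active environment is sketched below. It uses only the standard library; the function name available_parquet_engines is hypothetical and not part of kedro or pandas.

```python
import importlib.util


def available_parquet_engines(candidates=("pyarrow", "fastparquet")):
    """Return the candidate packages that can be located in this environment."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    engines = available_parquet_engines()
    if engines:
        print("Parquet engines found:", ", ".join(engines))
    else:
        print("No Parquet engine found; install pyarrow or fastparquet.")
```

Run it with the Python interpreter of the environment kedro is installed in, since a different interpreter may see different packages.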

For Conda, you want either:

## use pyarrow for parquet
conda install -c conda-forge kedro pandas pyarrow

or

## or use fastparquet for parquet
conda install -c conda-forge kedro pandas fastparquet

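Note that the kedro[pandas] syntax used in the question does not mean to Conda what it means to pip (that is, it ends up resolving to just kedro). Conda package specifications use MatchSpec, where anything inside [...] is parsed with the [key1=value1;key2=value2;...] syntax. Essentially, [pandas] is treated as an unknown key and ignored.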


Tags: python, pip, conda, kedro