Unable to initialize Snowflake data source

I am trying to access a Snowflake data source using the great_expectations library.

Here is what I have tried so far:

from ruamel import yaml

import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.get_context()



datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

print(context.test_yaml_config(yaml.dump(datasource_config)))

Before running the code above, I initialized great_expectations with:

great_expectations init

But I get the following error:

great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'

What am I doing wrong?

Your configuration looks fine; it matches the example here.
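As a quick sanity check on the connection string itself, the standard library can split it into its parts. This is only a sketch: it assumes the snowflake://user:password@account/database/schema?warehouse=...&role=... layout used in the question, and the helper name describe_connection_string is made up for illustration (it is not part of great_expectations).

```python
from urllib.parse import urlsplit, parse_qs


def describe_connection_string(connection_string: str) -> dict:
    """Split a SQLAlchemy-style URL into its components (illustrative helper)."""
    parts = urlsplit(connection_string)
    # The path is assumed to look like "/<database>/<schema>".
    path_segments = [segment for segment in parts.path.split("/") if segment]
    # parse_qs returns lists of values; keep the first value for each key.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "account": parts.hostname,  # note: urlsplit lower-cases the hostname
        "database": path_segments[0] if len(path_segments) > 0 else None,
        "schema": path_segments[1] if len(path_segments) > 1 else None,
        "warehouse": query.get("warehouse"),
        "role": query.get("role"),
    }


info = describe_connection_string(
    "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin"
)
print(info)
```

If every field comes back populated, the string is at least well-formed, which points away from the configuration and toward the environment.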

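If you look at the traceback, you should notice that the error starts propagating from the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.

The line where the error actually occurs is: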

            self.engine = sa.create_engine(connection_string, **kwargs)

If you search for sa near the top of that file, you will find:

try:
    import sqlalchemy as sa

    make_url = import_make_url()
except ImportError:
    sa = None
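The effect of this optional-import pattern can be reproduced in isolation. The following is a minimal sketch, not code from great_expectations: the module name a_package_that_is_not_installed and the function make_engine are hypothetical stand-ins chosen so that the import is guaranteed to fail.

```python
# Optional import: if the package is missing, the name is silently bound to None.
try:
    import a_package_that_is_not_installed as sa  # hypothetical module name
except ImportError:
    sa = None


def make_engine(connection_string):
    # Mirrors the failing call: with sa being None, the attribute lookup fails.
    return sa.create_engine(connection_string)


try:
    make_engine("snowflake://myusername:mypass@myaccount/myDB/myschema")
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The import failure is swallowed at module load time, so the problem only surfaces later as an AttributeError on None rather than as an ImportError naming the missing package.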

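So sqlalchemy is not installed: it is not pulled into your environment automatically when you install great_expectations. What you need to do is install snowflake-sqlalchemy, since you want to use the Snowflake plugin for sqlalchemy (an assumption based on your connection_string).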

/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy

After that you should no longer get the error; test_yaml_config then appears to wait for the connection to time out.
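To confirm that the install landed in the right environment, you can check whether the relevant modules are importable. This is a small sketch using only the standard library; the helper name is_importable is made up for illustration, and it should be run with the same interpreter that runs great_expectations (for example /your/virtualenv/bin/python).

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError (an ImportError) when the parent
        # package of a dotted name is missing, e.g. "snowflake" for
        # "snowflake.sqlalchemy".
        return False


# snowflake-sqlalchemy is imported as snowflake.sqlalchemy.
for name in ("sqlalchemy", "snowflake.sqlalchemy"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If either line reports MISSING, the pip install went to a different interpreter than the one running your script.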

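What worries me a lot is the documented use of the deprecated ruamel.yaml API. The ruamel.yaml.dump function will be removed in the near future; you should use the .dump() method of a ruamel.yaml.YAML() instance instead.

You should use the following code instead: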

import sys
from ruamel.yaml import YAML

import great_expectations as ge
context = ge.get_context()

datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}

yaml = YAML()

yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)

I will make a PR for great_expectations to update their documentation/use of ruamel.yaml.