Unable to initialize Snowflake data source
I am trying to access a Snowflake data source using the great_expectations library.
Here is what I have tried so far:
from ruamel import yaml
import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.get_context()
datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}
print(context.test_yaml_config(yaml.dump(datasource_config)))
Before running the code above, I initialized great_expectations:
great_expectations init
But I get the following error:
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'
What am I doing wrong?
Your configuration seems fine; it corresponds to the example here.
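If you look at the traceback, you should notice that the error propagates from the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.
The line where the error actually occurs is: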
self.engine = sa.create_engine(connection_string, **kwargs)
If you search for sa at the top of that file, you find:
try:
    import sqlalchemy as sa
    make_url = import_make_url()
except ImportError:
    sa = None
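To see why the message mentions 'NoneType', the same optional-import pattern can be reproduced in isolation. This is only a sketch: the module name a_package_that_is_not_installed and the function make_engine are made up for illustration and are not part of great_expectations.

```python
# Optional-import pattern: fall back to None when the package is missing.
try:
    # Hypothetical module name that stands in for a missing sqlalchemy.
    import a_package_that_is_not_installed as sa
except ImportError:
    sa = None


def make_engine(connection_string):
    # Same shape as the failing line. `sa` is None here, so the attribute
    # lookup fails before any connection is attempted.
    return sa.create_engine(connection_string)


try:
    make_engine("snowflake://user:password@account/db/schema")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'NoneType' object has no attribute 'create_engine'
```

The import failure itself is swallowed silently, which is why the error only surfaces later, when the engine is first created.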
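So sqlalchemy is not installed; you do not get it in your environment automatically when you install great_expectations. The thing to do is to install snowflake-sqlalchemy, since you want to use sqlalchemy's Snowflake plugin (an assumption based on your connection_string).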
/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy
After installing it you should no longer get this error; instead, test_yaml_config appears to wait for the connection to time out.
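If the error persists after installing, it may be worth confirming that the packages are visible to the interpreter that actually runs your script. A minimal sketch using only the standard library follows; is_importable is a helper name I made up, and the check assumes the plugin is exposed as the module snowflake.sqlalchemy.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if module_name can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example no top-level "snowflake")
        # raises instead of returning None, so treat it as not importable.
        return False


# Show which interpreter is in use, then report on both packages.
print("interpreter:", sys.executable)
for name in ("sqlalchemy", "snowflake.sqlalchemy"):
    print(name, "->", "found" if is_importable(name) else "missing")
```

Run it with the same /your/virtualenv/bin/python that you used for pip; if either module is reported as missing, the install went into a different environment.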
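What worries me a great deal is the documented use of ruamel.yaml's deprecated API. The function ruamel.yaml.dump is going to be removed in the near future, and you should use the .dump() method of a ruamel.yaml.YAML() instance.
You should use the following code instead: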
import sys
from ruamel.yaml import YAML
import great_expectations as ge

context = ge.get_context()
datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}
yaml = YAML()
yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)
I will make a PR for great_expectations to update their documentation/use of ruamel.yaml.