How to get Great_Expectations to work with Spark Dataframes in Apache Spark ValueError: Unrecognized spark type: string

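I have an Apache Spark DataFrame with a field of type 'string'. However, Great_Expectations doesn't recognize the field type. I have imported the modules I believe are necessary, but I'm not sure why Great_Expectations fails to recognize the field.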

import great_expectations as ge
import great_expectations.dataset.sparkdf_dataset
from great_expectations.dataset.sparkdf_dataset import SparkDFDataset
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, BooleanType

The following code reads the CSV file into a DataFrame:

test = spark.read.csv('abfss://root@adlspretbiukadlsdev.dfs.core.windows.net/RAW/LANDING/customers.csv', inferSchema=True, header=True)

The schema is as follows:

test.printSchema()
Command executed in 2 sec 64 ms by carlton on 1:53:28 PM, 6/17/21
root
 |-- first_name: string (nullable = true)

I believe the following line of code creates a Great_Expectations dataset from the Spark DataFrame above:

test2 = ge.dataset.SparkDFDataset(test)

I then wrote the following expectation:

test2.expect_column_values_to_be_of_type(column='first_name', type_='string')

However, I get the following error:

ValueError: Unrecognized spark type: string
Traceback (most recent call last):

  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/great_expectations/data_asset/util.py", line 80, in f
    return self.mthd(obj, *args, **kwargs)

Why doesn't Great_Expectations recognize the Spark type?

You need to use names such as StringType, LongType, etc., the same names as specified in the documentation. It should look like this:

test2.expect_column_values_to_be_of_type("first_name", "StringType")
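
The mismatch here is that printSchema() prints the short name (string), while the expectation wants the class name from pyspark.sql.types (StringType). If you would rather keep the names you see in printSchema(), a small lookup can translate them. This is only a sketch: spark_class_name and SIMPLE_TO_SPARK_CLASS are hypothetical names introduced for illustration, not part of Great_Expectations or PySpark, and the table covers only the common simple types (no decimal, array, map or struct).

```python
# Hypothetical helper, not part of Great_Expectations or PySpark.
# Maps the short type names shown by printSchema() to the class names
# defined in pyspark.sql.types, which is what the expectation expects.
SIMPLE_TO_SPARK_CLASS = {
    "string": "StringType",
    "boolean": "BooleanType",
    "tinyint": "ByteType",
    "smallint": "ShortType",
    "int": "IntegerType",
    "bigint": "LongType",
    "float": "FloatType",
    "double": "DoubleType",
    "binary": "BinaryType",
    "date": "DateType",
    "timestamp": "TimestampType",
}


def spark_class_name(simple_name: str) -> str:
    """Return the pyspark.sql.types class name for a printSchema() type name."""
    key = simple_name.strip().lower()
    if key not in SIMPLE_TO_SPARK_CLASS:
        # Parameterized or nested types (decimal(10,2), array<...>) are not covered.
        raise ValueError(f"No mapping for Spark type name: {simple_name!r}")
    return SIMPLE_TO_SPARK_CLASS[key]


# Usage with the objects from the question (needs a live Spark session,
# so it is left commented out here):
# test2.expect_column_values_to_be_of_type("first_name", spark_class_name("string"))
```

With the DataFrame from the question you can also skip the table and read the class name straight from the schema, e.g. type(test.schema["first_name"].dataType).__name__, which gives the same kind of name for any column.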
