PySpark (Python 2.7): set StringType columns in a DataFrame to 'null' when the value is ""

I have a DataFrame called good_df with columns of mixed types. I am trying to set any empty value in the StringType columns to 'null'. I thought the code below would work, but it does not.

self.good_df = self.good_df.select([when((col(c)=='') & (isinstance(self.good_df.schema[c].dataType, StringType)),'null').otherwise(col(c)).alias(c) for c in self.good_df.columns])

The error message does not give me much to go on:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.7/site-packages/pyspark/sql/column.py", line 116, in _
    njc = getattr(self._jc, name)(jc)
  File "/usr/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/python2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/py4j/protocol.py", line 332, in get_return_value
    format(target_id, ".", name, value))
Py4JError: An error occurred while calling o792.and. Trace:
py4j.Py4JException: Method and([class java.lang.Boolean]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Does anyone know what is going on? Thanks!

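The error message you are getting:

py4j.Py4JException: Method and([class java.lang.Boolean]) does not exist

means that you are applying the AND operator (&) between a Column expression and a literal Boolean value. isinstance(...) is evaluated by Python and returns a plain bool, not a Column, so the JVM side finds no and method that accepts a Boolean.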

You need to change this part:

(isinstance(self.good_df.schema[c].dataType, StringType))

so that the Boolean is wrapped in a Column literal with lit:

from pyspark.sql.functions import lit

lit(isinstance(self.good_df.schema[c].dataType, StringType))
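The failure and the fix can be mimicked without Spark. The sketch below is only an illustration: FakeColumn and fake_lit are hypothetical stand-ins invented here, not PySpark classes, and they assume that the & operator simply forwards its right-hand operand to a method that accepts only another column.

```python
class FakeColumn(object):
    """Hypothetical stand-in for a Spark Column; not PySpark code."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Assumption: like the JVM-side `and`, only another column is accepted.
        if not isinstance(other, FakeColumn):
            raise TypeError(
                "Method and([%s]) does not exist" % type(other).__name__
            )
        return FakeColumn("(%s AND %s)" % (self.expr, other.expr))


def fake_lit(value):
    """Hypothetical analogue of lit(): wrap a plain value as a column."""
    return FakeColumn(repr(value))


is_empty = FakeColumn("name = ''")
is_string = isinstance("abc", str)  # evaluated by Python: a plain bool

# Column & bool fails, as in the traceback from the question.
try:
    is_empty & is_string
except TypeError as exc:
    error_message = str(exc)

# Wrapping the bool first makes both operands columns, so & succeeds.
combined = is_empty & fake_lit(is_string)
print(error_message)
print(combined.expr)
```

The point of the sketch is that the type check has already been reduced to True or False before the column expression is built, which is why it has to be wrapped before it can take part in one.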

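That said, you can simply move the column-type check into the Python list comprehension, so that it is decided once per column in Python and never becomes part of the Spark expression: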

self.good_df = self.good_df.select(*[
    when((col(c) == ''), 'null').otherwise(col(c)).alias(c) if t == "string" else col(c)
    for c, t in self.good_df.dtypes
])