How can I find all the null fields in a DataFrame?

I am trying to find the null fields present in a DataFrame and concatenate all of their names into a new column in the same DataFrame.

The input DataFrame looks like this:

name   state  number
James  CA     100
Julia  Null   Null
Null   CA     200

Expected output:

name   state  number  Null Fields
James  CA     100
Julia  Null   Null    state,number
Null   CA     200     name

My code looks like this, but it fails. Please help me.

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

data2 = [("James", "CA", "100"),
         ("Julia", None, None),
         (None, "CA", "200")]

schema = StructType([
    StructField("name", StringType(), True),
    StructField("state", StringType(), True),
    StructField("number", StringType(), True)
])

df = spark.createDataFrame(data=data2, schema=schema)
cols = ["name", "state", "number"]
df.show()

def null_constraint_check(df, cols):
    # this is the call that fails
    df_null_identifier = df.withColumn("NULL Fields",
                                       [F.count(F.when(F.col(c).isNull(), c)) for c in cols])
    return df_null_identifier

df1 = null_constraint_check(df, cols)

I am getting this error:

AssertionError: col should be Column

Your approach is correct; you only need a small change in null_constraint_check: [F.count(...)] is a list of columns, but withColumn expects a single Column as its second parameter. One way to get there is to concatenate all elements of the list using concat_ws:

def null_constraint_check(df, cols):
    # when(...) yields the column name for null values and null otherwise;
    # concat_ws skips nulls, so only the names of the null columns are joined
    df_null_identifier = df.withColumn("NULL Fields",
                         F.concat_ws(",", *[F.when(F.col(c).isNull(), c) for c in cols]))
    return df_null_identifier
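
You call it exactly as in your code and show the DataFrame:

df1 = null_constraint_check(df, cols)
df1.show()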

I have also removed F.count, because your question says you want the names of the null columns.
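
As an aside, F.count is an aggregate function, so it belongs inside an aggregation rather than in withColumn. A minimal sketch, assuming you ever want per-column null counts instead of names (the "_nulls" alias names are just illustrative):

# count nulls per column: when(...) is null for non-null rows,
# and count() only counts the non-null results, i.e. the nulls in c
null_counts = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(c + "_nulls") for c in cols]
)
null_counts.show()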

The result is:

+-----+-----+------+------------+
| name|state|number| NULL Fields|
+-----+-----+------+------------+
|James|   CA|   100|            |
|Julia| null|  null|state,number|
| null|   CA|   200|        name|
+-----+-----+------+------------+
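
If you would rather not hard-code the column list, the same idea works over df.columns. A small variant sketch, not part of the original answer (the function name null_constraint_check_all is just illustrative):

def null_constraint_check_all(df):
    # check every column of the DataFrame instead of an explicit list
    return df.withColumn(
        "NULL Fields",
        F.concat_ws(",", *[F.when(F.col(c).isNull(), c) for c in df.columns]))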