Count zero occurrences in PySpark Dataframe
How do I count the number of 0s in each row of a PySpark DataFrame?
I want the following result (note the n0 column):
+--------+-----+-----+----+-----+---+
|center |var1 |var2 |var3|var4 |n0 |
+--------+-----+-----+----+-----+---+
|center_a|0 |1 |0 |0 |3 |
|center_b|1 |1 |2 |4 |0 |
|center_c|1 |0 |1 |0 |2 |
+--------+-----+-----+----+-----+---+
I tried this code, but it didn't work:
x['n0'] = (x == 0).sum(axis=1)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-63-8a95da0a1861> in <module>()
----> 1 (x == 0).sum(axis=1)
AttributeError: 'bool' object has no attribute 'sum'
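The attempted line is the pandas idiom. On a pandas DataFrame, x == 0 is an element-wise boolean DataFrame and .sum(axis=1) counts the True values in each row. A PySpark DataFrame has no element-wise comparison, so x == 0 evaluates to a plain Python bool, which is why the traceback reports 'bool' object has no attribute 'sum'. For comparison only, here is a minimal pandas sketch with the same sample data. It assumes pandas is installed and is not a PySpark solution.

```python
import pandas as pd

# Same sample data as the expected output above, as a pandas DataFrame.
x = pd.DataFrame(
    {
        "center": ["center_a", "center_b", "center_c"],
        "var1": [0, 1, 1],
        "var2": [1, 1, 0],
        "var3": [0, 2, 1],
        "var4": [0, 4, 0],
    }
)

# Element-wise comparison yields a boolean DataFrame (the string column is
# simply never equal to 0); summing across columns (axis=1) counts the
# True values in each row.
x["n0"] = (x == 0).sum(axis=1)
print(x)
```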
One way is to check every column for 0 in each row and sum the checks:
from pyspark.sql import functions as F

# Python's built-in sum() (not F.sum) adds up one 0/1 Column per column of df.
df.withColumn("n0", sum(F.when(df[col] == 0, 1).otherwise(0) for col in df.columns)).show()
+--------+----+----+----+----+---+
| center|var1|var2|var3|var4| n0|
+--------+----+----+----+----+---+
|center_a| 0| 1| 0| 0| 3|
|center_b| 1| 1| 2| 4| 0|
|center_c| 1| 0| 1| 0| 2|
+--------+----+----+----+----+---+