PySpark: Removing null values from a column in a DataFrame
My DataFrame looks like this:
ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen,
I need to remove row 2, because its FirstName is null.
I am using the PySpark script below:
join_Df1= Name.filter(Name.col(FirstName).isnotnull()).show()
I get this error:
File "D:[=12=]\NameValidation.py", line 13, in <module>
join_Df1= filter(Name.FirstName.isnotnull()).show()
TypeError: 'Column' object is not callable
Can anyone help me solve this?
I think what you may need is pandas' notnull().
Here is your input in a CSV file, my_test.csv:
ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen
Code:
import pandas as pd
df = pd.read_csv("my_test.csv")
print(df[df['FirstName'].notnull()])
Output:
   ID FirstName  LastName
0   1     Navee  Srikanth
2   3    Naveen       NaN
This is what you want: df[df['FirstName'].notnull()]
The output of df['FirstName'].notnull() is:
0     True
1    False
2     True
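Indexing with this mask creates a DataFrame that contains only the rows where df['FirstName'].notnull() returns True.
How does the check work? df['FirstName'].notnull() returns True if the value in the FirstName column is not null, and False if it is NaN.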
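As a minimal sketch of the mask-based approach described above, the snippet below builds the same three rows in memory (using io.StringIO instead of a my_test.csv on disk, which is an assumption made only to keep the example self-contained) and compares boolean indexing with dropna(subset=...). The variable names mask, kept_by_mask and kept_by_dropna are illustrative and do not come from the original answer.

```python
import io

import pandas as pd

# Same sample data as in the question, held in memory instead of my_test.csv.
CSV_TEXT = """ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen,
"""

# By default read_csv turns empty fields into NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Boolean mask: True where FirstName holds a value, False where it is NaN.
mask = df["FirstName"].notnull()

# Option 1: index the DataFrame with the mask.
kept_by_mask = df[mask]

# Option 2: drop rows whose FirstName is NaN, ignoring NaN in other columns.
kept_by_dropna = df.dropna(subset=["FirstName"])

print(kept_by_mask)
```

Both options keep rows 1 and 3. Row 3 survives even though its LastName is NaN, because only FirstName is checked.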
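You should do the following:
Name.filter(Name.FirstName.isNotNull()).show()
The method is isNotNull(), in camel case. Column has no isnotnull attribute, so PySpark treats Name.FirstName.isnotnull as a nested-field lookup and returns another Column; calling that result raises TypeError: 'Column' object is not callable. Also note that show() returns None, so assign the filtered DataFrame to a variable before calling show() if you need to keep it.
Hope this helps!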
Your DataFrame's FirstName column appears to hold empty strings rather than null. Here are some options to try:
df = sqlContext.createDataFrame([[1,'Navee','Srikanth'], [2,'','Srikanth'] , [3,'Naveen','']], ['ID','FirstName','LastName'])
df.show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  2|         |Srikanth|
|  3|   Naveen|        |
+---+---------+--------+
df.where(df.FirstName.isNotNull()).show()  # This doesn't remove row 2, because FirstName holds an empty string, not null
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  2|         |Srikanth|
|  3|   Naveen|        |
+---+---------+--------+
df.where(df.FirstName != '').show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+
df.filter(df.FirstName != '').show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+
df.where("FirstName != ''").show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
|  1|    Navee|Srikanth|
|  3|   Naveen|        |
+---+---------+--------+