How to check if a value in a row is null in Spark
I have a DataFrame df read from a JSON file:
val df = spark.read.json("C:/filepath/file.json")
It contains the following data:
Id | downloadUrl | title |
---|---|---|
52193 | https://... | Title... |
5441 | https://... | Title... |
5280 | null | null |
5190 | https://... | Title... |
5215 | https://... | Title... |
1245 | https://... | Title... |
339 | null | Editorial |
59 | https://... | Title... |
Now I want to create a new DataFrame or RDD that contains only the rows where downloadUrl and title are not null.
df.map(row => {
  // here I want to see if the downloadUrl is null
  //   do something
  // else if the title is null
  //   do something
  // else
  //   create a new dataframe df1 with a new column "allowed" with the value set to 1
  //   push df1 to API
})
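I'm not sure what "do something" is supposed to mean when title or downloadUrl is null.
But if you want a new DataFrame that contains only the rows where downloadUrl and title are not null, try this Dataset approach: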
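import spark.implicits._ // supplies the encoder that .as[MyObject] needs
// Spark infers whole numbers in JSON as Long, so the id field is a Long here
case class MyObject(id: Long, downloadUrl: String, title: String)
val ds = spark.read.json("C:/filepath/file.json").as[MyObject]
// Keep only the objects where both fields are non-null
val df1 = ds.filter(o => o.downloadUrl != null && o.title != null)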
Another approach is to pass a filter function that works on the untyped rows:
val df1 = df.filter(row => {
  val downloadUrl = row.getAs[String]("downloadUrl")
  val title = row.getAs[String]("title")
  // Keep the row only when both columns are non-null.
  // The last expression is the lambda's result, so no `return` is needed.
  title != null && downloadUrl != null
})
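The same filter can also be written with column expressions instead of a row lambda. The following is only a sketch under some assumptions: the object name `NotNullFilterSketch`, the helper `keepComplete`, the local SparkSession and the three in-memory sample rows are all made up for illustration and stand in for the real JSON file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NotNullFilterSketch {
  // Hypothetical helper: keeps only the rows where both columns are non-null.
  def keepComplete(df: DataFrame): DataFrame =
    df.filter(col("downloadUrl").isNotNull && col("title").isNotNull)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its existing `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("not-null-filter")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the JSON file: three rows shaped like the table in the question.
    // None becomes a SQL null in the resulting DataFrame.
    val df = Seq[(Long, Option[String], Option[String])](
      (52193L, Some("https://..."), Some("Title...")),
      (5280L, None, None),
      (339L, None, Some("Editorial"))
    ).toDF("Id", "downloadUrl", "title")

    val df1 = keepComplete(df)
    df1.show()

    spark.stop()
  }
}
```

Because the condition is a column expression rather than an opaque Scala function, Spark's optimizer can see it, which is usually the reason to prefer this form when no custom per-row logic is needed.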
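Lastly, if you want to push the rows that pass the check to an external API, use foreach instead, and use the predicate to decide whether a row should be pushed: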
df.foreach(row => {
  val downloadUrl = row.getAs[String]("downloadUrl")
  val title = row.getAs[String]("title")
  // Handle the null cases here if they need special treatment;
  // only rows with both values present reach the API call.
  if (title != null && downloadUrl != null) {
    // call the API here
  }
})
But in this case, we are not creating a new DataFrame df1.
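If df1 with the "allowed" column from the question is still wanted before pushing, one possible shape is sketched below. It is an illustration rather than a definitive implementation: `AllowedColumnSketch`, `markAllowed` and `pushToApi` are hypothetical names, `pushToApi` only prints instead of calling a real API, and the sample rows stand in for the JSON file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object AllowedColumnSketch {
  // Drops rows that have a null in either column, then adds allowed = 1 to the rest.
  def markAllowed(df: DataFrame): DataFrame =
    df.na.drop(Seq("downloadUrl", "title")).withColumn("allowed", lit(1))

  // Hypothetical stand-in for the real API client; it only prints the payload.
  def pushToApi(json: String): Unit = println(s"would send: $json")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use its existing `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("allowed-column")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the JSON file: three rows shaped like the table in the question.
    val df = Seq[(Long, Option[String], Option[String])](
      (52193L, Some("https://..."), Some("Title...")),
      (5280L, None, None),
      (339L, None, Some("Editorial"))
    ).toDF("Id", "downloadUrl", "title")

    val df1 = markAllowed(df)

    // collect() brings every remaining row to the driver, so this suits small results only.
    df1.toJSON.collect().foreach(pushToApi)

    spark.stop()
  }
}
```

Collecting to the driver keeps the sketch simple. For large results, pushing from the executors with foreach, as in the answer above, avoids moving all rows to one machine.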