How to check whether a value in a row is null in Spark

I have a DataFrame df that I read from a JSON file:

val df = spark.read.json("C:\\filepath\\file.json")

It contains the following data:

Id     downloadUrl  title
52193  https://...  Title...
5441   https://...  Title...
5280   null         null
5190   https://...  Title...
5215   https://...  Title...
1245   https://...  Title...
339    null         Editorial
59     https://...  Title...

Now I want to create a new DataFrame or RDD that contains only the rows where downloadUrl and title are not null.

  df.map(row=>{
    // here I want to see if the downloadUrl is null
    // do something

    // else if the title is null
    // do something

    // else
    // create a new dataframe df1 with a new column "allowed" with the value set to 1 
    // push df1 to API

  })

I'm not sure what "do something" is supposed to mean when title or downloadUrl is null.

But if you want a new DataFrame containing only the rows where downloadUrl and title are not null, try this Dataset approach:

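import spark.implicits._  // provides the Encoder required by .as[MyObject]
// id is a Long because Spark infers JSON integers as bigint
case class MyObject(id: Long, downloadUrl: String, title: String)
val df = spark.read.json("C:\\filepath\\file.json").as[MyObject]
val df1 = df.filter(o => o.downloadUrl != null && o.title != null)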
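
If you would rather stay with an untyped DataFrame, the same null check can be written with Column expressions. The following is only a minimal sketch: the local SparkSession and the three sample rows are assumptions made for illustration (in spark-shell, `spark` already exists and df would come from your JSON file).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("null-check-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample rows mirroring the table in the question.
val rows: Seq[(Long, String, String)] = Seq(
  (52193L, "https://...", "Title..."),
  (5280L, null, null),
  (339L, null, "Editorial")
)
val df = rows.toDF("Id", "downloadUrl", "title")

// Keep only rows where both columns are non-null.
val df1 = df.filter(col("downloadUrl").isNotNull && col("title").isNotNull)

// Shortcut with the same effect here: drop rows that have a null in any listed column.
val df2 = df.na.drop(Seq("downloadUrl", "title"))
```

Both variants let Spark evaluate the predicate itself instead of running a Scala lambda per row, which is usually the simpler choice when no custom logic is needed.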

Another way is to use the filter function on the rows, like this:

val df1 = df.filter(row => {
    val downloadUrl = row.getAs[String]("downloadUrl")
    val title = row.getAs[String]("title")
    // handle a null downloadUrl or a null title here if you need to

    // no `return` keyword: the last expression of the lambda is its result
    title != null && downloadUrl != null
  })
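
The question also mentions a new column "allowed" set to 1. One possible way to get it, sketched under the assumption that rows with a missing field should be flagged 0 (the question does not say what should happen to them), is to combine `when`/`otherwise` with `withColumn`. The session setup and sample rows are again illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("allowed-flag-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample rows mirroring the table in the question.
val rows: Seq[(Long, String, String)] = Seq(
  (52193L, "https://...", "Title..."),
  (5280L, null, null),
  (339L, null, "Editorial")
)
val df = rows.toDF("Id", "downloadUrl", "title")

// Flag each row: 1 when both fields are present, 0 otherwise (the 0 is an assumption).
val flagged = df.withColumn(
  "allowed",
  when(col("downloadUrl").isNotNull && col("title").isNotNull, lit(1)).otherwise(lit(0))
)

// df1 keeps only the allowed rows, with the "allowed" column set to 1.
val df1 = flagged.filter(col("allowed") === 1)
```

Keeping the flag on every row first makes it easy to inspect or count the rejected rows before discarding them.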

Lastly, if you want to push each row to an external API, use foreach instead, and use the predicate to decide whether the row should be pushed:

  df.foreach(row=>{
    val downloadUrl = row.getAs[String]("downloadUrl")
    val title = row.getAs[String]("title")
    // here I want to see if the downloadUrl is null
    // do something

    // else if the title is null
    // do something

    // else
    // create a new dataframe df1 with a new column "allowed" with the value set to 1 
    if (title != null && downloadUrl != null){
        //call the API here
    }
  })

But in this case we do not create a new DataFrame df1.
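
If you need both df1 and the push, one option is to filter first and then iterate over the result. This is a rough sketch, not a definitive implementation: `HypotheticalApi.push` is a made-up placeholder that only prints, and the session and sample rows are assumptions for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical stand-in for a real HTTP client; it only prints what it would send.
object HypotheticalApi {
  def push(id: Long, downloadUrl: String, title: String): Unit =
    println(s"would push id=$id url=$downloadUrl title=$title")
}

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("push-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample rows mirroring the table in the question.
val rows: Seq[(Long, String, String)] = Seq(
  (52193L, "https://...", "Title..."),
  (5280L, null, null),
  (339L, null, "Editorial")
)
val df = rows.toDF("Id", "downloadUrl", "title")

// Build df1 once so it stays available after the push.
val df1 = df.na.drop(Seq("downloadUrl", "title"))

// A typed function value keeps overload resolution between the Scala and Java
// variants of foreachPartition unambiguous.
val sendPartition: Iterator[Row] => Unit = partition =>
  partition.foreach { row =>
    HypotheticalApi.push(
      row.getAs[Long]("Id"),
      row.getAs[String]("downloadUrl"),
      row.getAs[String]("title")
    )
  }

df1.foreachPartition(sendPartition)
```

Working per partition rather than per row leaves room to open one client connection per partition in a real implementation.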