Adding a column in dataframe based on another column
Adding a boolean column based on an array-of-strings column
For Spark 2.4+, use the exists function:
import org.apache.spark.sql.functions.expr

// flag is true if any element of the address array matches the regex
val df = customerDocument.withColumn(
  "flag",
  expr("exists(address, x -> x rlike 'test string')")
)
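As a quick self-contained sketch of the same idea (the customerDocument data below is made up for illustration, not taken from the question):

import org.apache.spark.sql.functions.expr
import spark.implicits._

// hypothetical stand-in for customerDocument
val customerDocument = Seq(
  Seq("Purcell Road", "Road Town", "British Virgin Islands"),
  Seq("High Street", "Sydney", "Australia")
).toDF("address")

// exists is a Spark 2.4+ higher-order function over array columns
customerDocument.withColumn(
  "flag",
  expr("exists(address, x -> x rlike 'British Virgin Islands')")
).show(false)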
For older versions, you can convert the array to a string and then use rlike:
import org.apache.spark.sql.functions.{col, concat_ws}

// join the array elements into one comma-separated string, then regex-match it
val df = customerDocument.withColumn(
  "flag",
  concat_ws(",", col("address")).rlike("test string")
)
Example:
import spark.implicits._

val df = Seq(
  Seq("ADR249,IND0100,300", "Purcell Road", "Road Town", "British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands"),
  Seq("ADR500,IND0268,425", "High Street", "Sydney", "Australia,425,High Street,Sydney,Australia")
).toDF("address")

df.withColumn(
  "flag",
  concat_ws(",", col("address")).rlike("British Virgin Islands")
).show(false)
//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|address |flag |
//+-----------------------------------------------------------------------------------------------------------------------+-----+
//|[ADR249,IND0100,300, Purcell Road, Road Town, British Virgin Islands,300,Purcell Road,Road Town,British Virgin Islands]|true |
//|[ADR500,IND0268,425, High Street, Sydney, Australia,425,High Street,Sydney,Australia] |false|
//+-----------------------------------------------------------------------------------------------------------------------+-----+
EDIT:
For your specific Spark version (< 2.1), you can't use concat_ws to convert the array to a string. You need to use DataFrame.map instead, like this:
df.map { r =>
  // flatten the array column back into a single comma-separated string
  r.getList[String](0).toArray.mkString(",")
}.toDF("address").withColumn(
  "flag",
  col("address").rlike("British Virgin Islands")
).show(false)
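As a side note, a UDF is another way to compute the same flag on versions without exists. This sketch is my own suggestion rather than part of the original answer, and it avoids joining the elements, so a pattern can never accidentally match across element boundaries:

import org.apache.spark.sql.functions.{col, udf}

// true if any single array element contains the pattern;
// the leading/trailing .* reproduce rlike's substring semantics,
// since String.matches anchors the regex to the whole string
val containsPattern = udf { (xs: Seq[String]) =>
  xs != null && xs.exists(_.matches(".*British Virgin Islands.*"))
}

df.withColumn("flag", containsPattern(col("address"))).show(false)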