Filter string with quotes in Spark dataframe column

I have a DF with this data:

+--------+------------------------------------------+
|recType |value                                     |
+--------+------------------------------------------+
|{"id": 1|{"id": 1, "user_id": 100, "price": 50}    |
...

I can filter on recType with contains, but how do I filter with === when the value contains quotes? I keep getting errors every time I try.
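If the errors come from the Scala string literal itself (an assumption, since the question does not show them), the cause is usually that an unescaped `"` inside `"..."` ends the literal early. A minimal plain-Scala sketch (no Spark needed) of the two usual ways to write a literal containing double quotes, using the recType cell exactly as it appears in the table (`{"id": 1`, no closing brace):

```scala
// Plain Scala, no Spark required: two ways to write {"id": 1 as a string literal.

// 1. Escape each inner double quote with a backslash.
val escaped = "{\"id\": 1"

// 2. Use a triple-quoted string; quotes inside it need no escaping.
val tripleQuoted = """{"id": 1"""

// Both literals produce the same string.
println(escaped == tripleQuoted) // true
println(tripleQuoted)            // {"id": 1
```

Either form can then be passed to `===`, for example `df.filter($"recType" === """{"id": 1""")`. Unlike `contains`, `===` is an exact match, so the literal must reproduce the cell's text character for character, spacing included.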

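I assume the columns here are strings. If so, the from_json function can parse them into structs, and you can then filter on a struct field instead of matching a quoted string.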

import org.apache.spark.sql.types.{StructField, StructType, IntegerType}
import org.apache.spark.sql.functions.from_json
import spark.implicits._ // for the $"col" syntax; spark-shell imports this automatically

// Schema of the JSON text in recType
val recTypeSchema = StructType(Array(
    StructField("id", IntegerType, true)
))
// Schema of the JSON text in value
val valueSchema = StructType(Array(
    StructField("id", IntegerType, true),
    StructField("user_id", IntegerType, true),
    StructField("price", IntegerType, true)
))

// Replace each JSON string column with its parsed struct
val parsedDf = df
    .withColumn("recType", from_json($"recType", recTypeSchema))
    .withColumn("value", from_json($"value", valueSchema))

parsedDf.printSchema
root
 |-- recType: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |    |-- user_id: integer (nullable = true)
 |    |-- price: integer (nullable = true)

// Filter on the parsed struct field: no quoted string literal needed
parsedDf.filter($"recType.id" === 1).show
+-------+------------+
|recType|       value|
+-------+------------+
|    {1}|{1, 100, 50}|
+-------+------------+