Add a label with CASE WHEN based on timestamp ranges in Spark SQL?

I have this table, which contains an ID and a timestamp. I want to add a label for each timestamp range.

ID      timestamp
a       2020-01-16 08:55:50
b       2020-01-16 08:57:37
c       2020-01-16 09:00:13
d       2020-01-16 09:01:32
e       2020-01-16 09:03:32
f       2020-01-16 09:06:56

For example, timestamps from 2020-01-16 08:55:50 to 2020-01-16 09:00:13 should be labeled X, and timestamps from 2020-01-16 09:01:32 to 2020-01-16 09:06:56 should be labeled Y.
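The labeling rule can be sketched in plain Scala, without Spark, to make the semantics explicit: each range is inclusive at both ends (like SQL BETWEEN), and a timestamp that falls outside every range, such as one in the gap between 09:00:13 and 09:01:32, gets the fallback label. This is only an illustration under stated assumptions: the Z range and the fallback 'A' are taken from the query further down, the yyyy-MM-dd HH:mm:ss format is assumed from the sample data, and the names TypeFlag and typeFlag are hypothetical.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TypeFlag {
  // Assumed format of the timestamps in the question.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  private def ts(s: String): LocalDateTime = LocalDateTime.parse(s, fmt)

  // Ranges from the question's query; both ends inclusive, like SQL BETWEEN.
  private val ranges: Seq[(LocalDateTime, LocalDateTime, String)] = Seq(
    (ts("2020-01-16 08:55:50"), ts("2020-01-16 09:00:13"), "X"),
    (ts("2020-01-16 09:01:32"), ts("2020-01-16 09:06:56"), "Y"),
    (ts("2020-01-16 09:08:51"), ts("2020-01-16 09:13:21"), "Z")
  )

  // Returns the label of the first matching range, like the first true WHEN,
  // or "A" when no range matches (the ELSE branch).
  def typeFlag(s: String): String = {
    val t = ts(s)
    ranges.collectFirst {
      case (lo, hi, label) if !t.isBefore(lo) && !t.isAfter(hi) => label
    }.getOrElse("A")
  }

  def main(args: Array[String]): Unit = {
    // 09:00:30 sits in the gap between the X and Y ranges.
    Seq("2020-01-16 08:55:50", "2020-01-16 09:03:32", "2020-01-16 09:00:30")
      .foreach(s => println(s"$s -> ${typeFlag(s)}"))
  }
}
```

Because the ranges are evaluated in order and the first match wins, this mirrors how CASE WHEN picks the first true WHEN branch.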

I expect the table to look like this:

ID      timestamp            type_flag
a       2020-01-16 08:55:50  X
b       2020-01-16 08:57:37  X
c       2020-01-16 09:00:13  X
d       2020-01-16 09:01:32  Y
e       2020-01-16 09:03:32  Y
f       2020-01-16 09:06:56  Y
g       2020-01-16 09:08:51  Z
h       2020-01-16 09:10:43  Z
i       2020-01-16 09:13:21  Z

What I have tried so far:

SELECT *,
    CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X' 
         WHEN timestamp BETWEEN '2020-01-16 09:01:32' and '2020-01-16 09:06:56' THEN 'Y'
         WHEN timestamp BETWEEN '2020-01-16 09:08:51' and '2020-01-16 09:13:21' THEN 'Z'
    ELSE 'A' END AS type_flag
FROM table1;

But it fails with this error:

Error [22P02]: ERROR: invalid input syntax for integer: "2021-01-16 08:55:50"
  Position: 37

How should I fix my query to get the result I want? I'm using Spark SQL for this.

Thanks.

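I think the problem is in your syntax or in how the timestamp column is typed. The error says the string literal is being converted to an integer, which suggests the timestamp column is not actually a timestamp type. Once the column is cast to a timestamp, the same CASE WHEN ... BETWEEN query works: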
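The type mismatch behind that error can be illustrated without Spark. The minimal plain-Scala sketch below (the object name WhyCast is hypothetical, and java.time only stands in for the database's own conversion rules) shows that the literal from the query cannot be read as an integer but parses cleanly as a timestamp, assuming the yyyy-MM-dd HH:mm:ss format of the sample data.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object WhyCast {
  // Literal used in the question's BETWEEN clause.
  val literal = "2020-01-16 08:55:50"
  // Assumed timestamp format of the sample data.
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Reading the literal as an integer fails with a NumberFormatException,
  // analogous to "invalid input syntax for integer".
  val asInt: Try[Int] = Try(literal.toInt)

  // Reading it as a timestamp succeeds, so comparisons become chronological.
  val asTimestamp: Try[LocalDateTime] = Try(LocalDateTime.parse(literal, fmt))

  def main(args: Array[String]): Unit = {
    println(s"as integer:   $asInt")
    println(s"as timestamp: $asTimestamp")
  }
}
```

The takeaway is the same as in the Spark fix: the values being compared must share a timestamp type for BETWEEN to mean "within this time range".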

//creating sample data (assumes an existing SparkSession named spark, as in spark-shell or a Databricks notebook)
import spark.implicits._
val df = Seq(("a","2020-01-16 08:55:50"),("b","2020-01-16 08:57:37"),("c","2020-01-16 09:00:13"),("d","2020-01-16 09:01:32"),("e","2020-01-16 09:03:32"),("f","2020-01-16 09:06:56")).toDF("ID","timestamp")
//changing the data type of the timestamp column from string to timestamp
import org.apache.spark.sql.types.TimestampType
val df1 = df.withColumn("timestamp", $"timestamp".cast(TimestampType))
//registering a temporary view so the DataFrame can be queried with Spark SQL
df1.createOrReplaceTempView("timestamptest")
//CASE WHEN statements inside Spark SQL
val df3 = spark.sql("""SELECT *, CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X'
         WHEN timestamp BETWEEN '2020-01-16 09:01:32' AND '2020-01-16 09:06:56' THEN 'Y'
         WHEN timestamp BETWEEN '2020-01-16 09:08:51' AND '2020-01-16 09:13:21' THEN 'Z'
    ELSE 'A' END AS type_flag FROM timestamptest""")
//display(df3) works in Databricks notebooks; show() works in any Spark environment
df3.show(false)

You should see output like the following: