How to add a label with CASE WHEN over time ranges in Spark SQL?
I have this table, which shows an ID and a timestamp. I want to add a label for each timestamp range.
ID timestamp
a 2020-01-16 08:55:50
b 2020-01-16 08:57:37
c 2020-01-16 09:00:13
d 2020-01-16 09:01:32
e 2020-01-16 09:03:32
f 2020-01-16 09:06:56
For example, everything from 2020-01-16 08:55:50 to 2020-01-16 09:00:13 should be X, and everything from 2020-01-16 09:01:32 to 2020-01-16 09:06:56 should be Y.
I want the table to show:
ID timestamp type_flag
a 2020-01-16 08:55:50 X
b 2020-01-16 08:57:37 X
c 2020-01-16 09:00:13 X
d 2020-01-16 09:01:32 Y
e 2020-01-16 09:03:32 Y
f 2020-01-16 09:06:56 Y
g 2020-01-16 09:08:51 Z
h 2020-01-16 09:10:43 Z
i 2020-01-16 09:13:21 Z
What I have tried so far:
SELECT *,
CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X'
WHEN timestamp BETWEEN '2020-01-16 09:01:32' and '2020-01-16 09:06:56' THEN 'Y'
WHEN timestamp BETWEEN '2020-01-16 09:08:51' and '2020-01-16 09:13:21' THEN 'Z'
ELSE 'A' END AS type_flag
FROM table1;
But it gives me an error saying:
Error [22P02]: ERROR: invalid input syntax for integer: "2021-01-16 08:55:50"
Position: 37
How should I fix my query to get the result I want? I am using Spark SQL for this.
Thanks.
I think there is an issue with your syntax or with how the column is cast. (As a side note, `22P02` is a PostgreSQL SQLSTATE code, which suggests the query was not actually run against Spark SQL, or that the `timestamp` column is not of timestamp type where it did run.) In Spark, casting the column first makes the comparison work:
// Create sample data.
val df = Seq(
  ("a", "2020-01-16 08:55:50"),
  ("b", "2020-01-16 08:57:37"),
  ("c", "2020-01-16 09:00:13"),
  ("d", "2020-01-16 09:01:32"),
  ("e", "2020-01-16 09:03:32"),
  ("f", "2020-01-16 09:06:56")
).toDF("ID", "timestamp")

// Cast the timestamp column from string to timestamp.
val df1 = df.withColumn("timestamp", $"timestamp".cast("timestamp"))

// Create a temp view so it can be queried with Spark SQL.
df1.createOrReplaceTempView("timestamptest")

// CASE WHEN statements inside Spark SQL.
val df3 = spark.sql("""
  SELECT *,
         CASE WHEN timestamp BETWEEN '2020-01-16 08:55:50' AND '2020-01-16 09:00:13' THEN 'X'
              WHEN timestamp BETWEEN '2020-01-16 09:01:32' AND '2020-01-16 09:06:56' THEN 'Y'
              WHEN timestamp BETWEEN '2020-01-16 09:08:51' AND '2020-01-16 09:13:21' THEN 'Z'
              ELSE 'A' END AS type_flag
  FROM timestamptest
""")
display(df3)
You will see output like the expected table above (the original answer included a screenshot here).
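One detail worth checking: SQL `BETWEEN` is inclusive at both ends, and any timestamp falling in the gaps between the ranges (e.g. between 09:00:13 and 09:01:32) falls through to the `ELSE` label. The labeling logic itself is just interval membership; here is a minimal plain-Python sketch of it (illustrative only, not Spark code — `type_flag` and `ts` are made-up helper names):

```python
# Plain-Python sketch of the CASE WHEN ... BETWEEN labeling above,
# to check boundary behavior. SQL BETWEEN is inclusive at both ends.
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def ts(s):
    return datetime.strptime(s, FMT)

# (label, start, end) triples mirroring the CASE expression; both ends inclusive.
RANGES = [
    ("X", ts("2020-01-16 08:55:50"), ts("2020-01-16 09:00:13")),
    ("Y", ts("2020-01-16 09:01:32"), ts("2020-01-16 09:06:56")),
    ("Z", ts("2020-01-16 09:08:51"), ts("2020-01-16 09:13:21")),
]

def type_flag(t):
    for label, lo, hi in RANGES:
        if lo <= t <= hi:          # same semantics as SQL BETWEEN
            return label
    return "A"                     # the ELSE branch

print(type_flag(ts("2020-01-16 09:00:13")))  # upper bound is inclusive -> X
print(type_flag(ts("2020-01-16 09:01:00")))  # falls in the gap -> A
```

The same gap behavior applies in the Spark query: rows between two ranges get `'A'`, so widen the boundaries if you want contiguous coverage.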