Map hours to day intervals - Spark Scala
I have a DataFrame with a column of hours stored as strings:
+-------+
|DepTime|
+-------+
| 13:43|
| 11:25|
| 20:09|
| 09:03|
| 14:23|
| 20:24|
| 17:53|
| 06:22|
| 19:44|
| 14:53|
+-------+
I want to transform that column according to these intervals:
From 06:00 to 11:59 -> Morning
From 12:00 to 17:00 -> Afternoon
From 17:01 to 20:00 -> Evening
From 20:01 to 05:59 -> Night
Expected output:
+------------+
|DepTime |
+------------+
| Afternoon|
| Morning|
| Night|
| Morning|
| Afternoon|
| Night|
| Evening|
| Morning|
| Evening|
| Afternoon|
+------------+
I have done similar string conversions using functions such as rlike and lit:
df = df.withColumn("DayOfWeek",
when(col("DayOfWeek").rlike("1"),lit("Monday"))
.when(col("DayOfWeek").rlike("2"),lit("Tuesday"))
.when(col("DayOfWeek").rlike("3"),lit("Wednesday"))
.when(col("DayOfWeek").rlike("4"),lit("Thursday"))
.when(col("DayOfWeek").rlike("5"),lit("Friday"))
.when(col("DayOfWeek").rlike("6"),lit("Saturday"))
.when(col("DayOfWeek").rlike("7"),lit("Sunday"))
)
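For this case I am thinking of using if (possibly with the < and > operators) and otherwise, but I don't know how to form the groups (ranges), because the hours have a special ordering.
Any help is appreciated. Thanks in advance.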
Try this:
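import org.apache.spark.sql.functions.{col, date_format, when}

data
  // Normalize to zero-padded "HH:mm" so string comparison follows clock order
  .withColumn("Time", date_format(col("DepTime"), "HH:mm"))
  .withColumn("PeriodOfTime",
    when(col("Time") >= "06:00" && col("Time") < "12:00", "Morning")       // 06:00 to 11:59
      .when(col("Time") >= "12:00" && col("Time") <= "17:00", "Afternoon") // 12:00 to 17:00
      .when(col("Time") > "17:00" && col("Time") <= "20:00", "Evening")    // 17:01 to 20:00
      .otherwise("Night"))                                                 // 20:01 to 05:59
  .drop("Time")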
Output (tested):
+-------+------------+
|DepTime|PeriodOfTime|
+-------+------------+
| 13:43| Afternoon|
| 11:25| Morning|
| 20:09| Night|
| 09:03| Morning|
| 14:23| Afternoon|
| 20:24| Night|
| 17:53| Evening|
| 06:22| Morning|
| 19:44| Evening|
| 14:53| Afternoon|
+-------+------------+
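To check the interval boundaries without a Spark session, the same classification can be sketched as a plain Scala function. This is only an illustration, not part of the answer above: the names DayPeriod and periodOf are hypothetical, and it assumes every value is a zero-padded "HH:mm" string as in the sample data.

```scala
import java.time.LocalTime

object DayPeriod {
  // Boundaries taken from the intervals listed in the question.
  private val MorningStart   = LocalTime.of(6, 0)
  private val AfternoonStart = LocalTime.of(12, 0)
  private val AfternoonEnd   = LocalTime.of(17, 0)
  private val EveningEnd     = LocalTime.of(20, 0)

  /** Classifies a zero-padded "HH:mm" string into a period of the day. */
  def periodOf(depTime: String): String = {
    val t = LocalTime.parse(depTime) // the ISO local-time format accepts "HH:mm"
    if (!t.isBefore(MorningStart) && t.isBefore(AfternoonStart)) "Morning"        // 06:00 to 11:59
    else if (!t.isBefore(AfternoonStart) && !t.isAfter(AfternoonEnd)) "Afternoon" // 12:00 to 17:00
    else if (t.isAfter(AfternoonEnd) && !t.isAfter(EveningEnd)) "Evening"         // 17:01 to 20:00
    else "Night" // 20:01 to 05:59 wraps past midnight, so it is the fallback
  }
}
```

Because Night is the only interval that crosses midnight, treating it as the fallback avoids having to express a wrapped range. If this logic is needed on a DataFrame column, one option is to wrap it with udf from org.apache.spark.sql.functions, although the built-in when/otherwise chain is usually the better fit because Spark can optimize it.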