org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;; despite a time-based window
I am performing window-based ranking in Spark Structured Streaming:
val filterWindow: WindowSpec = Window
  .partitionBy("key")
  .orderBy($"time")

controlDataFrame = controlDataFrame
  .withColumn("Make Coffee", $"value")
  .withColumn("datetime", date_trunc("second", current_timestamp()))
  .withColumn("time", current_timestamp())
  .withColumn("temp_rank", rank().over(filterWindow))
  .filter(col("temp_rank") === 1)
  .drop("temp_rank")
  .withColumn("digitalTwinId", lit(digitalTwinId))
  .withWatermark("datetime", "10 seconds")
I populate time with current_timestamp(), and in the schema I can see it typed as StructField(time,TimestampType,true). Why does Spark 3.0 refuse to run the window operation over it and throw the exception below, when the column is clearly time-based?
21/11/22 10:34:03 WARN SparkSession$Builder: Using an existing SparkSession; some spark core configurations may not take effect.
org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;;
Window [rank(time#163) windowspecdefinition(key#150, time#163 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS temp_rank#171], [key#150], [time#163 ASC NULLS FIRST]
+- Project [key#150, value#151, Make Coffee#154, datetime#158, time#163]
A time-based window is not simply a Window/WindowSpec ordered by a timestamp-typed column; ranking windows like this are unsupported on streaming DataFrames regardless of the ordering column's type. For streaming you should explicitly use the session_window() function on the timestamp column.
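As a minimal sketch of what the answer suggests: the ranking window is replaced by a streaming aggregation grouped on session_window(). Note that session_window() was only added in Spark 3.2, and the choice of last() as the aggregate (to keep one value per key per session) is an assumption here, not something the question specifies.

```scala
import org.apache.spark.sql.functions._
import spark.implicits._  // assumes an in-scope SparkSession named `spark`

// Hypothetical rewrite: rank().over(...) is rejected on streaming
// DataFrames, so instead group by a 10-second session window on the
// timestamp column. The watermark on "datetime" is required for the
// streaming aggregation to emit and clean up state.
val deduped = controlDataFrame
  .withColumn("Make Coffee", $"value")
  .withColumn("datetime", date_trunc("second", current_timestamp()))
  .withWatermark("datetime", "10 seconds")
  .groupBy($"key", session_window($"datetime", "10 seconds"))
  .agg(last($"Make Coffee").as("Make Coffee"))       // assumed aggregate
  .withColumn("digitalTwinId", lit(digitalTwinId))   // as in the question
```

The resulting DataFrame has one row per key and session window, which plays the same deduplicating role as the original filter(col("temp_rank") === 1).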