在 java 中使用 Apache Spark Stream 从报价数据创建蜡烛数据

Create candle data from tick data using Apache Spark Stream in java

我们正在获取正在流式传输到 Apache Spark 的关于 Kafka 的滴答数据。我们需要从该流数据创建蜡烛数据。

我想创建数据框的第一个选项,然后从那里 运行 sql 查询

SELECT t1.price AS open,
       m.high,
       m.low,
       t2.price as close,
       open_time
FROM (SELECT MIN(timeInMilliseconds) AS min_time,
             MAX(timeInMilliseconds) AS max_time,
             MIN(price) as low,
             MAX(price) as high,
             FLOOR(timeInMilliseconds/(1000*60)) as open_time
      FROM ticks
      GROUP BY open_time) m
JOIN ticks t1 ON t1.timeInMilliseconds = min_time
JOIN ticks t2 ON t2.timeInMilliseconds = max_time

但我不确定是否能够获取旧报价的数据

是否可以使用Spark库的一些方法来创建类似这样的?

请查看Window事件时间操作https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time 这正是您所需要的。这是一个scatch代码

val windowedCounts = tickStream.groupBy(
   window($"timeInMilliseconds", "1 minutes"))
 ).agg(
    first("price").alias("open"),
    min("price").alias("min"),
    max ("price").alias("max"),
    last("price").alias("close"))