如何 运行 SQL 查询在 Spark Streaming 中异步定义在流数据上的表?

How to run SQL queries on tables defined on streaming data asynchronously in Spark Streaming?

官方 Spark Streaming Programming Guide DataFrame and SQL Operations 部分提到了 运行 SQL 异步查询:

You can also run SQL queries on tables defined on streaming data from a different thread (that is, asynchronous to the running StreamingContext).

是否有 examples/samples 说明如何操作?

文档不得不提到它真的很有趣,因为使用相同 SparkSession.

的任何线程都可以访问任何临时 table 是一个事实。

我会按如下方式处理:

// Create a fixed thread pool to execute asynchronous tasks
val executorService = Executors.newFixedThreadPool(1)
dstream.foreachRDD { rdd =>
  import org.apache.spark.sql._
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate
  import spark.implicits._
  import spark.sql

  val records = rdd.toDF("record")
  records.createOrReplaceTempView("records")

  // Submit a asynchronous task to execute a SQL query
  executorService.submit {
    new Runnable {
      override def run(): Unit = {
        sql("select * from records").show(truncate = false)
      }
    }
  }
}