Is it possible to insert into a temporary table in Spark?

I tested the following query on Databricks with Apache Spark 2.4:

%sql

-- step 1
create temporary view temp_view_t
as select 1 as no, 'aaa' as str;

-- step 2
insert into temp_view_t values (2, 'bbb');

Then I get this error message:

Error in SQL statement: AnalysisException: Inserting into an RDD-based table is not allowed.;; 'InsertIntoTable Project [1 AS no#824, aaa AS str#825], false, false +- LocalRelation [col1#831, col2#832]

My questions are:

  1. Is it impossible to insert into a temporary table in Spark?
  2. How do I create temporary data in Spark SQL?

Thanks.

We can't insert data into a temporary table, but we can simulate an insert with union all (or union, which removes duplicates).

Example:

#create temp view
spark.sql("""create or replace temporary view temp_view_t as select 1 as no, 'aaa' as str""")

spark.sql("select * from temp_view_t").show()
#+---+---+
#| no|str|
#+---+---+
#|  1|aaa|
#+---+---+

#union all with the new data
spark.sql("""create or replace temporary view temp_view_t as select * from temp_view_t union all select 2 as no, 'bbb' as str""")

spark.sql("select * from temp_view_t").show()                                                                     
#+---+---+
#| no|str|
#+---+---+
#|  1|aaa|
#|  2|bbb|
#+---+---+

#to eliminate duplicates, use union instead
spark.sql("""create or replace temporary view temp_view_t as select * from temp_view_t union select 1 as no, 'aaa' as str""")

spark.sql("select * from temp_view_t").show()
#+---+---+
#| no|str|
#+---+---+
#|  1|aaa|
#|  2|bbb|
#+---+---+

Yes, you can insert into a temporary view, but it must be built from a file-based DataFrame. The new rows are then saved as separate files in storage.

For example:

spark.read.parquet(path).createOrReplaceTempView('temp')

spark.sql("INSERT INTO temp VALUES (....)")

I don't think the answer suggesting a UNION works (at least on a recent Databricks runtime, 8.2 with Spark runtime 3.1.1); a recursive view is detected at execution time. The code sample above gives:

AnalysisException: Recursive view `temp_view_t` detected (cycle: `temp_view_t` -> `temp_view_t`)

Which seems logical. Why not use an intermediate temporary view instead, or write the view out as a managed Delta table, which will handle INSERT operations.