Is it possible to insert into temporary table in spark?
I tested the following query on Databricks with Apache Spark 2.4:
%sql
-- step 1
create temporary view temp_view_t
as select 1 as no, 'aaa' as str;
-- step 2
insert into temp_view_t values (2,'bbb');
Then I got this error message:
Error in SQL statement: AnalysisException: Inserting into an RDD-based table is not allowed.;;
'InsertIntoTable Project [1 AS no#824, aaa AS str#825], false, false
+- LocalRelation [col1#831, col2#832]
My questions are:
- Is it not possible to insert into a temporary table in Spark?
- How can I create temporary data in Spark SQL?
Thanks.
We can't insert data into a temporary table, but we can simulate an insert with union all (or with union, to remove duplicates).
Example:
#create temp view
spark.sql("""create or replace temporary view temp_view_t as select 1 as no, 'aaa' as str""")
spark.sql("select * from temp_view_t").show()
#+---+---+
#| no|str|
#+---+---+
#| 1|aaa|
#+---+---+
#union all with the new data
spark.sql("""create or replace temporary view temp_view_t as select * from temp_view_t union all select 2 as no, 'bbb' as str""")
spark.sql("select * from temp_view_t").show()
#+---+---+
#| no|str|
#+---+---+
#| 1|aaa|
#| 2|bbb|
#+---+---+
#to eliminate duplicates we can use union instead.
spark.sql("""create or replace temporary view temp_view_t as select * from temp_view_t union select 1 as no, 'aaa' as str""")
spark.sql("select * from temp_view_t").show()
#+---+---+
#| no|str|
#+---+---+
#| 1|aaa|
#| 2|bbb|
#+---+---+
Yes, you can insert into a temp view, but it must be built on a DataFrame that reads from files. The new rows are then saved to storage as separate files.
For example:
spark.read.parquet(path).createOrReplaceTempView('temp')
spark.sql("INSERT INTO temp VALUES (....)")
I don't think the answer suggesting a UNION works (at least on a recent Databricks runtime, 8.2, with Spark runtime 3.1.1): a recursive view is detected at execution time. The code sample above gives:
AnalysisException: Recursive view `temp_view_t` detected (cycle: `temp_view_t` -> `temp_view_t`)
which seems logical.
Why not use some intermediate temporary views, or write the view as a managed Delta table, which will then handle INSERT operations?