spark.write.synapsesql options with Azure Synapse Spark Pool

Question

In Azure Synapse, I am using the synapsesql function from Scala in a Spark pool notebook to push the contents of a data frame into the SQL pool:

// Write data frame to sql table
df2.write.
option(Constants.SERVER,s"${pServerName}.sql.azuresynapse.net").
synapsesql(s"${pDatabaseName}.xtr.${pTableName}",Constants.INTERNAL)

This works, but I would like to add some extra functionality:

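How do I specify the index type as HEAP rather than Clustered Columnstore Index? In Databricks this can be done with .option("tableOptions","heap,distribution=ROUND-ROBIN"), but that does not work in a Spark pool notebook.
If the table already exists in the SQL pool, how do I overwrite it? In Databricks there is SaveAsTable, but I cannot find anything similar in a Spark pool notebook.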

Answer 1

According to the official document:

When a table is created, by default the data structure has no indexes and is called a heap.

To store the table as a heap, specify HEAP in the WITH clause when creating the table:

CREATE TABLE myTable
  (  
    id int NOT NULL,  
    lastName varchar(20),  
    zipCode varchar(6)  
  )  
WITH ( HEAP );

To overwrite, import the org.apache.spark.sql.SaveMode class in your notebook and use the Overwrite mode.

Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame.
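
As a rough sketch of what that could look like with the write from the question: this assumes the notebook runs on a Spark pool whose Synapse Dedicated SQL Pool connector version supports SaveMode, and that df2, pServerName, pDatabaseName and pTableName are defined as in the question. The import paths are the ones the connector normally uses and may differ on other runtime versions.

```scala
// Sketch only: requires the Azure Synapse Dedicated SQL Pool connector on the Spark pool.
// df2, pServerName, pDatabaseName and pTableName are assumed to exist as in the question.
import org.apache.spark.sql.SaveMode
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

df2.write.
  option(Constants.SERVER, s"${pServerName}.sql.azuresynapse.net").
  mode(SaveMode.Overwrite). // replace the existing table contents instead of failing
  synapsesql(s"${pDatabaseName}.xtr.${pTableName}", Constants.INTERNAL)
```

Using SaveMode.Append instead should add rows to a table that already exists, which may be the safer choice when the table was pre-created WITH ( HEAP ) as shown above. Whether an overwrite keeps that table definition is worth verifying on your connector version.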

See DataFrame write SaveMode support for more details.

Also see this given example for reference.

scala

apache-spark

azure-databricks

azure-synapse