JDBC not truncating Postgres table on pyspark

Question

I am using the following code to truncate the table before inserting data.

df.write \
    .option("driver", "org.postgresql:postgresql:42.2.16") \
    .option("truncate", True) \
    .jdbc(url=pgsql_connection, table="service", mode='append', properties=properties_postgres)

However, it does not work: the table still contains the old data. I am using append because I don't want to drop the DB and create a new table every time.

I also tried .option("truncate", "true"), but that did not work either.

I do not get any error message. How can I solve this problem and truncate my table using .option?

Answer 1

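You need to use overwrite mode. The truncate option only takes effect with overwrite; with append it is ignored, which is why no error is raised.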

df.write \
    .option("driver", "org.postgresql.Driver") \
    .option("truncate", True) \
    .jdbc(url=pgsql_connection, table="service", mode='overwrite', properties=properties_postgres)

As stated in the documentation:

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

truncate: true -> When SaveMode.Overwrite is enabled, this option causes Spark to truncate an existing table instead of dropping and recreating it.
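To make the interaction between the save mode and the truncate option easier to see, here is a minimal sketch in plain Python. It is only a simplified model of the behaviour described in the documentation quote above, not Spark's actual implementation: the function name jdbc_write_plan and the step labels are hypothetical, and dialect-specific details are left out.

```python
from typing import List


def jdbc_write_plan(mode: str, truncate: bool, table_exists: bool = True) -> List[str]:
    """Return the steps a JDBC write is expected to perform.

    Hypothetical helper: a simplified model of the documented behaviour,
    not code taken from Spark.
    """
    mode = mode.lower()
    if mode not in ("append", "overwrite"):
        raise ValueError("this sketch only models 'append' and 'overwrite'")

    if not table_exists:
        # Nothing to truncate or drop: the table is created first.
        return ["create", "insert"]

    if mode == "append":
        # The truncate option is ignored here; rows are simply added.
        return ["insert"]

    # mode == "overwrite"
    if truncate:
        # Existing table is emptied but its definition is kept.
        return ["truncate", "insert"]
    # Default overwrite behaviour: drop and recreate the table.
    return ["drop", "create", "insert"]


if __name__ == "__main__":
    for mode, truncate in [("append", True), ("overwrite", True), ("overwrite", False)]:
        print(mode, truncate, jdbc_write_plan(mode, truncate))
```

In this model the code from the question corresponds to the ("append", True) case, where the old rows stay in place, while the code in this answer corresponds to ("overwrite", True), which keeps the table definition and only removes the rows.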

jdbc

apache-spark

pyspark