How to physically delete data with Presto/Trino?
In my Presto installation (358), I have two working Hive connectors:
- S3
- Azure blob (ABFS)
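Everything works fine, but when I call DROP (TABLE/SCHEMA) or DELETE FROM, the deletion happens only in the Metastore and no data is physically deleted. This applies to both S3 and ABFS.
This becomes a real problem when replacing data: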
> DROP TABLE hive.abc;
-- ok
> CREATE TABLE hive.abc AS (...)
-- ERROR: Target directory 'abc' already exists.
The same happens when dropping partitions, etc.
Is there any way to actually delete the data?
Found the solution. The key difference is whether you specify location on the schema or external_location on its tables.
-- Case 1: location set on the schema, table created under it
CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
CREATE TABLE hive.xyz.test AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE physically deleted

-- Case 2 (separate setup): external_location set on the table
CREATE SCHEMA hive.xyz;
CREATE TABLE hive.xyz.test
WITH (external_location = 'abfs://...')
AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE NOT physically deleted.
Conclusion: creating a table with external_location prevents its data from being physically deleted.
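For anyone hitting the same "Target directory already exists" error, here is a minimal sketch of how the replace-data flow might look once the table is created under a schema location rather than with external_location. It assumes the usual Hive semantics, where dropping a table without external_location also removes its files; the `id` column and the literal values are made up for illustration, and the `'abfs://...'` path is elided exactly as in the post above.

```sql
-- Check how an existing table was created: a table created with
-- external_location lists that property in the generated DDL.
SHOW CREATE TABLE hive.xyz.test;

-- Layout that allows physical deletion: the location lives on the schema,
-- and the table is created under it without external_location.
CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
CREATE TABLE hive.xyz.test AS SELECT 1 AS id;  -- hypothetical payload

-- Replace the data: with this layout the drop is expected to remove the
-- files as well as the Metastore entry, so the re-create should not fail
-- on an existing target directory.
DROP TABLE hive.xyz.test;
CREATE TABLE hive.xyz.test AS SELECT 2 AS id;  -- hypothetical payload
```

If SHOW CREATE TABLE still reports external_location for a table, the files under that path have to be cleaned up outside of Presto/Trino, since the engine treats them as not its own.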