Delete duplicates using Spark SQL
The following code runs fine in Databricks Spark SQL:
with CTE1 as
(
select *,
row_number() over (partition by ID order by Name) as r
from Emp
)
select * from CTE1 where r>1
But the DELETE statement:
with CTE1 as
(
select *,
row_number() over (partition by ID order by Name) as r
from Emp
)
DELETE from CTE1 where r>1
fails with an error in the SQL statement:
Analysis exception: Table Not found Emp
The syntax you want is only available in SQL Server. Assuming Name is unique and not NULL, you can use an alternative such as:
delete from emp
where name > (select min(emp2.name)
from emp emp2
where emp2.id = emp.id
);
Otherwise, use the table's primary key for the comparison, as in the sketch below.
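A minimal sketch of that primary-key variant, where EmpKey stands in for a hypothetical unique key column (it is not a column from the question):

-- Sketch only: EmpKey is a hypothetical unique key column, assumed for
-- illustration. Keeps the row with the smallest key per id, deleting the rest.
delete from emp
where EmpKey > (select min(emp2.EmpKey)
                from emp emp2
                where emp2.id = emp.id
               );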
Here is a simple alternative: instead of deleting, just select the rows you want to keep:
with CTE1 as
(
select *,
row_number() over (partition by ID order by Name) as r
from Emp
)
select * from CTE1 where r = 1
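If the duplicates really have to be removed from the table itself, one workaround is to overwrite the table with that deduplicated selection. A minimal sketch, assuming Emp is a Delta table (the Databricks default) whose only columns are ID and Name:

-- Sketch under assumptions: Emp is a Delta table with columns ID and Name.
-- Delta allows INSERT OVERWRITE to read from the table it is replacing.
insert overwrite table Emp
select ID, Name
from (
  select *,
         row_number() over (partition by ID order by Name) as r
  from Emp
) t
where t.r = 1;

The window query is the same one as above; the overwrite simply persists its r = 1 rows back into Emp.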