Deleting duplicates in a time series

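I have a large number of measurements, taken once every 1 millisecond, stored in a SQL Server 2012 table. Whenever 3 or more consecutive rows hold the same value, I would like to delete the duplicates in the middle. The values highlighted in this image of sample data are the ones I want to delete. Is there a way to do this with a SQL query?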

 select * from table group by table.field ->value

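I'm sure there must be a more efficient way to do this, but you could join the table to itself twice to find the previous and next value in the list, and then delete every entry where all three values are the same.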

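DELETE FROM tbl
WHERE ms IN
(
  SELECT T.ms
  FROM tbl T
  -- T1 is the previous row and T2 is the next row;
  -- both joins rely on ms increasing by exactly 1 per row
  INNER JOIN tbl T1 ON T.ms = T1.ms + 1
  INNER JOIN tbl T2 ON T.ms = T2.ms - 1
  -- keep only rows whose value matches both neighbours
  WHERE T.value = T1.value AND T.value = T2.value
)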

If the table is really large, I can see this blowing up tempdb.
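One possible way around the double self-join, offered only as a sketch: SQL Server 2012 has the LAG and LEAD window functions, which read the previous and next row in a single pass over the table. The version below assumes the same tbl(ms, value) columns as the query above and, like the ROW_NUMBER pattern, deletes through a CTE. It has not been run against the real data, so try it on a copy first.

```sql
-- Sketch only: assumes tbl(ms, value) as in the self-join query above.
WITH Neighbours AS (
    SELECT ms,
           value,
           LAG(value)  OVER (ORDER BY ms) AS prev_value,  -- value of the previous row
           LEAD(value) OVER (ORDER BY ms) AS next_value   -- value of the next row
    FROM tbl
)
DELETE FROM Neighbours
WHERE value = prev_value   -- same as the row before
  AND value = next_value;  -- and same as the row after
```

Because the neighbours are found by ordering on ms rather than by ms + 1 and ms - 1, this does not depend on the timestamps being gap-free. The first and last rows of the table have a NULL neighbour, so they are never deleted; rows whose value is itself NULL are also left alone.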

You can do this using a CTE and ROW_NUMBER:


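WITH CteGroup AS(
    SELECT *,
        -- The difference of the two row numbers is constant within a run of
        -- consecutive rows that share the same Value, so (grp, Value) identifies a run
        grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS)
    FROM YourTable
),
CteFinal AS(
    SELECT *,
        -- RN_FIRST = 1 marks the first row of a run, RN_LAST = 1 marks the last
        RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS),
        RN_LAST  = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS DESC)
    FROM CteGroup
)
-- Delete every row that is neither the first nor the last of its run
DELETE
FROM CteFinal
WHERE
    RN_FIRST > 1
    AND RN_LAST > 1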
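If it helps to see what the query above would remove before actually deleting anything, the final DELETE can be swapped for a SELECT. The sketch below does that against a made-up table variable; @Sample, its values, and the WouldDelete column are hypothetical and only there for illustration.

```sql
-- Hypothetical sample data: a run of four 10s, two 20s, then three 10s.
DECLARE @Sample TABLE (MS int PRIMARY KEY, Value int);
INSERT INTO @Sample (MS, Value)
VALUES (1, 10), (2, 10), (3, 10), (4, 10),
       (5, 20), (6, 20),
       (7, 10), (8, 10), (9, 10);

WITH CteGroup AS(
    SELECT MS, Value,
        grp = ROW_NUMBER() OVER(ORDER BY MS) - ROW_NUMBER() OVER(PARTITION BY Value ORDER BY MS)
    FROM @Sample
),
CteFinal AS(
    SELECT MS, Value, grp,
        RN_FIRST = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS),
        RN_LAST  = ROW_NUMBER() OVER(PARTITION BY grp, Value ORDER BY MS DESC)
    FROM CteGroup
)
-- Preview instead of deleting: flag the rows the DELETE would hit
SELECT MS, Value, grp, RN_FIRST, RN_LAST,
    WouldDelete = CASE WHEN RN_FIRST > 1 AND RN_LAST > 1 THEN 1 ELSE 0 END
FROM CteFinal
ORDER BY MS;
```

With this sample, the rows at MS 2, 3 and 8 should come back flagged: they are the interior rows of the two runs of three or more. The run of two 20s has no interior row, so nothing in it is flagged.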