Delete points with unwanted field values from InfluxDB measurement
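InfluxDB lets you delete points based on WHERE tag='value' conditions, but not on field values.
For example, if you accidentally stored a measurement with a value of -1 in a series of positive floats (e.g. CPU utilization), DELETE FROM metrics WHERE cpu=-1 will return this error:
fields not supported in WHERE clause during deletion
This is still (2015 - 2020) not possible in InfluxDB; see ticket 3210.
You can overwrite the point with other values by inserting into the measurement a point with the same timestamp and tag set: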
A point is uniquely identified by the measurement name, tag set, and timestamp. If you submit a new point with the same measurement, tag set, and timestamp as an existing point, the field set becomes the union of the old field set and the new field set, where any ties go to the new field set. This is the intended behavior.
Since you're not supposed to insert nulls, you'll probably want to repeat the values from the previous point(s).
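As a rough sketch of the overwrite, the replacement point can be built in line protocol as shown below. This assumes the InfluxDB 1.x /write endpoint on localhost:8086 and a database named DATABASE. The helper name overwrite_line, the host=server01 tag, the value 0.42 and the timestamp are all hypothetical and not taken from the original data.

```shell
#!/usr/bin/env bash
# Build a line-protocol point that reuses the measurement, tag set and
# timestamp of the bad point, so the new field value replaces the old one.
# Arguments: measurement, tag set (key=value[,key=value...]), field key,
#            replacement value, timestamp in nanoseconds.
overwrite_line() {
  printf '%s,%s %s=%s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical example: replace cpu=-1 with the previous good value 0.42.
overwrite_line metrics host=server01 cpu 0.42 1434055562000000000

# To send it (assumes a local InfluxDB 1.x instance and a database DATABASE):
# curl -i -XPOST 'http://localhost:8086/write?db=DATABASE' \
#   --data-binary "$(overwrite_line metrics host=server01 cpu 0.42 1434055562000000000)"
```

The tag set and timestamp must match the bad point exactly (nanosecond precision is the default for writes), otherwise InfluxDB stores a second point instead of overwriting the first.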
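You might think about inserting a point with the same timestamp, setting a unique value for one of the tags, and then running a delete against that tag: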
DELETE FROM measurement WHERE some_existing_tag='deleteme'
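But this wouldn't work. When you insert that second deleteme point, it has a different tag set because of the deleteme tag, so InfluxDB creates a new point for it. The DELETE command would then delete that new point, but not the original point you wanted to remove.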
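To see why, it can help to compare what identifies each point. The following is a simplified sketch, not InfluxDB's actual implementation: point_key is a hypothetical helper that assumes line-protocol input with an explicit timestamp and no escaped spaces or commas, and the tag value prod is made up for illustration.

```shell
#!/usr/bin/env bash
# Print "measurement|sorted tag set|timestamp" for one line-protocol point.
# Field values are deliberately ignored: they are not part of a point's identity.
point_key() {
  local line series ts measurement tags
  line=$1
  series=${line%% *}            # "measurement,tag=value,..." before the first space
  ts=${line##* }                # trailing timestamp after the last space
  measurement=${series%%,*}
  tags=""
  if [ "$series" != "$measurement" ]; then
    # Sort the tags so that their order in the input does not matter
    tags=$(printf '%s\n' "${series#*,}" | tr ',' '\n' | sort | paste -sd, -)
  fi
  printf '%s|%s|%s\n' "$measurement" "$tags" "$ts"
}

original='metrics,some_existing_tag=prod cpu=-1 1434055562000000000'
marker='metrics,some_existing_tag=deleteme cpu=-1 1434055562000000000'

if [ "$(point_key "$original")" = "$(point_key "$marker")" ]; then
  echo "same point: the write would overwrite the original"
else
  echo "different points: deleting the marker leaves the original untouched"
fi
```

Because the two keys differ in the tag set, the second write lands in a different series, which is the series the DELETE then removes.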
Expensive approaches
Without a time range
# Copy all valid data to a temporary measurement
SELECT * INTO metrics_clean FROM metrics WHERE cpu!=-1 GROUP BY *
# Drop existing dirty measurement
DROP measurement metrics
# Copy temporary measurement to existing measurement
SELECT * INTO metrics FROM metrics_clean GROUP BY *
With a time range
# Copy all valid data to a temporary measurement within timerange
SELECT * INTO metrics_clean FROM metrics WHERE cpu!=-1 and time > '<start_time>' and time < '<end_time>' GROUP BY *;
# Delete existing dirty data within timerange
DELETE FROM metrics WHERE time > '<start_time>' and time < '<end_time>';
# Copy temporary measurement to existing measurement
SELECT * INTO metrics FROM metrics_clean GROUP BY *
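One possible way to keep the copy and the delete on exactly the same time bounds is to generate the statements from a single helper. This is only a sketch: cleanup_queries is a hypothetical function, the example dates are made up, and the final DROP MEASUREMENT of the temporary measurement is an extra cleanup step that is not part of the steps above.

```shell
#!/usr/bin/env bash
# Print the InfluxQL statements for the time-range cleanup, in execution order.
# Arguments: measurement, field key, bad value, start time, end time.
cleanup_queries() {
  local m field bad start end range
  m=$1; field=$2; bad=$3; start=$4; end=$5
  range="time > '$start' AND time < '$end'"
  printf '%s\n' \
    "SELECT * INTO ${m}_clean FROM $m WHERE $field != $bad AND $range GROUP BY *" \
    "DELETE FROM $m WHERE $range" \
    "SELECT * INTO $m FROM ${m}_clean GROUP BY *" \
    "DROP MEASUREMENT ${m}_clean"   # assumption: the temporary copy is no longer needed
}

# Hypothetical bounds; prints four statements without contacting any server.
cleanup_queries metrics cpu -1 '2020-01-01T00:00:00Z' '2020-01-02T00:00:00Z'

# To run them (assumes the InfluxDB 1.x `influx` CLI and a database DATABASE):
# cleanup_queries metrics cpu -1 "$START" "$END" | while IFS= read -r q; do
#   influx -database DATABASE -execute "$q" < /dev/null
# done
```

Printing the statements first makes it easy to review them before anything destructive is executed.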
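An ugly and slow but fairly robust solution: store the timestamps, then delete entries by timestamp, optionally filtering the DELETE statement with additional tags.
N.B. This only works if fields have unique timestamps! E.g. if several fields (or points from different series) share one timestamp, the commands below delete all of them. Using epoch=ns practically mitigates this, unless you have ~a billion data points per second.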
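# Collect the nanosecond timestamps of all offending points
curl -G 'http://localhost:8086/query?db=DATABASE&epoch=ns' \
--data-urlencode "q=SELECT * FROM metrics WHERE cpu=-1" |\
jq -r "(.results[0].series[0].values[][0])" > delete_timestamps.txt
# Delete by timestamp only: DELETE rejects field predicates such as cpu=-1,
# and InfluxDB 1.x expects DELETE queries to be sent with POST
while read -r i; do
echo "$i"
curl -XPOST 'http://localhost:8086/query?db=DATABASE&epoch=ns' \
--data-urlencode "q=DELETE FROM metrics WHERE time=$i"
done < delete_timestamps.txt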
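The optional tag filter mentioned above could be sketched as follows. delete_statements is a hypothetical helper, and host='server01' is a made-up tag predicate. It only prints the statements, so they can be inspected before being sent to the /query endpoint.

```shell
#!/usr/bin/env bash
# Print one DELETE statement per nanosecond timestamp read from stdin.
# $1: measurement
# $2: optional tag predicate (tags only; field predicates are rejected by DELETE)
delete_statements() {
  local measurement tag_filter ts
  measurement=$1
  tag_filter=${2:-}
  while IFS= read -r ts; do
    [ -n "$ts" ] || continue          # skip blank lines
    if [ -n "$tag_filter" ]; then
      printf 'DELETE FROM %s WHERE time = %s AND %s\n' "$measurement" "$ts" "$tag_filter"
    else
      printf 'DELETE FROM %s WHERE time = %s\n' "$measurement" "$ts"
    fi
  done
}

# Hypothetical timestamps, restricted to a single series via a tag predicate.
printf '%s\n' 1434055562000000000 1434055563000000000 |
  delete_statements metrics "host='server01'"

# With a real timestamp file:
# delete_statements metrics "host='server01'" < delete_timestamps.txt
```

Restricting the delete to one tag set narrows the risk described in the N.B. above, since points in other series that share the timestamp are left alone.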