Assign flag if successful retry within 1 second
I am trying to assign a flag to each entry in my data based on whether a successful retry happened within 1 second. Here is the sample data:
date id event url code event_ts
2021-08-20 11 1629515037.0682 xyz 503 2021-08-20 20:03:57.068
2021-08-20 11 1629515037.1073 xyz 200 2021-08-20 20:03:57.107 -- successful retry within 1 sec
2021-08-20 12 1629515037.1866 abc 503 2021-08-20 20:03:57.187
2021-08-20 12 1629515037.1942 abc 503 2021-08-20 20:03:57.194
2021-08-20 12 1629515037.2037 abc 503 2021-08-20 20:03:57.204
2021-08-20 12 1629515037.2249 abc 503 2021-08-20 20:03:57.225
2021-08-20 12 1629515064.2427 abc 200 2021-08-20 20:04:24.243 -- successful retry after 1 sec
I want to create a new column, retry, such that:
if code = 503, successful retry within 1 sec -> successful_retry
if code = 503, successful retry after 1 sec -> successful_retry_after_1_sec
if code = 503, no successful retry at all -> no_successful_retry
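I'm mainly a Python/Pandas person, but I need to solve this quickly. I tried using LEAD() but couldn't write a solution with a variable offset. Thanks for any pointers.
Edit: based on @Gordon Linoff's answer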
SELECT
    date,
    id,
    url,
    event,
    FROM_UNIXTIME(event) AS event_ts,
    code,
    (
        CASE
            WHEN code <= 399 THEN 'successful_response'
            WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
                PARTITION BY
                    date,
                    id,
                    url
                ORDER BY
                    date,
                    id,
                    url,
                    event
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) <= FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_within_1_sec'
            WHEN MIN(CASE WHEN code <= 399 THEN FROM_UNIXTIME(event) END) OVER (
                PARTITION BY
                    date,
                    id,
                    url
                ORDER BY
                    date,
                    id,
                    url,
                    event
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
            ) > FROM_UNIXTIME(event) + INTERVAL '1' SECOND THEN 'success_retry_after_1_sec'
            ELSE 'No_successful_retry'
        END
    ) AS successful_retry_flag
FROM t
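If I assume that a 200 is a successful retry, you can use a cumulative minimum to get the next successful retry. The rest is just date arithmetic to set the flag: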
select t.*,
       (case when min(case when code = 200 then event_ts end) over
                      (partition by id
                       order by event_ts
                       rows between current row and unbounded following
                      ) < event_ts + interval '1' second
             then 1 else 0
        end) as successful_retry_flag
from t;
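Since the asker is more at home in Python, the cumulative-minimum idea above can be cross-checked outside the database. The following is only a minimal sketch under stated assumptions: `flag_retries` and `ROWS` are hypothetical names, rows are assumed to be already ordered by `event_ts` within each `id`, a success is assumed to be exactly code 200, and the three labels are the ones requested in the question rather than the 1/0 flag of the query.

```python
from datetime import datetime, timedelta

# Sample rows from the question as (id, code, event_ts).
# Assumption: rows are already in event_ts order within each id.
ROWS = [
    (11, 503, "2021-08-20 20:03:57.068"),
    (11, 200, "2021-08-20 20:03:57.107"),
    (12, 503, "2021-08-20 20:03:57.187"),
    (12, 503, "2021-08-20 20:03:57.194"),
    (12, 503, "2021-08-20 20:03:57.204"),
    (12, 503, "2021-08-20 20:03:57.225"),
    (12, 200, "2021-08-20 20:04:24.243"),
]


def flag_retries(rows, window=timedelta(seconds=1)):
    """Hypothetical helper: label each row using the next success for the same id."""
    parsed = [
        (row_id, code, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"))
        for row_id, code, ts in rows
    ]
    flags = [None] * len(parsed)
    next_success = {}  # id -> earliest success timestamp at or after the current row

    # Walking backwards makes next_success play the role of
    # MIN(...) OVER (... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING).
    for pos in range(len(parsed) - 1, -1, -1):
        row_id, code, ts = parsed[pos]
        if code == 200:
            next_success[row_id] = ts
            flags[pos] = "successful_response"
        elif row_id not in next_success:
            flags[pos] = "no_successful_retry"
        elif next_success[row_id] - ts < window:
            flags[pos] = "successful_retry"
        else:
            flags[pos] = "successful_retry_after_1_sec"
    return flags


if __name__ == "__main__":
    for row, flag in zip(ROWS, flag_retries(ROWS)):
        print(row, flag)
```

The backward scan visits each row once, which is the same information the window frame gives the query: for every row, the earliest success among the current and following rows of its partition.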
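You can also use the more readable and extensible MATCH_RECOGNIZE solution, which was added recently to Trino (formerly PrestoSQL).
With the MATCH_RECOGNIZE solution, you define the labels success and failure based on the value of code in the current row being scanned. You can also use the MEASURES clause to define a time measure between rows: it takes the LAST timestamp of the final success row and subtracts the current failure timestamp, and we can name the result time_to_success. Using these values defined through pattern matching, you can now filter on them with a CASE statement, much like @Gordon Linoff's solution.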
trino> WITH t(date, id, event, url, code, event_ts) AS (VALUES
-> (DATE '2021-08-20', 11, 1629515037.0682, 'xyz', 503, TIMESTAMP '2021-08-20 20:03:57.068'),
-> (DATE '2021-08-20', 11, 1629515037.1073, 'xyz', 200, TIMESTAMP '2021-08-20 20:03:57.107'),
-> (DATE '2021-08-20', 12, 1629515037.1866, 'abc', 503, TIMESTAMP '2021-08-20 20:03:57.187'),
-> (DATE '2021-08-20', 12, 1629515037.1942, 'abc', 503, TIMESTAMP '2021-08-20 20:03:57.194'),
-> (DATE '2021-08-20', 12, 1629515037.2037, 'abc', 503, TIMESTAMP '2021-08-20 20:03:57.204'),
-> (DATE '2021-08-20', 12, 1629515037.2249, 'abc', 503, TIMESTAMP '2021-08-20 20:03:57.225'),
-> (DATE '2021-08-20', 12, 1629515064.2427, 'abc', 200, TIMESTAMP '2021-08-20 20:04:24.243')
-> )
-> SELECT date, id, event, url, code, event_ts,
-> CASE
-> WHEN code = 200 THEN 'successful_response'
-> WHEN time_to_success < INTERVAL '1' SECOND THEN 'successful_retry'
-> WHEN time_to_success >= INTERVAL '1' SECOND THEN 'successful_retry_after_1_sec'
-> WHEN time_to_success IS NULL THEN 'no_successful_retry'
-> END flag
-> FROM t
-> MATCH_RECOGNIZE (
-> PARTITION BY id
-> ORDER BY event_ts
-> MEASURES FINAL LAST(success.event_ts) - failure.event_ts AS time_to_success
-> ALL ROWS PER MATCH WITH UNMATCHED ROWS
-> PATTERN (success* failure+ success)
-> DEFINE
-> success AS code = 200,
-> failure AS code = 503
-> );
date | id | event | url | code | event_ts | flag
------------+----+-----------------+-----+------+-------------------------+------------------------------
2021-08-20 | 12 | 1629515037.1866 | abc | 503 | 2021-08-20 20:03:57.187 | successful_retry_after_1_sec
2021-08-20 | 12 | 1629515037.1942 | abc | 503 | 2021-08-20 20:03:57.194 | successful_retry_after_1_sec
2021-08-20 | 12 | 1629515037.2037 | abc | 503 | 2021-08-20 20:03:57.204 | successful_retry_after_1_sec
2021-08-20 | 12 | 1629515037.2249 | abc | 503 | 2021-08-20 20:03:57.225 | successful_retry_after_1_sec
2021-08-20 | 12 | 1629515064.2427 | abc | 200 | 2021-08-20 20:04:24.243 | successful_response
2021-08-20 | 11 | 1629515037.0682 | xyz | 503 | 2021-08-20 20:03:57.068 | successful_retry
2021-08-20 | 11 | 1629515037.1073 | xyz | 200 | 2021-08-20 20:03:57.107 | successful_response
(7 rows)
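To make the pattern in the query above easier to follow, here is a rough Python emulation of what `failure+ success` and the `time_to_success` measure compute. It is a sketch under assumptions, not a description of how Trino evaluates MATCH_RECOGNIZE: `time_to_success` and `label` are hypothetical helpers, the leading `success*` part of the pattern is ignored, and success rows are simply given no measure because the CASE labels them before looking at it.

```python
from datetime import datetime, timedelta
from itertools import groupby


def time_to_success(rows):
    """Hypothetical helper. rows: (id, code, event_ts) tuples, event_ts a datetime.

    Returns (row, delta) pairs ordered by (id, event_ts); delta is None for
    rows that receive no measure.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, partition in groupby(ordered, key=lambda r: r[0]):  # PARTITION BY id
        pending = []  # consecutive failures still waiting for a closing success
        for row in partition:  # rows arrive in event_ts order
            _, code, ts = row
            if code == 503:
                pending.append(row)
                continue
            # A 200 closes the "failure+ success" run; any other code breaks it.
            closing = ts if code == 200 else None
            for failed in pending:
                delta = (closing - failed[2]) if closing is not None else None
                out.append((failed, delta))
            pending = []
            out.append((row, None))  # simplification: no measure on non-failure rows
        out.extend((failed, None) for failed in pending)  # trailing failures: unmatched
    return out


def label(code, delta):
    """Mirror of the CASE expression in the query."""
    if code == 200:
        return "successful_response"
    if delta is None:
        return "no_successful_retry"
    if delta < timedelta(seconds=1):
        return "successful_retry"
    return "successful_retry_after_1_sec"
```

Calling `label(row[1], delta)` on each pair returned by `time_to_success` gives one flag per row. Failures with no later 200 in their partition come back with `None`, which corresponds to the rows that `WITH UNMATCHED ROWS` keeps in the query output.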