How to use wj to bucket a list of timestamps in a table

Question

I have a table aggregated by date, sym, and client_type. The remaining columns are not aggregated; they are lists.

For example:

table2: select requestId, initialTimestamp by date, sym, client_type from table

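requestId and initialTimestamp are therefore lists.

Basically, I want to find duplicates that occur within 5 minutes of each other and flag them as such.

If my 5-minute window is fixed, I can do something like this: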

table3: update bucket:  (5 xbar\:`minute$initialTimestamp) from table2;

Then ungroup and use the bucket to identify the dupes, e.g.:

table4: select requestId by bucket, sym, client_type from ungroup table3;
update duplicateId: ` from (ungroup update duplicateId: ?[(count each requestId)>1; requestId@'1; `] from table4) where requestId=duplicateId

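This works fine, but what if I want a rolling 5-minute window instead of a fixed one?

That seems to point to wj, but I'm not sure how to get it to work with the grouped columns.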
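To illustrate the difference between the two kinds of window, here is a minimal sketch with made-up timestamps (the name ts is hypothetical and not part of the tables above). Two requests 3 minutes apart can land in different fixed buckets, while a rolling comparison of neighbouring timestamps still sees them as close:

```q
// made-up timestamps: 2 and then 3 minutes apart
ts:2024.01.01D10:02:00 2024.01.01D10:04:00 2024.01.01D10:07:00

// fixed 5-minute buckets: 10:04 and 10:07 fall into different buckets
buckets:5 xbar `minute$ts      / 10:00 10:00 10:05

// rolling view: gap between each timestamp and the one before it
gaps:(1_ts)-(-1_ts)            / 0D00:02 0D00:03 as timespans
close:0D00:05>=gaps            / 11b
```

With fixed buckets only the first pair would be caught; the rolling check catches both pairs.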

Answer 1

If you just want to find duplicates, could you use fby rather than time buckets/windows?

select from table where 1 < (count;initialTimestamp) fby ([]date;sym;client_type;requestId)
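As a small sketch of what that fby filter does, assuming a flat table with one row per request (the table name flat and its values are made up for illustration): rows are kept only when their date, sym, client_type and requestId combination appears more than once.

```q
// hypothetical flat sample: requestId 1 appears twice for the same date/sym/client_type
flat:([] date:3#2024.01.01; sym:`a`a`b; client_type:`x`x`x; requestId:1 1 2;
  initialTimestamp:2024.01.01D10:00:00 2024.01.01D10:02:00 2024.01.01D10:03:00)

// count rows per group and keep those whose group has more than one row
r:select from flat where 1<(count;initialTimestamp) fby ([]date;sym;client_type;requestId)
```

Here r holds the two rows with requestId 1. Note that this ignores how far apart the timestamps are; it only detects repeats within a group.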

Edit: I still think wj is unnecessary and not very efficient. You can update by group like this:

// given the following dummy table

t:update time:"P"$"D" sv/: flip(string[date];string[time]) from ([]date:raze 20#'.z.d+til 5;sym:100?`symA`symB`symC;time:10:00+til 100;clientType:100?`clientA`clientB;requestId:100?3);

update dup:00:05>=deltas time by date,sym,clientType,requestId from t

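Edit: Since you want the first duplicate, I think wj is probably the only way, as you initially thought. You want to use wj1, because wj also considers the prevailing row from before the start of the time window.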

// sort by grouping then time
t2:`date`sym`clientType`time xasc t; 
times:-00:05 -00:00:00.000000001 +\:t2`time;

// params
// pair of time lists
// common columns/grouping with time last
// tableToJoinTo
// (windowJoinTable;(function;col))

wj1[times;`date`sym`clientType`time;t2;(update dupId:requestId from t2;(first;`dupId))]
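The window argument is the part that is easiest to get wrong, so here is a minimal sketch of its shape with made-up timestamps (ts and w are hypothetical names). It writes both offsets as timespans; the idea is the same as for times above: one list of window starts and one list of window ends, with each end 1 nanosecond before the row's own time so that a row does not match itself.

```q
// made-up row times
ts:2024.01.01D10:00:00 2024.01.01D10:03:00

// each-left adds every offset to the whole time vector,
// giving a pair: (window starts; window ends)
w:(-0D00:05;-0D00:00:00.000000001)+\:ts

starts:w 0     / each row's time minus 5 minutes
ends:w 1       / each row's time minus 1 nanosecond
```

Both items of w have the same length as ts, which is the pair-of-lists form described by the "pair of time lists" comment above.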
