Pivot table in kdb+/q
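I am trying to pivot some trade data in KDB/q. Although my data differs only slightly from the working example on the website (see the general pivot function: http://code.kx.com/q/cookbook/pivoting-tables/), I cannot get the function to work, even after several hours of trying (I am still new to KDB).
In short, I am trying to go from this table: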
q)5# trades_agg
date       sym  time  exchange buysell| shares
--------------------------------------| ------
2009.01.05 aaca 09:30 BATS     B      | 484
2009.01.05 aaca 09:30 BATS     S      | 434
2009.01.05 aaca 09:30 NASDAQ   B      | 235
2009.01.05 aaca 09:30 NASDAQ   S      | 429
2009.01.05 aaca 09:30 NYSE     B      | 309
to this:
date       sym  time  | BATSsharesB BATSsharesS NASDAQsharesB ...
----------------------| -----------------------------------------------
2009.01.05 aaca 09:30 | 484         434         235           ...
...                   | ...
Here is a working example to illustrate:
// Create data
qpd:5*2*4*"i"$16:00-09:30
date:raze(100*qpd)#'2009.01.05+til 5
sym:(raze/)5#enlist qpd#'100?`4
sym:(neg count sym)?sym
time:"t"$raze 500#enlist 09:30:00+15*til qpd
time+:(count time)?1000
exchange:raze 500#enlist raze(qpd div 3)#enlist`NYSE`NASDAQ`BATS
buysell:raze 500#enlist raze(qpd div 2)#enlist`B`S
shares:(500*qpd)?100
trades:([]date;sym;time;exchange;buysell;shares)
//I then aggregate the data into equal sized buckets
trades_agg: select sum shares by date, sym, time: 15 xbar time.minute, exchange, buysell from trades
// pivot function from the code.kx.com website
piv:{[t;k;p;v;f;g]
 v:(),v;
 G:group flip k!(t:.Q.v t)k;
 F:group flip p!t p;
 count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
  {[i;j;k;x;y]
   a:count[x]#x 0N;
   a[y]:x y;
   b:count[x]#0b;
   b[y]:1b;
   c:a i;
   c[k]:first'[a[j]@'where'[b j]];
   c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}
I then apply this pivot function to the example, with the functions f and g set to their default (::) values, but I get an error message:
piv[`trades_agg;`date`sym`time;`exchange`buysell;`shares;(::);(::)]
It does not work even when I use the suggested f and g functions:
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
g:{[k;P;c]k,(raze/)flip flip each 5 cut'10 cut raze reverse 10 cut asc c}
I don't understand why this fails, since it is so close to the example on the website.
Your table is keyed, so unkey it:
trades_agg:0!select sum shares by date, sym, time: 15 xbar time.minute,exchange,buysell from trades
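To make the keyed/unkeyed distinction concrete, here is a minimal sketch on a made-up three-row table (the names t, kt and ut are only for illustration). As far as I can tell, the reason it matters is that piv pulls columns out with t k, and indexing a keyed table is a key lookup rather than a column fetch.

```q
/ Tiny stand-in table (hypothetical data, not the trades from the question)
t:([]sym:`a`a`b;side:`B`S`B;shares:10 20 30)

/ select ... by returns a keyed table: the by-columns become the key
kt:select sum shares by sym,side from t
keys kt          / `sym`side

/ 0! removes the key, leaving a plain table with the same columns
ut:0!kt
keys ut          / empty symbol list: no key columns left
cols ut          / `sym`side`shares
```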
and define your g as:
g:{[k;P;c]k,c}
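A small sketch of what f and g see and return may help here. The inputs below are made up but shaped like the ones piv builds internally: v is the list of value columns, P holds one entry per exchange/buysell combination, and k is the list of key columns.

```q
/ Inputs shaped like the ones piv passes (made-up values for illustration)
v:enlist`shares                     / value column(s)
P:(`BATS`B;`BATS`S;`NYSE`B)         / one entry per pivot-value combination
k:`date`sym`time                    / key columns

/ f from the question: builds one new column name per combination
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
/ g from this answer: key columns first, then the new columns as generated
g:{[k;P;c]k,c}

c:f[v;P]      / `BATSsharesB`BATSsharesS`NYSEsharesB
g[k;P;c]      / `date`sym`time`BATSsharesB`BATSsharesS`NYSEsharesB
```

So g only has to return the final column order that piv hands to xcols; k,c is the simplest choice that keeps every column.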
The best way to work out what f/g need is to define it with a breakpoint and then inspect the variables:
g:{[k;P;c]break}
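If you would rather not drop into the debugger, a variation on the same idea is to have g stash its arguments in globals and carry on. This is only a sketch: gdbg, dbgK, dbgP and dbgC are hypothetical names, and the call below uses made-up inputs rather than a real piv run.

```q
/ Hypothetical helper: record the arguments in globals, then return the
/ default column order (key columns followed by the pivot columns)
gdbg:{[k;P;c] `dbgK set k; `dbgP set P; `dbgC set c; k,c}

/ Call it the way piv would, with made-up inputs
res:gdbg[`date`sym;(`BATS`B;`BATS`S);`BATSsharesB`BATSsharesS]

/ The captured arguments can now be inspected at leisure
dbgK     / key columns
dbgP     / pivot-value combinations
dbgC     / generated column names
```

Passing gdbg as the last argument of the six-argument piv should leave the same three globals populated after the call.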
Here is a standalone version that is easier to use:
tt:1000#0!trades_agg
piv:{[t;k;p;v]
 / controls new columns names
 f:{[v;P]`${raze " " sv x} each string raze P[;0],'/:v,/:\:P[;1]};
 v:(),v; k:(),k; p:(),p; / make sure args are lists
 G:group flip k!(t:.Q.v t)k;
 F:group flip p!t p;
 key[G]!flip(C:f[v]P:flip value flip key F)!raze
  {[i;j;k;x;y]
   a:count[x]#x 0N;
   a[y]:x y;
   b:count[x]#0b;
   b[y]:1b;
   c:a i;
   c[k]:first'[a[j]@'where'[b j]];
   c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]};
q)piv[`tt;`date`sym`time;`exchange`buysell;enlist `shares]
date       sym  time | BATS shares B BATS shares S NASDAQ shares B NASDAQ sha..
---------------------| ------------------------------------------------------..
2009.01.05 adkk 09:30| 577           359           499             452       ..
2009.01.05 adkk 09:45| 882           501           339             467       ..
2009.01.05 adkk 10:00| 620           513           411             128       ..
2009.01.05 adkk 10:15| 501           544           272             544       ..
2009.01.05 adkk 10:30| 291           594           363             331       ..
2009.01.05 adkk 10:45| 867           500           498             536       ..
2009.01.05 adkk 11:00| 624           632           694             493       ..
2009.01.05 adkk 11:15| 99            704           600             299       ..
2009.01.05 adkk 11:30| 269           394           280             392       ..
2009.01.05 adkk 11:45| 635           744           758             597       ..
2009.01.05 adkk 12:00| 562           354           498             405       ..
2009.01.05 adkk 12:15| 416           437           303             492       ..
2009.01.05 adkk 12:30| 447           699           370             302       ..
2009.01.05 adkk 12:45| 336           647           512             245       ..
2009.01.05 adkk 13:00| 692           457           497             553       ..
I found the original piv function in Ryan's answer hard to follow, so I updated it with some comments and more readable variable names. HTH
piv:{[table; rows; columns; vals]
    / make sure args are lists
    vals: (),vals;
    rows: (),rows;
    columns: (),columns;
    / Get columns of table corresponding to those of row labels and calculate groups
    / group returns a dict whose keys are the unique row labels and whose values are the row indices of each group e.g. (0 1 3; 2 4; ...)
    rowGroups: group rows#table;
    rowGroupIdxs: value rowGroups;
    rowValues: key rowGroups;
    / Similarly, get columns of table corresponding to those of column labels and calculate groups
    colGroups: group columns#table;
    colGroupIdxs: value colGroups;
    colValues: key colGroups;
    getPivotCol: {[rowGroupStartIdx; nonSingleRowGroups; nonSingleRowGroupsIdx; vals; colGroupIdxs]
        / vals: the list of values for this particular value-column combination
        / colGroupIdxs: the list of indices for this particular column group
        / We only care about vals that should belong in this pivot column - we need to filter out vals not part of this column group
        filteredValues: count[vals]#vals[0N];
        filteredValues[colGroupIdxs]: vals[colGroupIdxs];
        / Flags the positions that belong to this column group
        hasValue: count[vals]#0b;
        hasValue[colGroupIdxs]: 1b;
        / Seed off pivot column with the first (filtered) value of each row group
        / This will be correct for row groups of size 1 as no aggregation needs to occur
        pivotCol: filteredValues[rowGroupStartIdx];
        / Otherwise, for the row groups larger than 1, get the first (filtered) value
        pivotCol[nonSingleRowGroupsIdx]: first'[filteredValues[nonSingleRowGroups]@'where'[hasValue[nonSingleRowGroups]]];
        pivotCol
        };
    / Groups with more than 1 row (these are the ones that will need aggregating)
    nonSingleRowGroupsIdx: where 1 <> count'[rowGroupIdxs];
    / Get resulting pivot column for each combination of column and value fields
    / (no space before /:\: since whitespace followed by / starts a comment in q)
    pivotCols: raze getPivotCol[rowGroupIdxs[;0]; rowGroupIdxs[nonSingleRowGroupsIdx]; nonSingleRowGroupsIdx]/:\:[table[vals]; colGroupIdxs];
    / Columns names are the cross-product of column and value fields
    colNames: `${raze "" sv x} each string raze (flip value flip colValues),'/:vals;
    / Finally, stitch together row and column headings with pivot data to obtain final table
    rowValues!flip colNames!pivotCols
    };
By the way, I also made some small changes to the column-name format to suit my needs.
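For comparison, when there is a single key column and the pivot columns can be merged into one name up front, the shorter exec ... by idiom (the simple pivot that, as I recall, the same cookbook page starts with) may be enough. The sketch below runs on a made-up four-row table; pcol, C and r are hypothetical names introduced only for this example.

```q
/ Small unkeyed table (hypothetical data)
t:([]sym:`a`a`b`b;exchange:`BATS`BATS`NYSE`BATS;buysell:`B`S`B`B;shares:10 20 30 40)

/ Merge the two pivot columns into one symbol column, e.g. `BATSsharesB
/ (the outer parentheses stop qSQL from treating the comma in ,' as a separator)
t:update pcol:`$((string[exchange],\:"shares"),'string buysell) from t

/ Distinct new column names, sorted
C:asc exec distinct pcol from t

/ For each sym build a pcol!shares dictionary and take the C entries from it;
/ combinations that a sym does not have come out as nulls
r:exec C#(pcol!shares) by sym:sym from t

cols r    / `sym`BATSsharesB`BATSsharesS`NYSEsharesB
```

Unlike the general piv, this keeps the first value seen for each name within a sym, so it assumes the data has already been aggregated to one row per combination.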