Include column after grouping using data.table
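My goal is to compute a group percentage column by zip. I created the % column by zip, but I keep losing my group ('cgrp') variable. How can I keep it in my final result?
My data.table script gives me the following result: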
zip V1
1: 12007 19.35484
2: 12007 48.38710
3: 12007 32.25806
4: 12008 40.00000
5: 12008 41.66667
6: 12008 18.33333
But I would also like the cgrp column included. I have been trying different combinations of .SD and .SDcols but can't get it to work. This is what I want:
zip V1 cgrp
1: 12007 19.35484 3
2: 12007 48.38710 4
3: 12007 32.25806 1
4: 12008 40.00000 1
5: 12008 41.66667 4
6: 12008 18.33333 3
Script:
zip.grp <- ninefive[, .(zgrp = .N), by = .(cgrp,zip)
][, 100 *(zgrp/sum(zgrp)), by = zip]
Sample of the ninefive data:
zip lower avg upper SSN RISK idk diff avgDiff cgrp
1: 12007 -170.3723 592 1354.372 127 676 1 84 137.2903 3
2: 12007 -170.3723 592 1354.372 064 828 1 236 137.2903 4
3: 12007 -170.3723 592 1354.372 080 627 1 35 137.2903 1
4: 12007 -170.3723 592 1354.372 057 770 1 178 137.2903 4
5: 12007 -170.3723 592 1354.372 014 770 1 178 137.2903 4
6: 12007 -170.3723 592 1354.372 084 893 1 301 137.2903 4
7: 12007 -170.3723 592 1354.372 105 757 1 165 137.2903 4
8: 12007 -170.3723 592 1354.372 093 494 1 98 137.2903 1
9: 12007 -170.3723 592 1354.372 080 744 1 152 137.2903 4
10: 12007 -170.3723 592 1354.372 102 494 1 98 137.2903 1
11: 12007 -170.3723 592 1354.372 062 748 1 156 137.2903 4
12: 12007 -170.3723 592 1354.372 729 711 1 119 137.2903 3
13: 12007 -170.3723 592 1354.372 059 677 1 85 137.2903 3
14: 12007 -170.3723 592 1354.372 090 718 1 126 137.2903 3
15: 12007 -170.3723 592 1354.372 053 636 1 44 137.2903 1
16: 12007 -170.3723 592 1354.372 081 855 1 263 137.2903 4
17: 12007 -170.3723 592 1354.372 073 811 1 219 137.2903 4
18: 12007 -170.3723 592 1354.372 092 614 1 22 137.2903 1
19: 12007 -170.3723 592 1354.372 081 789 1 197 137.2903 4
20: 12007 -170.3723 592 1354.372 105 831 1 239 137.2903 4
21: 12007 -170.3723 592 1354.372 108 809 1 217 137.2903 4
22: 12007 -170.3723 592 1354.372 093 649 1 57 137.2903 1
23: 12007 -170.3723 592 1354.372 128 685 1 93 137.2903 3
24: 12007 -170.3723 592 1354.372 093 574 1 18 137.2903 1
25: 12007 -170.3723 592 1354.372 119 640 1 48 137.2903 1
26: 12007 -170.3723 592 1354.372 163 813 1 221 137.2903 4
27: 12007 -170.3723 592 1354.372 062 678 1 86 137.2903 3
28: 12007 -170.3723 592 1354.372 102 652 1 60 137.2903 1
29: 12007 -170.3723 592 1354.372 379 532 1 60 137.2903 1
30: 12007 -170.3723 592 1354.372 107 803 1 211 137.2903 4
31: 12007 -170.3723 592 1354.372 060 782 1 190 137.2903 4
32: 12008 -262.0840 729 1720.084 110 547 1 182 104.8667 1
33: 12008 -262.0840 729 1720.084 023 821 1 92 104.8667 4
34: 12008 -262.0840 729 1720.084 072 649 1 80 104.8667 1
35: 12008 -262.0840 729 1720.084 119 602 1 127 104.8667 1
36: 12008 -262.0840 729 1720.084 076 553 1 176 104.8667 1
37: 12008 -262.0840 729 1720.084 083 606 1 123 104.8667 1
38: 12008 -262.0840 729 1720.084 124 645 1 84 104.8667 1
39: 12008 -262.0840 729 1720.084 086 700 1 29 104.8667 3
40: 12008 -262.0840 729 1720.084 063 579 1 150 104.8667 1
41: 12008 -262.0840 729 1720.084 086 746 1 17 104.8667 4
42: 12008 -262.0840 729 1720.084 075 732 1 3 104.8667 4
43: 12008 -262.0840 729 1720.084 082 656 1 73 104.8667 1
44: 12008 -262.0840 729 1720.084 057 515 1 214 104.8667 1
45: 12008 -262.0840 729 1720.084 068 806 1 77 104.8667 4
46: 12008 -262.0840 729 1720.084 103 797 1 68 104.8667 4
47: 12008 -262.0840 729 1720.084 110 578 1 151 104.8667 1
48: 12008 -262.0840 729 1720.084 102 709 1 20 104.8667 3
49: 12008 -262.0840 729 1720.084 565 567 1 162 104.8667 1
50: 12008 -262.0840 729 1720.084 037 886 1 157 104.8667 4
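You can use := to create the new column. Because := adds V1 to the aggregated table by reference, the existing cgrp and zip columns are kept: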
# the trailing [] prints the result, since := returns invisibly
ninefive[, .(zgrp = .N), by = .(cgrp, zip)
         ][, V1 := 100 * (zgrp / sum(zgrp)), by = zip
         ][, zgrp := NULL][]
# cgrp zip V1
#1: 3 12007 19.35484
#2: 4 12007 48.38710
#3: 1 12007 32.25806
#4: 1 12008 57.89474
#5: 4 12008 31.57895
#6: 3 12008 10.52632
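The same steps can be sketched on a small made-up table, which makes it easier to check the arithmetic by hand. The toy data and the names dt and counts are assumptions for illustration, not the asker's ninefive data.

```r
library(data.table)

# Toy data (hypothetical): 5 rows in zip 12007, 4 rows in zip 12008
dt <- data.table(
  zip  = c(12007, 12007, 12007, 12007, 12007, 12008, 12008, 12008, 12008),
  cgrp = c(3, 4, 4, 1, 4, 1, 1, 4, 3)
)

# Step 1: count rows per (cgrp, zip) pair
counts <- dt[, .(zgrp = .N), by = .(cgrp, zip)]

# Step 2: add the within-zip percentage by reference; cgrp is left in place
counts[, V1 := 100 * zgrp / sum(zgrp), by = zip]

# Step 3: drop the helper count column
counts[, zgrp := NULL]

print(counts)  # columns: cgrp, zip, V1
```

In zip 12007 the counts are 1, 3 and 1 out of 5 rows, so V1 comes out as 20, 60 and 20; each zip sums to 100.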
Or, as @Frank commented, you can include the cgrp column in the list:
ninefive[, .(zgrp=.N), by= .(cgrp, zip)][, list(cgrp,V1=100*
(zgrp/sum(zgrp))), by=zip]
# zip cgrp V1
#1: 12007 3 19.35484
#2: 12007 4 48.38710
#3: 12007 1 32.25806
#4: 12008 1 57.89474
#5: 12008 4 31.57895
#6: 12008 3 10.52632
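As a rough illustration of why the original script lost cgrp: when j is a single unnamed expression, the result holds only the by column plus an auto-named V1, whereas listing cgrp in j returns it next to the percentage. The sketch below uses a made-up table; dt, counts, lost and kept are hypothetical names.

```r
library(data.table)

# Toy data (hypothetical), not the asker's ninefive table
dt <- data.table(
  zip  = c(1, 1, 1, 2, 2),
  cgrp = c("a", "a", "b", "a", "b")
)

# Count rows per (cgrp, zip) pair
counts <- dt[, .(zgrp = .N), by = .(cgrp, zip)]

# Unnamed expression in j: only zip and an auto-named V1 come back
lost <- counts[, 100 * zgrp / sum(zgrp), by = zip]
print(names(lost))

# cgrp listed in j: it is returned alongside the percentage
kept <- counts[, .(cgrp, V1 = 100 * zgrp / sum(zgrp)), by = zip]
print(names(kept))

# Sanity check: percentages should total 100 within each zip
print(kept[, .(total = sum(V1)), by = zip])
```

This form returns a new table rather than modifying counts in place, so there is no helper column to drop afterwards.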