pandas groupby not working as expected

I have a DataFrame:

    >>> d6
Out[57]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20    nan     nan       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20    nan     nan       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20    nan     nan       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20    nan     nan       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30    nan     nan      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30    nan     nan      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40    nan     nan      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50    nan     nan      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60    nan     nan      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70    nan     nan     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75    nan     nan      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01    nan     nan      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75    nan     nan      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01    nan     nan      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75    nan     nan      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01    nan     nan      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75    nan     nan      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01    nan     nan      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75    nan     nan      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01    nan     nan      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75    nan     nan      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01    nan     nan      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75    nan     nan      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01    nan     nan      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017

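Although I have shown only 2 dates above, the DataFrame has many dates. Each date has entries for several codes. Each code on a given date has 2 entries: one for sym C and one for sym P. If M1/M2 is available for either the C or the P entry, I want to use it for that code/day. If both C and P are nan for a given code + day, I leave them as nan.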

This is how I currently do it:

for code in d1.code:
        x_df = d1[d1.code == code]
        x_df = x_df.groupby(['Date'], as_index=False).ffill().bfill()
        d1[d1.code == code] = x_df

This works, but it takes a long time. Here is the output for the df above:

Out[62]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20   4.29   -9.46       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20   4.12   -9.55       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20   3.96   -9.64       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20   3.81   -9.71       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30   3.17  -10.11      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30   3.01  -10.22      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40   2.65  -10.50      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50   2.31  -10.80      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60   2.08  -11.05      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70   0.67  -11.16     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75    nan     nan      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01    nan     nan      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75    nan     nan      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01    nan     nan      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75    nan     nan      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01    nan     nan      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75    nan     nan      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01    nan     nan      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75    nan     nan      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01    nan     nan      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75    nan     nan      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01    nan     nan      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75    nan     nan      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01    nan     nan      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017
57828 2017-11-24       C  186.80 184.35 -154.72      -181.23  3400 - 12/22/2017
57847 2017-11-24       P    8.10 185.20 -187.99      -181.23  3400 - 12/22/2017
57829 2017-11-24       C  163.60 213.17 -174.17      -156.23  3425 - 12/22/2017
57848 2017-11-24       P    9.80 213.01 -205.82      -156.23  3425 - 12/22/2017

To make it faster, I tried the following:

new_d1= d1.groupby(['code','Date'], as_index=False).ffill().bfill()

This does not work as expected (whereas the loop above does). It looks as if the rows are grouped only by Date and not by code. Here is the output:

>>> new_d1
Out[59]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20   4.29   -9.46       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20   4.12   -9.55       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20   3.96   -9.64       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20   3.81   -9.71       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30   3.17  -10.11      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30   3.01  -10.22      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40   2.65  -10.50      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50   2.31  -10.80      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60   2.08  -11.05      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70   0.67  -11.16     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75   7.77  -25.05      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01   7.77  -25.05      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75   7.77  -25.05      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01   7.77  -25.05      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75   7.77  -25.05      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01   7.77  -25.05      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75   7.77  -25.05      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01   7.77  -25.05      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75   7.77  -25.05      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01   7.77  -25.05      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75   7.77  -25.05      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01   7.77  -25.05      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75  14.84  -43.17      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01  14.84  -43.17      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017
57828 2017-11-24       C  186.80 184.35 -154.72      -181.23  3400 - 12/22/2017
57847 2017-11-24       P    8.10 185.20 -187.99      -181.23  3400 - 12/22/2017
57829 2017-11-24       C  163.60 213.17 -174.17      -156.23  3425 - 12/22/2017
57848 2017-11-24       P    9.80 213.01 -205.82      -156.23  3425 - 12/22/2017

Is there any way to speed up the code above, or any insight into why the second snippet does not work?

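The problem is the second call, bfill: it runs on the DataFrame that the grouped ffill returns, so it back-fills nan across the whole DataFrame rather than within each group. The following should work for you: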

df.groupby(['code','Date']).apply(lambda x : x.ffill().bfill())
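
To make the difference concrete, here is a minimal sketch on a small made-up frame that mimics a few 2017-11-24 rows from the question. The data and the names `keys`, `cols`, `leaky`, and `fixed` are assumptions chosen for illustration. It also shows one possible alternative to `apply`: grouping a second time before the backward fill, so that both fills stay inside each (code, Date) group.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (one date, three codes).
df = pd.DataFrame({
    "Date": ["2017-11-24"] * 5,
    "sym":  ["C", "P", "P", "C", "P"],
    "M1":   [np.nan, np.nan, 7.77, np.nan, 14.84],
    "M2":   [np.nan, np.nan, -25.05, np.nan, -43.17],
    "code": ["2800 - 11/24/2017", "2800 - 11/24/2017", "3150 - 12/08/2017",
             "3225 - 12/08/2017", "3225 - 12/08/2017"],
})
keys = ["code", "Date"]
cols = ["M1", "M2"]

# Chained form: ffill runs per group, but its result is a plain DataFrame,
# so the trailing bfill runs over all rows and crosses group boundaries.
leaky = df.groupby(keys)[cols].ffill().bfill()

# Per-group form: group again before the backward fill.
fixed = df.copy()
fixed[cols] = fixed.groupby(keys)[cols].ffill()
fixed[cols] = fixed.groupby(keys)[cols].bfill()

print(leaky["M1"].tolist())  # [7.77, 7.77, 7.77, 14.84, 14.84]
print(fixed["M1"].tolist())  # [nan, nan, 7.77, 14.84, 14.84]
```

In `leaky`, the two rows of the all-nan code pick up 7.77 from the next code, which matches the symptom in the question. In `fixed`, they stay nan. Because this variant uses only the built-in grouped fills rather than a Python lambda, it may also be faster on a large frame, though that is worth timing on the real data.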

As an analogy, one might expect the following to return the sum of the sums for each group, but it returns a single number:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})
df.groupby('A').sum().sum()
Out[958]: 
B    14
dtype: int64
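
To see where the grouping is lost in this analogy, it may help to split the chain into its two steps. The sketch below rebuilds the same small frame; `per_group` and `total` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})

# Step 1: the grouped sum returns an ordinary DataFrame indexed by A.
per_group = df.groupby('A').sum()
print(per_group['B'].tolist())  # [5, 4, 5]

# Step 2: the second sum is a plain DataFrame.sum over that result,
# so it adds up all groups and knows nothing about the grouping.
total = per_group.sum()
print(total['B'])  # 14
```

The chained `.ffill().bfill()` behaves the same way: only the first method is called on the groupby object, and the second one is called on its ungrouped result.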