pandas groupby not working as expected
I have a DataFrame:
>>> d6
Out[57]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 nan nan 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 nan nan 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 nan nan 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 nan nan 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 nan nan 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 nan nan 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 nan nan 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 nan nan 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 nan nan 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 nan nan -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 nan nan -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 nan nan -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 nan nan -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 nan nan -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 nan nan -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 nan nan -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 nan nan -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 nan nan -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 nan nan -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 nan nan -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 nan nan -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 nan nan -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 nan nan -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 nan nan -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
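Although I am showing only 2 dates above, the frame has many dates. Each date has entries for several "codes". Each code on a given date has 2 entries: one for sym C and one for sym P. If M1/M2 is available for either the C or the P entry, I want to use it for that code/day. If both C and P are nan for a given code and day, I leave it as nan.
I currently do it like this: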
for code in d1.code:
    x_df = d1[d1.code == code]
    x_df = x_df.groupby(['Date'], as_index=False).ffill().bfill()
    d1[d1.code == code] = x_df
This works, but it takes a long time. Here is the output for the df above:
Out[62]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 4.29 -9.46 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 4.12 -9.55 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 3.96 -9.64 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 3.81 -9.71 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 3.17 -10.11 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 3.01 -10.22 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 2.65 -10.50 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 2.31 -10.80 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 2.08 -11.05 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 0.67 -11.16 -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 nan nan -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 nan nan -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 nan nan -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 nan nan -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 nan nan -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 nan nan -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 nan nan -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 nan nan -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 nan nan -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 nan nan -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 nan nan -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 nan nan -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 nan nan -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 nan nan -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
57828 2017-11-24 C 186.80 184.35 -154.72 -181.23 3400 - 12/22/2017
57847 2017-11-24 P 8.10 185.20 -187.99 -181.23 3400 - 12/22/2017
57829 2017-11-24 C 163.60 213.17 -174.17 -156.23 3425 - 12/22/2017
57848 2017-11-24 P 9.80 213.01 -205.82 -156.23 3425 - 12/22/2017
To make it faster, I tried the following:
new_d1= d1.groupby(['code','Date'], as_index=False).ffill().bfill()
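This did not work as expected (the loop above does give the right result). It looks as if the grouping is only by Date and not by "code". Here is the output: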
>>> new_d1
Out[59]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 4.29 -9.46 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 4.12 -9.55 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 3.96 -9.64 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 3.81 -9.71 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 3.17 -10.11 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 3.01 -10.22 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 2.65 -10.50 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 2.31 -10.80 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 2.08 -11.05 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 0.67 -11.16 -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 7.77 -25.05 -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 7.77 -25.05 -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 7.77 -25.05 -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 7.77 -25.05 -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 7.77 -25.05 -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 7.77 -25.05 -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 7.77 -25.05 -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 7.77 -25.05 -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 7.77 -25.05 -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 7.77 -25.05 -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 7.77 -25.05 -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 7.77 -25.05 -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 14.84 -43.17 -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 14.84 -43.17 -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
57828 2017-11-24 C 186.80 184.35 -154.72 -181.23 3400 - 12/22/2017
57847 2017-11-24 P 8.10 185.20 -187.99 -181.23 3400 - 12/22/2017
57829 2017-11-24 C 163.60 213.17 -174.17 -156.23 3425 - 12/22/2017
57848 2017-11-24 P 9.80 213.01 -205.82 -156.23 3425 - 12/22/2017
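Is there any way to speed up the loop above, or any insight into why the second snippet does not work?
The problem is the second call, bfill. It is no longer part of the groupby, so it backfills nan across the entire DataFrame rather than within each subgroup. The following should work for you: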
df.groupby(['code','Date']).apply(lambda x : x.ffill().bfill())
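To see the leak in isolation, here is a minimal sketch on a made-up four-row frame (the codes "A" and "B" and the values are hypothetical, not taken from the question). It compares the chained call with a variant that issues both fills through a groupby. This two-step form is offered as an alternative to the `apply` line, not as the original answer's code, and the names `leaky`, `forward` and `safe` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: two codes on one date.
# Code "A" has no M1 at all, code "B" has M1 only on its P row.
df = pd.DataFrame({
    "Date": ["2017-11-24"] * 4,
    "code": ["A", "A", "B", "B"],
    "sym": ["C", "P", "C", "P"],
    "M1": [np.nan, np.nan, np.nan, 7.77],
})

keys = ["code", "Date"]

# Chained form: ffill runs per group, but it returns a plain Series,
# so the following bfill runs over the whole column.
leaky = df.groupby(keys)["M1"].ffill().bfill()

# Grouped form: regroup before the second fill so nothing crosses groups.
forward = df.groupby(keys)["M1"].ffill()
safe = forward.groupby([df["code"], df["Date"]]).bfill()

print(leaky.tolist())  # [7.77, 7.77, 7.77, 7.77]
print(safe.tolist())   # [nan, nan, 7.77, 7.77]
```

In the chained result, code "A" picks up 7.77 from code "B", which is the same pattern as the 7.77 and 14.84 values that spread into the all-nan codes in the question's output.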
As an analogy, one might expect the following to return a sum of sums for each group, but it returns a single number:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})
df.groupby('A').sum().sum()
Out[958]:
B 14
dtype: int64
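The same point can be checked one step at a time. The sketch below rebuilds the small frame from the example and splits the chain into two hypothetical intermediate names, `per_group` and `total`, to show that the first `sum` is a grouped reduction while the second is an ordinary `DataFrame.sum` on its result.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3, 4], "B": [2, 3, 4, 5]})

# Grouped reduction: one row per distinct value of A.
per_group = df.groupby("A").sum()

# per_group is a plain DataFrame, so this sums the whole B column.
total = per_group.sum()

print(type(per_group).__name__)  # DataFrame
print(per_group["B"].tolist())   # [5, 4, 5]
print(total["B"])                # 14
```

The chained `.ffill().bfill()` in the question behaves the same way: only the first method is applied per group.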