Compute median every 12 values

ex_array = [-8.23294593e-02, -4.07239507e-02,  6.08131029e-02,  2.72433402e-02,
   -4.73587631e-02,  5.15452252e-02,  1.32902476e-01,  1.22322232e-01,
    2.71845990e-02, -1.16927038e-01, -2.62239877e-01, -1.46526396e-01,
   -1.82859136e-01, -1.02089602e-01, -1.91863501e-04, -5.42572200e-02,
   -1.41798506e-01,  2.32538185e-02,  1.44525705e-01,  1.33945461e-01,
    5.01618120e-02, -1.32664337e-01, -2.97395262e-01, -1.02531532e-01,
   -7.80204566e-02, -5.46991495e-02,  1.05868862e-01,  7.25526818e-03,
    5.04192997e-02,  7.41281286e-02,  1.75069159e-01,  1.64488914e-01,
    7.55396024e-02, -6.23800645e-02, -1.76950023e-01, -5.91491004e-02,
   -4.00535768e-02,  6.59473071e-04,  5.98125666e-02, -1.49608356e-02,
   -1.45519585e-02,  1.49876707e-01,  1.92880709e-01,  2.33158881e-01,
    7.59751625e-02, -2.46659059e-02, -1.40025102e-01, -3.02416639e-02]

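I need to compute the median of every 12th value. Each value represents one month (January through December), so I want the median for each month of the year. Something like this:

Approach 1: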

import pandas as pd

sol_array = []

sol_array.append(pd.DataFrame(ex_array).iloc[0::12].median().to_string())
sol_array.append(pd.DataFrame(ex_array).iloc[1::12].median().to_string())
sol_array.append(pd.DataFrame(ex_array).iloc[2::12].median().to_string())

But this is the result. The 0 and the quotes should not be there.

['0   -0.075844',
 '0   -0.089111',
 '0    0.042705',
 '0    0.002147',
 '0   -0.010528',
 '0    0.109443',
 '0    0.198334',
 '0    0.20983',
 '0    0.075139',
 '0   -0.062405']

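So, do you know another way to get the same result? I only have 120 values, so arranging the groups by hand would still be feasible (only 10 groups), but I don't think that is an ideal solution.

Or do you know how to fix Approach 1 above so that it yields a usable array?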
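The `0` and the quotes both come from `.to_string()`: `DataFrame.median()` returns a Series indexed by column label (here `0`), and `.to_string()` renders that Series as text. The following is a minimal sketch of a repaired loop, assuming a pandas Series is acceptable in place of the one-column DataFrame. The short `values` list is a hypothetical stand-in for `ex_array`.

```python
import pandas as pd

# Hypothetical stand-in for ex_array: two "years" of 12 monthly values.
values = [float(v) for v in range(24)]
series = pd.Series(values)

sol_array = []
for month in range(12):
    # Series.median() returns a scalar, so there is no index label
    # and no string conversion involved.
    sol_array.append(float(series.iloc[month::12].median()))

print(sol_array)
```

Looping over `range(12)` also replaces the twelve copy-pasted `append` lines.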

A few other options:

list(pd.DataFrame(ex_array).groupby(lambda i:i%12).median()[0])

import statistics
[statistics.median(ex_array[i] for i in range(j, len(ex_array), 12)) for j in range(12)]

In both cases the output (for the data in your question) is

[
 -0.08017495795, -0.0477115501, 0.06031283475, -0.00385278371,
 -0.030955360799999998, 0.0628366769, 0.15979743200000002, 0.14921718750000001,
 0.0628507072, -0.08965355124999999, -0.21959495, -0.0808403162
]
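Both one-liners rely on the same idea: the month of a value is its position modulo 12. Here is a small sketch of that idea using only the standard library; the `values` list is a hypothetical stand-in for `ex_array`, and the length is assumed to be a multiple of 12.

```python
import statistics

# Hypothetical stand-in data: three "years" of 12 monthly values each.
values = [float(v) for v in range(36)]

# Stride slicing: values[j::12] picks month j from every year.
by_slice = [statistics.median(values[j::12]) for j in range(12)]

# Equivalent grouping by position modulo 12,
# mirroring groupby(lambda i: i % 12) in the pandas version.
groups = {}
for i, v in enumerate(values):
    groups.setdefault(i % 12, []).append(v)
by_group = [statistics.median(groups[j]) for j in range(12)]

print(by_slice)
print(by_slice == by_group)
```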

Out of interest, I timed the three alternatives (including @Shubham's answer) using code like the following (adapted for each version):

import timeit
timeit.timeit(setup='''
import statistics
ex_array = [-8.23294593e-02, -4.07239507e-02,  6.08131029e-02,  2.72433402e-02,
   -4.73587631e-02,  5.15452252e-02,  1.32902476e-01,  1.22322232e-01,
    2.71845990e-02, -1.16927038e-01, -2.62239877e-01, -1.46526396e-01,
   -1.82859136e-01, -1.02089602e-01, -1.91863501e-04, -5.42572200e-02,
   -1.41798506e-01,  2.32538185e-02,  1.44525705e-01,  1.33945461e-01,
    5.01618120e-02, -1.32664337e-01, -2.97395262e-01, -1.02531532e-01,
   -7.80204566e-02, -5.46991495e-02,  1.05868862e-01,  7.25526818e-03,
    5.04192997e-02,  7.41281286e-02,  1.75069159e-01,  1.64488914e-01,
    7.55396024e-02, -6.23800645e-02, -1.76950023e-01, -5.91491004e-02,
   -4.00535768e-02,  6.59473071e-04,  5.98125666e-02, -1.49608356e-02,
   -1.45519585e-02,  1.49876707e-01,  1.92880709e-01,  2.33158881e-01,
    7.59751625e-02, -2.46659059e-02, -1.40025102e-01, -3.02416639e-02]
''',
stmt='''
[statistics.median(ex_array[i] for i in range(j, len(ex_array), 12)) for j in range(12)]
''',
number=10000
)

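Unsurprisingly, for the data in the question, the numpy solution (0.39 s for 10,000 runs) is more than 10 times faster than the pandas solution (5.45 s). However, the `statistics` solution, at 0.09 s, is more than 4 times faster than numpy. That advantage disappears as the array grows; the break-even point is at roughly 5,000 entries.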
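To check where the break-even point falls on another machine, a harness along these lines could be used. It is only a sketch: `time_both` is a hypothetical helper, the data is random rather than the question's, and the numpy timing includes the list-to-array conversion, which may differ from how the figures above were measured.

```python
import random
import statistics
import timeit

import numpy as np


def time_both(n_values, number=200):
    """Return (statistics_seconds, numpy_seconds); n_values must be a multiple of 12."""
    rng = random.Random(0)
    data = [rng.uniform(-1.0, 1.0) for _ in range(n_values)]

    def with_statistics():
        return [statistics.median(data[j::12]) for j in range(12)]

    def with_numpy():
        # Starts from a plain list, so the conversion to an array is timed too.
        return np.median(np.reshape(data, (12, -1), order="F"), axis=1)

    # Sanity check: both variants must agree before they are timed.
    assert np.allclose(with_statistics(), with_numpy())

    t_stats = timeit.timeit(with_statistics, number=number)
    t_numpy = timeit.timeit(with_numpy, number=number)
    return t_stats, t_numpy


for n in (48, 1200, 12000):
    t_stats, t_numpy = time_both(n)
    print(f"{n:>6} values: statistics {t_stats:.4f} s, numpy {t_numpy:.4f} s")
```

Absolute numbers will vary with hardware and library versions, so only the relative trend across sizes is meaningful.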

Let's use numpy operations:

import numpy as np

np.median(np.reshape(ex_array, (12, -1), order='F'), axis=1)

array([-0.08017496, -0.04771155,  0.06031283, -0.00385278, -0.03095536,
        0.06283668,  0.15979743,  0.14921719,  0.06285071, -0.08965355,
       -0.21959495, -0.08084032])
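The `'F'` (Fortran, column-major) order is what makes row `j` of the reshaped array hold every value for month `j`. The sketch below shows this on hypothetical stand-in data and compares it with the equivalent default-order reshape; both assume the length is a multiple of 12.

```python
import numpy as np

# Hypothetical stand-in data: two "years" of 12 monthly values.
values = np.arange(24, dtype=float)

# Fortran order fills column by column, so row j holds month j of every year.
by_month_f = np.reshape(values, (12, -1), order="F")

# Same grouping in default C order: one row per year, one column per month.
by_year_c = values.reshape(-1, 12)

medians_f = np.median(by_month_f, axis=1)
medians_c = np.median(by_year_c, axis=0)

print(by_month_f[0])  # the two values for the first month
print(medians_f)
```

Which form to use is a matter of taste; the C-order variant avoids the `order` argument at the cost of taking the median along `axis=0`.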