How to efficiently get many quartiles?

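I need to encode numeric values by range: low: 0, medium: 1, high: 2, very high: 3. I am using each column's quartiles as the cut points. I have the following code: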

import pandas as pd
import numpy as np

def fun(df):
    table = df.copy()  # work on a copy of the pandas DataFrame
    N = table.shape[0]
    for header in table.columns:
        # Quartile cut points for this column (computed before any cell is overwritten)
        q1 = np.percentile(table[header], 25)
        q2 = np.percentile(table[header], 50)
        q3 = np.percentile(table[header], 75)
        for k in range(N):
            # .loc avoids chained assignment, which may not write back to `table`;
            # like the original indexing, it assumes the default 0..N-1 index
            value = table.loc[k, header]
            if value < q1:
                table.loc[k, header] = 0
            elif value < q2:
                table.loc[k, header] = 1
            elif value < q3:
                table.loc[k, header] = 2
            else:
                table.loc[k, header] = 3
    table = table.astype(int)
    return table

Test:

df = pd.DataFrame( {
        'A': [30, 28, 32, 25, 25, 25, 22, 24, 35, 40],
        'B': [25, 30, 27, 40, 42, 40, 50, 45, 30, 25],
        'C': [25.5, 30.1, 27.3, 40.77, 25.1, 25.34, 22.11, 23.81, 33.66, 38.56],
    }, columns = [ 'A', 'B', 'C' ] )

Result:

A  B  C
2  0  1
2  1  2
3  0  2
1  2  3
1  3  0
1  2  1
0  3  0
0  3  0
3  1  3
3  0  3

Is there a way to do this more efficiently?

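You can combine np.digitize with DataFrame.rank. This bins each value by its percentile rank rather than by interpolated quartile values, so a few cells can differ from the loop's result (here, row 2 of B and row 4 of C):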

In [569]: np.digitize(df.rank(pct=True), bins=[.25, .5, .75], right=True)
Out[569]:
array([[2, 0, 1],
       [2, 1, 2],
       [3, 1, 2],
       [1, 2, 3],
       [1, 3, 1],
       [1, 2, 1],
       [0, 3, 0],
       [0, 3, 0],
       [3, 1, 3],
       [3, 0, 3]], dtype=int64)
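
The `right=True` argument decides which side of each bin is closed, which matters for ranks that fall exactly on 0.25, 0.5, or 0.75. A small sketch of the difference, using made-up percentile ranks (the `pct_ranks` values are illustrative, not taken from `df`):

```python
import numpy as np

# Illustrative percentile ranks, including values exactly on the bin edges
pct_ranks = np.array([0.10, 0.25, 0.40, 0.50, 0.75, 0.90])
bins = [0.25, 0.50, 0.75]

# right=True: a value x gets code i when bins[i-1] < x <= bins[i]
closed_right = np.digitize(pct_ranks, bins, right=True)

# right=False (the default): a value x gets code i when bins[i-1] <= x < bins[i]
closed_left = np.digitize(pct_ranks, bins, right=False)

print(closed_right)  # [0 0 1 1 2 3]
print(closed_left)   # [0 1 1 2 3 3]
```

With `right=True`, a rank of exactly 0.5 stays in code 1 instead of moving up to code 2.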

Details

In [570]: df.rank(pct=True)
Out[570]:
     A     B    C
0  0.7  0.15  0.5
1  0.6  0.45  0.7
2  0.8  0.30  0.6
3  0.4  0.65  1.0
4  0.4  0.80  0.3
5  0.4  0.65  0.4
6  0.1  1.00  0.1
7  0.2  0.90  0.2
8  0.9  0.45  0.8
9  1.0  0.15  0.9

In [571]: pd.DataFrame(np.digitize(df.rank(pct=True), bins=[.25, .5, .75], right=True),
                       columns=df.columns)
Out[571]:
   A  B  C
0  2  0  1
1  2  1  2
2  3  1  2
3  1  2  3
4  1  3  1
5  1  2  1
6  0  3  0
7  0  3  0
8  3  1  3
9  3  0  3
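
If the codes have to match the loop in the question exactly, one option is to digitize the raw values against each column's own percentiles instead of against percentile ranks. The following is a minimal sketch under that assumption; `quartile_codes` is a hypothetical helper name, not something from the post or from pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [30, 28, 32, 25, 25, 25, 22, 24, 35, 40],
    'B': [25, 30, 27, 40, 42, 40, 50, 45, 30, 25],
    'C': [25.5, 30.1, 27.3, 40.77, 25.1, 25.34, 22.11, 23.81, 33.66, 38.56],
}, columns=['A', 'B', 'C'])


def quartile_codes(frame):
    """Hypothetical helper: code each value 0..3 by its column's quartiles."""
    return pd.DataFrame(
        {
            # Default right=False gives q[i-1] <= x < q[i], the same
            # comparisons as the if/elif chain in the question.
            col: np.digitize(
                frame[col].to_numpy(),
                np.percentile(frame[col], [25, 50, 75]),
            )
            for col in frame.columns
        },
        index=frame.index,
    )


codes = quartile_codes(df)
print(codes)
```

This still loops over columns, but each column is processed in a single vectorized call, so the per-row Python loop is gone.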