Add quantile number as a new column in pandas

I have a data frame with three columns:

| A | B | C |

I calculated the quantiles:

df.quantile(.25)
df.quantile(.75)

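I want to add a new column Q that classifies each value as 'small', 'medium', or 'large' using a simple rule: 'small' if the value is below the first quartile, 'large' if it is above the third quartile, and 'medium' if it falls in between.

I tried using qcut, but it only accepts one-dimensional input.

Thanks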

pd.qcut is your friend.

pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])

MWE

print(s)
0     1
1     1
2     2
3     3
4     4
5     2
6     4
7     6
8     4
9     6
10    5
11    4
12    6
13    7
14    3
15    2
16    1
17    1
18    2
dtype: int64

print(pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large']))
0      small
1      small
2      small
3     medium
4     medium
5      small
6     medium
7      large
8     medium
9      large
10     large
11    medium
12     large
13     large
14    medium
15     small
16     small
17     small
18     small
dtype: category
Categories (3, object): [small < medium < large]
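One detail worth checking against the rule in the question: the bins produced by qcut are closed on the right, so a value exactly equal to the 25th percentile is labelled 'small', whereas a strict "less than the first quartile" test would call it 'medium'. The sketch below rebuilds the series from the MWE and compares the two; the names `strict` and `differs` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as the series printed in the MWE above
s = pd.Series([1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2])

q1, q3 = s.quantile(.25), s.quantile(.75)

# qcut: bin edges are the quantiles, intervals are closed on the right
binned = pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])

# Strict rule from the question: < Q1 is small, > Q3 is large, else medium
strict = pd.Series(
    np.where(s < q1, 'small', np.where(s > q3, 'large', 'medium')),
    index=s.index,
)

# Rows where the two labelings disagree (values sitting exactly on an edge)
differs = s[binned.astype(str) != strict]
print(q1, q3)
print(differs)
```

If the boundary behaviour matters for your data, pick whichever of the two rules matches your definition.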

For a DataFrame, use apply:

# Repeat this for each column
df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'], axis=0)
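As a minimal sketch of how the apply call could be used to attach the labels back to the original frame: the data and the `_Q` suffix below are made up for illustration and are not from the question. Note that qcut raises a ValueError when repeated values make two quantile edges coincide, so the toy columns here use distinct values.

```python
import pandas as pd

# Hypothetical toy data with distinct values in every column
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [80, 10, 30, 20, 60, 50, 70, 40],
    'C': [8, 7, 6, 5, 4, 3, 2, 1],
})

# apply passes each column (a 1-D Series) to pd.qcut in turn
labels = df.apply(pd.qcut, q=[0, .25, .75, 1],
                  labels=['small', 'medium', 'large'])

# Keep the original values and add one label column per input column
result = df.join(labels.add_suffix('_Q'))
print(result)
```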

Setup

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 3)),
    columns=list('ABC')
)

pandas.DataFrame.mask

Pandas only, and intuitive

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)

df.mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')

        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
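The comparisons above work because df.quantile(.25) returns a Series indexed by column name, and comparing a DataFrame with such a Series lines it up with the columns, so each column is tested against its own quartile. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical two-column frame, only to show the alignment
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [50, 40, 30, 20, 10]})

# One first-quartile value per column, indexed by 'A' and 'B'
q1 = df.quantile(.25)

# Each column is compared with its own entry in q1
is_small = df < q1
print(q1)
print(is_small)
```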

Nested numpy.where

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)

pd.DataFrame(
    np.where(is_small, 'small', np.where(is_large, 'large', 'medium')),
    df.index, df.columns
)

        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
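If you only need a single column Q for one source column, as in the question, the nested numpy.where idea can be wrapped in a small helper. This is just a sketch: `classify_by_quartile` is a hypothetical name, and the data is invented.

```python
import numpy as np
import pandas as pd

def classify_by_quartile(col: pd.Series) -> pd.Series:
    """Label values below Q1 'small', above Q3 'large', otherwise 'medium'."""
    q1, q3 = col.quantile(.25), col.quantile(.75)
    labels = np.where(col < q1, 'small',
                      np.where(col > q3, 'large', 'medium'))
    # Rebuild a Series so the labels keep the original index
    return pd.Series(labels, index=col.index)

# Hypothetical data: Q1 of column A is 3 and Q3 is 7
df = pd.DataFrame({'A': [3, 9, 1, 7, 5]})
df['Q'] = classify_by_quartile(df['A'])
print(df)
```

Values that sit exactly on a quartile come out as 'medium' here because both comparisons are strict.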