Add quantile number as a new column in pandas
I have a dataframe with three columns:
| A | B | C |
I calculated the quartiles:
df.quantile(.25)
df.quantile(.75)
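I want to add a new column Q that classifies values as 'small', 'medium' or 'large' using a simple rule: a value below the 1st quartile is small, a value above the 3rd quartile is large, and anything in between is medium.
I tried using qcut, but it only accepts 1-D input.
Thanks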
pd.qcut is your friend.
pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])
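If the goal is literally one new column Q, one reading of the question is that Q classifies a single existing column. A minimal sketch, assuming Q is derived from column A only (the column choice and the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical data laid out like the question's A | B | C frame
df = pd.DataFrame({
    "A": [1, 5, 3, 8, 2, 7, 4, 6],
    "B": [10, 20, 30, 40, 50, 60, 70, 80],
    "C": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Assumption: Q is based on column A; qcut accepts this 1-D Series directly
df["Q"] = pd.qcut(df["A"], q=[0, .25, .75, 1], labels=["small", "medium", "large"])
print(df)
```

The result is a categorical column, so the labels keep their small < medium < large ordering.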
MWE
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2])
print(s)
0 1
1 1
2 2
3 3
4 4
5 2
6 4
7 6
8 4
9 6
10 5
11 4
12 6
13 7
14 3
15 2
16 1
17 1
18 2
dtype: int64
print(pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large']))
0 small
1 small
2 small
3 medium
4 medium
5 small
6 medium
7 large
8 medium
9 large
10 large
11 medium
12 large
13 large
14 medium
15 small
16 small
17 small
18 small
dtype: category
Categories (3, object): [small < medium < large]
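Because qcut builds right-closed bins, a value exactly equal to the 25th percentile is labeled small, which differs slightly from the question's strict "less than" rule (here every 2 is small, since Q1 is 2.0). Passing retbins=True also returns the computed bin edges, so the boundaries can be checked. A short sketch reusing the MWE series:

```python
import pandas as pd

# Same series as the MWE above
s = pd.Series([1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2])

# retbins=True returns the labels plus the edges qcut derived from the quantiles
labels, edges = pd.qcut(
    s, q=[0, .25, .75, 1], labels=["small", "medium", "large"], retbins=True
)
print(edges)                            # min, 25th percentile, 75th percentile, max
print(s.quantile([.25, .75]).tolist())  # the inner edges are exactly these quantiles
```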
For a DataFrame, use apply to repeat this for every column:
df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'], axis=0)
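To keep the original values and add the labels as new columns (closer to the question's "add a new column Q"), the label frame returned by apply can be joined back. A minimal sketch; the _Q suffix and the sample data are assumptions for illustration:

```python
import pandas as pd

# Hypothetical numeric frame with the question's column names
df = pd.DataFrame({"A": range(8), "B": range(8, 0, -1), "C": [3, 1, 4, 1, 5, 9, 2, 6]})

# qcut runs once per column (axis=0 is the default); q and labels are forwarded to it
labels = df.apply(pd.qcut, q=[0, .25, .75, 1], labels=["small", "medium", "large"])

# Hypothetical naming: one label column per source column, e.g. A_Q
out = df.join(labels.add_suffix("_Q"))
print(out)
```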
Setup
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 3)),
    columns=list('ABC')
)
pandas.DataFrame.mask
Pandas-only, intuitive
is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)
df.mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')
A B C
0 small small medium
1 medium large medium
2 small large large
3 medium small small
4 small medium large
5 large medium small
6 medium medium medium
7 medium large medium
8 medium medium medium
9 large medium large
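The comparison-based approach uses strict < and >, which follows the question's rule exactly, while qcut puts a value equal to a quartile into the lower bin. A small sketch on hypothetical data where the 25th percentile (2.0) is itself a data value, to make the difference visible:

```python
import pandas as pd

# Hypothetical one-column frame: 25th percentile = 2.0, 75th percentile = 4.0
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)
masked = df.mask(is_small, "small").mask(is_large, "large").mask(is_medium, "medium")

cut = df.apply(pd.qcut, q=[0, .25, .75, 1], labels=["small", "medium", "large"])
print(masked["A"].tolist())  # strict comparisons: the value equal to Q1 is medium
print(cut["A"].tolist())     # qcut's right-closed bins: the value equal to Q1 is small
```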
Nested numpy.where
is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
pd.DataFrame(
np.where(is_small, 'small', np.where(is_large, 'large', 'medium')),
df.index, df.columns
)
A B C
0 small small medium
1 medium large medium
2 small large large
3 medium small small
4 small medium large
5 large medium small
6 medium medium medium
7 medium large medium
8 medium medium medium
9 large medium large
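With more than three classes, nested np.where calls get hard to read. numpy.select takes a list of conditions and a matching list of choices, with a default for everything else. This is a swapped-in alternative, not part of the original answer; a sketch on hypothetical data using the same is_small/is_large masks:

```python
import numpy as np
import pandas as pd

# Hypothetical data; any numeric DataFrame works the same way
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 40, 20, 50, 30]})

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)

# Conditions are checked in order; cells matching neither get the default label
labels = pd.DataFrame(
    np.select([is_small.to_numpy(), is_large.to_numpy()], ["small", "large"], default="medium"),
    index=df.index,
    columns=df.columns,
)
print(labels)
```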