Add column of .75 quantile based off groupby

Question

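I have a df whose index is dates, and a column named score. I want to keep df as it is but add a column that gives the 0.7 quantile of the score for that day. The quantile method needs to be midpoint, and the result also needs to be rounded to the nearest whole number.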
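To make the "midpoint" requirement concrete, here is a small sketch on four hypothetical scores from a single day (the values are illustrative, not from the question). It contrasts the default linear interpolation of pandas' Series.quantile with midpoint interpolation.

```python
import pandas as pd

# Hypothetical scores recorded on one day (illustrative values only)
scores = pd.Series([1, 2, 3, 4])

# With n = 4 sorted values, the 0.7 quantile sits at position 0.7 * (n - 1) = 2.1,
# between the values at positions 2 (value 3) and 3 (value 4).
linear = scores.quantile(0.7, interpolation="linear")      # 3 + 0.1 * (4 - 3), about 3.1
midpoint = scores.quantile(0.7, interpolation="midpoint")  # (3 + 4) / 2 = 3.5

print(linear, midpoint)
```

Midpoint interpolation ignores how far the position falls between the two neighbours and simply averages them, which is why it often produces values ending in .5.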

Answer 1

Below I outline one approach you can take.

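Note that to round a value to the nearest integer you can use Python's built-in round() function; see round() in the Python documentation for details. For a whole pandas column, the vectorized Series.round() does the same job. Both round exact halves (such as the .5 values that midpoint interpolation often produces) to the nearest even number. The code below computes the quantiles without rounding them, so apply the rounding to the resulting column.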
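A short sketch of the rounding behaviour, using made-up values. Built-in round() and Series.round() both send exact halves to the even neighbour; floor(x + 0.5) is one way to always round halves up instead, if that is what "nearest whole number" should mean (an assumption: the question does not say how ties are handled).

```python
import numpy as np
import pandas as pd

# Built-in round(): returns an int and sends exact halves to the even neighbour
builtin_halves = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]  # [0, 2, 2, 4]

# Series.round(): vectorized over a column, also rounds halves to even, keeps float dtype
s = pd.Series([0.5, 1.5, 2.5, 3.5, 2.6])  # made-up values
series_rounded = s.round().tolist()  # [0.0, 2.0, 2.0, 4.0, 3.0]

# Assumed alternative: always round halves up via floor(x + 0.5), then cast to int
half_up = np.floor(s + 0.5).astype(int).tolist()  # [1, 2, 3, 4, 3]

print(builtin_halves, series_rounded, half_up)
```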

import pandas as pd
import numpy as np
# set random seed for reproducibility
np.random.seed(748)

# initialize base example dataframe
df = pd.DataFrame({"date":np.arange(10), 
                   "score":np.random.uniform(size=10)})

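# not used below, but this draw advances the random state, so keep it
# if you want to reproduce the printed output exactly
duplicate_dates = np.random.choice(df.index, 5)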

df_dup = pd.DataFrame({"date":np.random.choice(df.index, 5), 
                       "score":np.random.uniform(size=5)})

# finish compiling example data
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same here)
df = pd.concat([df, df_dup], ignore_index=True)

# calculate the 0.7 quantile of each date's scores with midpoint interpolation
# (GroupBy.quantile takes no axis argument)
result = df.groupby("date").quantile(q=0.7, interpolation="midpoint")

# print resulting dataframe
# contains one unique 0.7 quantile value per date
print(result)

"""
0.7      score
date          
0     0.585087
1     0.476404
2     0.426252
3     0.363376
4     0.165013
5     0.927199
6     0.575510
7     0.576636
8     0.831572
9     0.932183
"""

# to copy the per-date quantiles into a new column of the
# original dataframe `df`, look up each row's date in a dictionary

# create dictionary {date: 0.7 quantile of score}
mapping = result.to_dict()["score"]

# map every row's date to its quantile to produce the new column
df["quantile_0.7"] = df["date"].map(mapping)

print(df)

"""
    date     score  quantile_0.7
0      0  0.920895      0.585087
1      1  0.476404      0.476404
2      2  0.380771      0.426252
3      3  0.363376      0.363376
4      4  0.165013      0.165013
5      5  0.927199      0.927199
6      6  0.340008      0.575510
7      7  0.695818      0.576636
8      8  0.831572      0.831572
9      9  0.932183      0.932183
10     7  0.457455      0.576636
11     6  0.650666      0.575510
12     6  0.500353      0.575510
13     0  0.249280      0.585087
14     2  0.471733      0.426252
"""
