Pandas: How to assign values to a new DF by quantile, using greater than and smaller than?
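I'm new to coding and my English is not very good, so please be patient with me =D
This is the main DF (df_mcred_pf). I posted all the data and code in full below.
I created a DF from the main DF containing all the values in the first quantile, and it worked: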
df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)
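Now I need to create a new DF with the values in the second quantile: everything greater than the 1st quantile (vQ1_mcred_pf) and smaller than the second quantile (vQ2_mcred_pf). I tried this, but it didn't work: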
df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]
I get this error: TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]
I'm stuck here. Can you help me?
Full code here:
import pandas as pd
import numpy as np
df_mcred_pf = pd.DataFrame([[2, 12, "F", 1, 1, 12.55, 437],
[2, 12, "F", 1, 1, 17.81, 437],
[2, 12, "F", 1, 1, 18.14, 437],
[2, 12, "F", 1, 1, 20.43, 437],
[2, 12, "F", 1, 1, 21.19, 437],
[2, 12, "F", 1, 1, 22.73, 437],
[2, 12, "F", 1, 1, 23.73, 437],
[2, 12, "F", 1, 1, 25.26, 437],
[2, 12, "F", 1, 1, 25.34, 437],
[2, 12, "F", 1, 1, 26.02, 437],
[2, 12, "F", 1, 1, 26.78, 437],
[2, 12, "F", 1, 1, 26.79, 437],
[2, 12, "F", 1, 1, 26.83, 437],
[2, 12, "F", 1, 1, 27.59, 437],
[2, 12, "F", 1, 1, 27.83, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.83, 437],
[2, 12, "F", 1, 1, 29.08, 437],
[2, 12, "F", 1, 1, 29.13, 437],
[2, 12, "F", 1, 1, 29.33, 437],
[2, 12, "F", 1, 1, 29.84, 437],
[2, 12, "F", 1, 1, 29.85, 437],
[2, 12, "F", 1, 1, 30.36, 437],
[2, 12, "F", 1, 1, 30.62, 437],
[2, 12, "F", 1, 1, 30.87, 437],
[2, 12, "F", 1, 1, 31.38, 437],
[2, 12, "F", 1, 1, 31.39, 437],
[2, 12, "F", 1, 1, 31.89, 437],
[2, 12, "F", 1, 1, 32.92, 437]], columns=['cd_mod_pri', 'cd_mod_sec', 'id_tp_pes', 'cd_idx_pri', 'cd_idx_sec', 'vr_tx_jrs', 'quantidade'])
MAX_mcred = df_mcred_pf['vr_tx_jrs'].max()
MIN_mcred = df_mcred_pf['vr_tx_jrs'].min()
vQ1_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.25)
vQ2_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.50)
vQ3_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.75)
vQ4_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(1.00)
df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)
MEDIAN_mcred = df_mcred_pf_Q1["vr_tx_jrs"].median()
df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]
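The TypeError comes from operator precedence: in Python, `&` binds more tightly than comparison operators such as `>` and `<=`, so the `&` is applied first, to the float quantile and the float Series next to it, and that is what raises. Each condition needs its own parentheses. Note also that `vQ1_mcred_pf` and `vQ2_mcred_pf` already hold the quantile *values* (about 25.28 and 28.075 for this data), so `np.quantile(..., vQ1_mcred_pf/100)` only gives the right Q1 rows by coincidence (25.28/100 is close to 0.25); for Q2 it asks for roughly the 28th percentile instead of the 50th. A minimal sketch of the corrected filter, assuming only the `vr_tx_jrs` column matters (the other columns are left out for brevity):

```python
import pandas as pd

# Simplification: only the column being filtered is reproduced here.
df_mcred_pf = pd.DataFrame({"vr_tx_jrs": [
    12.55, 17.81, 18.14, 20.43, 21.19, 22.73, 23.73, 25.26, 25.34, 26.02,
    26.78, 26.79, 26.83, 27.59, 27.83, 28.32, 28.32, 28.83, 29.08, 29.13,
    29.33, 29.84, 29.85, 30.36, 30.62, 30.87, 31.38, 31.39, 31.89, 32.92,
]})

col = df_mcred_pf["vr_tx_jrs"]
vQ1_mcred_pf = col.quantile(0.25)  # a quantile value (about 25.28), not a fraction
vQ2_mcred_pf = col.quantile(0.50)  # the median (about 28.075)

# Compare against the quantile values directly, and wrap each condition
# in parentheses because & binds more tightly than > and <=.
df_mcred_pf_Q1 = df_mcred_pf[col <= vQ1_mcred_pf]
df_mcred_pf_Q2 = df_mcred_pf[(col > vQ1_mcred_pf) & (col <= vQ2_mcred_pf)]

# Series.between expresses the same half-open interval (vQ1, vQ2];
# the string form of `inclusive` needs pandas >= 1.3.
df_mcred_pf_Q2_alt = df_mcred_pf[col.between(vQ1_mcred_pf, vQ2_mcred_pf, inclusive="right")]

print(len(df_mcred_pf_Q1), len(df_mcred_pf_Q2))  # prints: 8 7
```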
I would solve this differently and create a column with a quantile descriptor:
import pandas as pd
import numpy as np
#your dataframe here
quant = [0, .25, .5, .75, 1]
s = df_mcred_pf["vr_tx_jrs"].quantile(quant)
df_mcred_pf["Quartil"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True, labels=["Q1", "Q2", "Q3", "Q4"])
This returns the following output:
cd_mod_pri cd_mod_sec id_tp_pes ... vr_tx_jrs quantidade Quartil
0 2 12 F ... 12.55 437 Q1
1 2 12 F ... 17.81 437 Q1
2 2 12 F ... 18.14 437 Q1
3 2 12 F ... 20.43 437 Q1
4 2 12 F ... 21.19 437 Q1
5 2 12 F ... 22.73 437 Q1
6 2 12 F ... 23.73 437 Q1
7 2 12 F ... 25.26 437 Q1
8 2 12 F ... 25.34 437 Q2
9 2 12 F ... 26.02 437 Q2
10 2 12 F ... 26.78 437 Q2
...
28 2 12 F ... 31.89 437 Q4
29 2 12 F ... 32.92 437 Q4
[30 rows x 8 columns]
Now you can filter the DataFrame by quartile:
print(df_mcred_pf[df_mcred_pf["Quartil"]=="Q2"])
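If a separate DataFrame per quartile is still wanted (like the question's df_mcred_pf_Q1, df_mcred_pf_Q2, ...), one option is to group on the Quartil column instead of filtering four times. A sketch, assuming the pd.cut column built above and again keeping only `vr_tx_jrs`; the dict name `quartile_frames` is illustrative:

```python
import pandas as pd

# Simplification: only the binned column is reproduced here.
df_mcred_pf = pd.DataFrame({"vr_tx_jrs": [
    12.55, 17.81, 18.14, 20.43, 21.19, 22.73, 23.73, 25.26, 25.34, 26.02,
    26.78, 26.79, 26.83, 27.59, 27.83, 28.32, 28.32, 28.83, 29.08, 29.13,
    29.33, 29.84, 29.85, 30.36, 30.62, 30.87, 31.38, 31.39, 31.89, 32.92,
]})

# Same binning as the answer: quantile edges, lowest value included.
quant = [0, .25, .5, .75, 1]
s = df_mcred_pf["vr_tx_jrs"].quantile(quant)
df_mcred_pf["Quartil"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True,
                                labels=["Q1", "Q2", "Q3", "Q4"])

# One DataFrame per quartile label; observed=True skips empty categories.
quartile_frames = {
    label: group
    for label, group in df_mcred_pf.groupby("Quartil", observed=True)
}

for label, frame in quartile_frames.items():
    print(label, len(frame))
```

quartile_frames["Q2"] then plays the role of df_mcred_pf_Q2.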
You could also encode the quartiles as numbers, e.g.
labels=range(len(quant)-1)
Then you can get the quartiles up to 0.75:
print(df_mcred_pf[df_mcred_pf["Quartil"]<3])
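A variant of the numeric encoding, assuming plain integer bin codes are all that is needed: passing `labels=False` makes `pd.cut` return integer bin indicators (0 for the lowest bin, up to 3) instead of a categorical, so ordinary integer comparisons apply. The column name `Quartil_num` is illustrative:

```python
import pandas as pd

# Simplification: only the binned column is reproduced here.
df_mcred_pf = pd.DataFrame({"vr_tx_jrs": [
    12.55, 17.81, 18.14, 20.43, 21.19, 22.73, 23.73, 25.26, 25.34, 26.02,
    26.78, 26.79, 26.83, 27.59, 27.83, 28.32, 28.32, 28.83, 29.08, 29.13,
    29.33, 29.84, 29.85, 30.36, 30.62, 30.87, 31.38, 31.39, 31.89, 32.92,
]})

quant = [0, .25, .5, .75, 1]
s = df_mcred_pf["vr_tx_jrs"].quantile(quant)

# labels=False -> integer codes 0..3 instead of "Q1".."Q4"
df_mcred_pf["Quartil_num"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s,
                                    include_lowest=True, labels=False)

# Rows in the first three quartiles, i.e. up to the 0.75 quantile
up_to_q3 = df_mcred_pf[df_mcred_pf["Quartil_num"] < 3]
print(len(up_to_q3))
```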
There may be a simpler way to do this; let's see what others come up with.
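One candidate for a simpler way: `pd.qcut` computes the quantile edges itself, so the `quant` list and the separate `.quantile()` call are not needed. A sketch, assuming four equal-probability bins are what is wanted; on this data it should assign the same labels as the pd.cut version in the answer, since both use right-closed bins with the lowest value included:

```python
import pandas as pd

# Simplification: only the binned column is reproduced here.
df_mcred_pf = pd.DataFrame({"vr_tx_jrs": [
    12.55, 17.81, 18.14, 20.43, 21.19, 22.73, 23.73, 25.26, 25.34, 26.02,
    26.78, 26.79, 26.83, 27.59, 27.83, 28.32, 28.32, 28.83, 29.08, 29.13,
    29.33, 29.84, 29.85, 30.36, 30.62, 30.87, 31.38, 31.39, 31.89, 32.92,
]})

labels = ["Q1", "Q2", "Q3", "Q4"]

# q=4 -> edges at the 0, .25, .5, .75 and 1 quantiles
df_mcred_pf["Quartil"] = pd.qcut(df_mcred_pf["vr_tx_jrs"], q=4, labels=labels)

# The hand-built edges from the pd.cut answer, for comparison
s = df_mcred_pf["vr_tx_jrs"].quantile([0, .25, .5, .75, 1])
via_cut = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True, labels=labels)

print(df_mcred_pf["Quartil"].value_counts(sort=False))
```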