Python - Create 2 new columns with the 25th and 75th percentiles of several row values
This is what my df looks like (with many more rows and columns):
Index | WTG1 | WTG2 | WTG3 |
---|---|---|---|
1.5 | 61.25 | -7.57 | 7.18 |
2 | 19.69 | 25.95 | 28.67 |
2.5 | 59.51 | 81.22 | 78.22 |
3 | 131.81 | 154.07 | 142.92 |
My objective is to get:
Index | WTG1 | WTG2 | WTG3 | 25th Percentile | 75th Percentile | Mean |
---|---|---|---|---|---|---|
1.5 | 61.25 | -7.57 | 7.18 | (25th percentile of 61.25, -7.57, 7.18) | (75th percentile of 61.25, -7.57, 7.18) | (mean of 61.25, -7.57, 7.18) |
2 | 19.69 | 25.95 | 28.67 | (25th percentile of 19.69, 25.95, 28.67) | (75th percentile of 19.69, 25.95, 28.67) | (mean of 19.69, 25.95, 28.67) |
2.5 | 59.51 | 81.22 | 78.22 | (25th percentile of 59.51, 81.22, 78.22) | (75th percentile of 59.51, 81.22, 78.22) | (mean of 59.51, 81.22, 78.22) |
3 | 131.81 | 154.07 | 142.92 | (25th percentile of 131.81, 154.07, 142.92) | (75th percentile of 131.81, 154.07, 142.92) | (mean of 131.81, 154.07, 142.92) |
I have been searching for a long time, and this is the best I could come up with:
df['mean'] = df[['WTG1','WTG2','WTG3']].mean(axis=1)
df['25th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],25)
df['75th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],75)
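The mean seems to work, although I have not checked the values yet.
But the percentiles are the real problem... It seems the nanpercentile function only works on columns. It returns the same value in every row of both percentile columns (I guess these are the respective 25th and 75th percentile values, but across the whole df), which is not what I am trying to do.
I found some alternatives but could not adapt them to my needs, for example: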
perc75 = np.vectorize(lambda x: np.percentile(x, 75))
df['75th_percentile'] = perc75(df['WTG01'].values)
Which works, but only for one column.
Or
df['25th_percentile'] = df['WTG1','WTG2','WTG3'].apply(lambda x: np.percentile(x, 25))
Which does not work...
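A minimal sketch of why the nanpercentile attempt above fills every row with the same number, assuming the sample values from the question: called without an `axis` argument, `np.nanpercentile` reduces over the flattened array and returns a single scalar, whereas `axis=1` reduces along each row. The names `cols`, `values` and `overall_25` are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with its Index column as the index.
df = pd.DataFrame(
    {
        "WTG1": [61.25, 19.69, 59.51, 131.81],
        "WTG2": [-7.57, 25.95, 81.22, 154.07],
        "WTG3": [7.18, 28.67, 78.22, 142.92],
    },
    index=[1.5, 2, 2.5, 3],
)

cols = ["WTG1", "WTG2", "WTG3"]   # hypothetical helper name
values = df[cols].to_numpy()      # 2-D array, shape (4, 3)

# Without axis: a single scalar computed over all 12 values,
# which pandas then broadcasts to every row on assignment.
overall_25 = np.nanpercentile(values, 25)

# With axis=1: one percentile per row, ignoring NaNs.
df["25th Percentile"] = np.nanpercentile(values, 25, axis=1)
df["75th Percentile"] = np.nanpercentile(values, 75, axis=1)
df["Mean"] = df[cols].mean(axis=1)

print(df)
```

Selecting the columns once and converting to an array before adding the new columns keeps the statistics from accidentally including each other.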
I think you can transpose the DataFrame and apply df.describe()
import pandas as pd
df = pd.DataFrame({'WTG1': [61.25, 19.69, 59.51, 131.81],
                   'WTG2': [-7.57, 25.95, 81.22, 154.07],
                   'WTG3': [7.18, 28.67, 78.22, 142.92]
                   })
print(df)
print(df.T)
Output
     WTG1    WTG2    WTG3
0   61.25   -7.57    7.18
1   19.69   25.95   28.67
2   59.51   81.22   78.22
3  131.81  154.07  142.92
          0      1      2       3
WTG1  61.25  19.69  59.51  131.81
WTG2  -7.57  25.95  81.22  154.07
WTG3   7.18  28.67  78.22  142.92
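In the question, you are trying to get statistics for each row. After transposing the DataFrame, you can work with columns instead, so describe() conveniently gives you summary statistics for each column of the transposed frame, that is, for each original row.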
print(df.T.describe())
Output
               0          1          2           3
count   3.000000   3.000000   3.000000    3.000000
mean   20.286667  24.770000  72.983333  142.933333
std    36.233778   4.604824  11.764269   11.130006
min    -7.570000  19.690000  59.510000  131.810000
25%    -0.195000  22.820000  68.865000  137.365000
50%     7.180000  25.950000  78.220000  142.920000
75%    34.215000  27.310000  79.720000  148.495000
max    61.250000  28.670000  81.220000  154.070000
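The describe() output still has to be attached to the original frame to get the columns asked for in the question. One possible sketch, assuming the same sample data: transpose the summary back so that each original row has its own row of statistics, then assign the `25%`, `75%` and `mean` entries as new columns. The names `cols`, `stats` and `quantiles` are illustrative only; the `DataFrame.quantile(..., axis=1)` call is shown as an alternative that skips the transpose and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'WTG1': [61.25, 19.69, 59.51, 131.81],
                   'WTG2': [-7.57, 25.95, 81.22, 154.07],
                   'WTG3': [7.18, 28.67, 78.22, 142.92]})

cols = ['WTG1', 'WTG2', 'WTG3']   # hypothetical helper name

# describe() on the transpose has one column per original row;
# transposing again gives one row of statistics per original row.
stats = df[cols].T.describe().T

# Alternative without transposing: row-wise quantiles.
# The result has one column per requested quantile (0.25 and 0.75).
quantiles = df[cols].quantile([0.25, 0.75], axis=1).T

# Assignment aligns on the shared row index.
df['25th Percentile'] = stats['25%']
df['75th Percentile'] = stats['75%']
df['Mean'] = stats['mean']

print(df)
```

Both routes use linear interpolation by default, so for this data they should agree with the 25% and 75% rows printed above.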