How can I change a pandas DataFrame so that the values are strictly between 0 and 1 while preserving the NaNs?
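I'm working on the Kaggle Titanic problem. I have a function that creates cross-tabulations of survival rates by passenger features. For SibSp by Embarked, I get the following table of survival rates: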
import pandas as pd
import numpy as np
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
data = [[0.5, 0.657, 0.75, np.nan, np.nan, np.nan, np.nan, 0.556],
        [0.372, 0.375, 0.667, np.nan, 0, np.nan, np.nan, 0.362],
        [0.302, 0.438, 0.375, 0.364, 0.3, 0, 0, 0.336],
        [0.343, 0.506, 0.478, 0.364, 0.214, 0, 0, 0.377]]
df_m = pd.DataFrame(data, columns=[0, 1, 2, 3, 4, 5, 8, 'All'],
                    index=['C', 'Q', 'A', 'All'])
So the transpose of what I'm starting with is:
Embarked C Q S All
SibSp
0 0.500000 0.372093 0.302115 0.342920
1 0.657143 0.375000 0.468468 0.506494
2 0.750000 0.666667 0.375000 0.478261
3 NaN NaN 0.363636 0.363636
4 NaN 0.000000 0.300000 0.214286
5 NaN NaN 0.000000 0.000000
8 NaN NaN 0.000000 0.000000
All 0.555556 0.362069 0.336049 0.376877
whereas the end result I want is:
Embarked C Q S All
SibSp
0 0.500000 0.372093 0.302115 0.342920
1 0.657143 0.375000 0.468468 0.506494
2 0.750000 0.666667 0.375000 0.478261
3 NaN NaN 0.363636 0.363636
4 NaN 0.000100 0.300000 0.214286
5 NaN NaN 0.000100 0.000100
8 NaN NaN 0.000100 0.000100
All 0.555556 0.362069 0.336049 0.376877
I want to keep the rates strictly between 0 and 1 while preserving the NaNs. I have tried looping in two ways:
for i in df_m.columns:
    for j in df_m.index:
        p_hat.at[i, j] = max(min(df_m[i, j], 0.999), 0.001)
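and the same loop with ".at" replaced by ".loc" in the last line. Both versions raise KeyError: (0, 'C'), i.e. on the first column and the first index label.
The other approach I tried was to concatenate and take max(value, .001) and min(value, .999):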
smalls = pd.DataFrame(0.001*np.ones(df_m.shape))
bigs = pd.DataFrame(0.999*np.ones(df_m.shape))
smalls.columns = df_m.columns
bigs.columns = df_m.columns
smalls.index = df_m.index
bigs.index = df_m.index
p_hat1 = pd.concat([df_m, bigs]).groupby(level=0).min()
p_hat = pd.concat([p_hat1, smalls]).groupby(level=0).max()
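This has the side effect of converting the NaNs to 0.999.
In a later step, I want to combine the rates and counts and compute 95% confidence intervals for plotting. At that stage, I don't want to display the NaNs.
Thanks in advance.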
Try:
df_m[df_m.eq(0)] = 0.0001
print(df_m.T)
# Output
C Q A All
0 0.500 0.3720 0.3020 0.3430
1 0.657 0.3750 0.4380 0.5060
2 0.750 0.6670 0.3750 0.4780
3 NaN NaN 0.3640 0.3640
4 NaN 0.0001 0.3000 0.2140
5 NaN NaN 0.0001 0.0001
8 NaN NaN 0.0001 0.0001
All 0.556 0.3620 0.3360 0.3770
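The reason the NaNs survive is that a comparison such as NaN == 0 evaluates to False, so the boolean mask never selects the missing cells. A minimal sketch of the same idea on a small made-up frame (the column names and values are illustrative assumptions, and the 0.999 upper replacement is borrowed from the bounds in the question's loop), using DataFrame.mask so the original frame is left unmodified:

```python
import numpy as np
import pandas as pd

# Small made-up frame (assumption for illustration only)
df = pd.DataFrame({"a": [0.0, 0.5, np.nan], "b": [1.0, np.nan, 0.25]})

# NaN == 0 and NaN == 1 are both False, so missing cells are never selected
zero_mask = df.eq(0)
one_mask = df.eq(1)

# mask() replaces values where the condition is True and returns a new frame
result = df.mask(zero_mask, 0.0001).mask(one_mask, 0.999)
print(result)
```

Only exact zeros and ones are touched here; any other value, including NaN, passes through unchanged.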
Update
It doesn't show in this example but I also replace values of 1.0 with 0.999
In that case, prefer clip:
df_m = df_m.clip(lower=0.001, upper=0.999)
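As a check on the clip approach, and as a possible explanation of the KeyError in the question: df_m[i, j] asks for a single column whose label is the tuple (i, j), which does not exist, and .at / .loc expect the row label first and the column label second. The sketch below uses a small made-up frame (an assumption, not the question's data) and compares clip with a corrected version of the loop:

```python
import numpy as np
import pandas as pd

# Small made-up frame with the same kind of mixed column labels (assumption)
df = pd.DataFrame(
    [[0.5, np.nan, 0.0], [1.0, 0.3, np.nan]],
    index=["C", "Q"],
    columns=[0, 1, "All"],
)

# Vectorized version: NaN cells are left as NaN
clipped = df.clip(lower=0.001, upper=0.999)

# Corrected loop for comparison: row label first, then column label
looped = df.copy()
for col in df.columns:
    for row in df.index:
        value = df.at[row, col]
        if pd.notna(value):  # skip missing cells explicitly
            looped.at[row, col] = max(min(value, 0.999), 0.001)

print(clipped)
print(clipped.equals(looped))
```

The loop is shown only to illustrate the indexing fix; clip does the same work in one call and is the simpler choice.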