Grab values and column names based on row values (multiple values in a cell)
I have this df:
import pandas as pd

df = pd.DataFrame({'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
                   'a': {0: 1.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 3.0, 5: 2.0, 6: 3.0},
                   'b': {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 0.0, 6: 3.0},
                   'c': {0: 1.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0, 5: -2.0, 6: -2.0},
                   'd': {0: 1.0, 1: 2.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: -1.0},
                   'e': {0: 1.0, 1: 2.0, 2: 2.0, 3: 1.0, 4: 1.0, 5: 2.0, 6: -2.0},
                   'f': {0: -1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: -2.0, 5: -1.0, 6: 2.0},
                   'g': {0: 1.0, 1: 1.0, 2: 2.0, 3: 1.5, 4: 2.0, 5: 0.0, 6: 2.0},
                   'h': {0: 0.0, 1: 0.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 1.0, 6: 3.0},
                   'i': {0: 0.0, 1: -1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: -3.0, 6: 3.0},
                   'j': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: -1.0, 6: -1.0},
                   'k': {0: 62, 1: 166, 2: 139, 3: 60, 4: 93, 5: 17, 6: 5}})
which gives us:
   R    a    b    c    d    e    f    g    h    i    j    k
0  1  1.0  1.0  1.0  1.0  1.0 -1.0  1.0  0.0  0.0  1.0   62
1  2  1.0  1.0  2.0  2.0  2.0  0.0  1.0  0.0 -1.0  1.0  166
2  3  2.0  1.0  2.0  1.0  2.0  0.0  2.0  1.0  0.0  1.0  139
3  4  3.0  2.0  2.0  0.0  1.0  0.0  1.5  2.0  0.0  1.0   60
4  5  3.0  2.0  2.0  1.0  1.0 -2.0  2.0  2.0  0.0  2.0   93
5  6  2.0  0.0 -2.0  2.0  2.0 -1.0  0.0  1.0 -3.0 -1.0   17
6  7  3.0  3.0 -2.0 -1.0 -2.0  2.0  2.0  3.0  3.0 -1.0    5
I need 2 new columns:
df['an'] = the names of the columns whose value in the current row is negative
df['nv'] = the negative values themselves, for those same columns
Desired output:
   R    a    b    c    d    e    f    g    h    i    j    k  an         nv
0  1  1.0  1.0  1.0  1.0  1.0 -1.0  1.0  0.0  0.0  1.0   62  'f'        -1
1  2  1.0  1.0  2.0  2.0  2.0  0.0  1.0  0.0 -1.0  1.0  166  'i'        -1
2  3  2.0  1.0  2.0  1.0  2.0  0.0  2.0  1.0  0.0  1.0  139  '-'        -
3  4  3.0  2.0  2.0  0.0  1.0  0.0  1.5  2.0  0.0  1.0   60  '-'        -
4  5  3.0  2.0  2.0  1.0  1.0 -2.0  2.0  2.0  0.0  2.0   93  'f'        -2
5  6  2.0  0.0 -2.0  2.0  2.0 -1.0  0.0  1.0 -3.0 -1.0   17  'c,f,i,j'  [-2,-1,-3,-1]
6  7  3.0  3.0 -2.0 -1.0 -2.0  2.0  2.0  3.0  3.0 -1.0    5  'c,d,e,j'  [-2,-1,-2,-1]
I tried several approaches, such as np.where or np.select, but I couldn't get them to work.
Any help is much appreciated.
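Since np.where came up: its three-argument form works cell by cell and returns a grid with the same shape as its inputs, so on its own it cannot produce a variable-length result per row; a per-row collapse is still needed afterwards. A minimal sketch under that reading, using a compact rebuild of the df above (the names num, m, names, vals and grid are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Compact rebuild of the question's df (same values as the dict above)
cols = list('abcdefghij')
rows = [
    [1, 1, 1, 1, 1, -1, 1, 0, 0, 1],
    [1, 1, 2, 2, 2, 0, 1, 0, -1, 1],
    [2, 1, 2, 1, 2, 0, 2, 1, 0, 1],
    [3, 2, 2, 0, 1, 0, 1.5, 2, 0, 1],
    [3, 2, 2, 1, 1, -2, 2, 2, 0, 2],
    [2, 0, -2, 2, 2, -1, 0, 1, -3, -1],
    [3, 3, -2, -1, -2, 2, 2, 3, 3, -1],
]
df = pd.DataFrame(rows, columns=cols, dtype=float)
df.insert(0, 'R', [str(i) for i in range(1, 8)])
df['k'] = [62, 166, 139, 60, 93, 17, 5]

num = df.drop(columns='R')        # 'R' holds strings, so leave it out
m = num.lt(0).to_numpy()          # (7, 11) boolean grid: True where negative
names = num.columns.to_numpy()    # (11,) column labels, broadcast across rows
vals = num.to_numpy()             # (7, 11) float grid

# Cell by cell: the column label where the cell is negative, '' elsewhere
grid = np.where(m, names, '')

# Collapse each row to a variable-length result
df['an'] = [','.join(x for x in row if x) for row in grid]
df['nv'] = pd.Series([v[r].tolist() for v, r in zip(vals, m)],
                     index=df.index, dtype=object)
print(df[['an', 'nv']])
```

Wrapping the lists in a pd.Series with dtype=object keeps each row's list as a single cell.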
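You can use a comparison and boolean indexing on each row, keep the intermediate result with an assignment expression (the walrus operator :=, Python 3.8+), and return a Series: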
df.join(df.drop(columns='R')
.apply(lambda s: pd.Series({'an': ','.join((S:=s[s.lt(0)]).index),
'nv': list(S)}), axis=1)
)
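To see what the lambda does for a single row, here is a minimal sketch on row 5 of the df (typed in by hand, with 'R' already dropped). The one-liner relies on dict values being evaluated left to right, so the walrus binds S inside the 'an' entry before the 'nv' entry reads it:

```python
import pandas as pd

# Row 5 of the question's df, without the string column 'R'
s = pd.Series({'a': 2.0, 'b': 0.0, 'c': -2.0, 'd': 2.0, 'e': 2.0, 'f': -1.0,
               'g': 0.0, 'h': 1.0, 'i': -3.0, 'j': -1.0, 'k': 17.0})

mask = s.lt(0)   # element-wise s < 0, a boolean Series
S = s[mask]      # boolean indexing keeps only the negative cells
print(','.join(S.index), S.tolist())

# Same thing in one expression: the walrus assigns S2 while building 'an',
# and 'nv' (evaluated afterwards) reuses it
d = {'an': ','.join((S2 := s[s.lt(0)]).index), 'nv': list(S2)}
print(d)
```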
Or use a custom function:
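def f(s):
    # Keep only the negative cells of this row
    S = s[s.lt(0)]
    # Their index holds the column names, their values the negatives
    return pd.Series({'an': ','.join(S.index),
                      'nv': list(S)})

# 'R' holds strings, so drop it before comparing with 0
df.join(df.drop(columns='R').apply(f, axis=1))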
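The answer's output uses an empty string and an empty list where the question asked for '-', and always a list where the question showed a bare number for a single value. If that exact layout matters, one hedged option is a small post-processing step; the replace/map below is an illustrative assumption, not part of the original answer:

```python
import pandas as pd

# Compact rebuild of the question's df (same values as the dict above)
cols = list('abcdefghij')
rows = [
    [1, 1, 1, 1, 1, -1, 1, 0, 0, 1],
    [1, 1, 2, 2, 2, 0, 1, 0, -1, 1],
    [2, 1, 2, 1, 2, 0, 2, 1, 0, 1],
    [3, 2, 2, 0, 1, 0, 1.5, 2, 0, 1],
    [3, 2, 2, 1, 1, -2, 2, 2, 0, 2],
    [2, 0, -2, 2, 2, -1, 0, 1, -3, -1],
    [3, 3, -2, -1, -2, 2, 2, 3, 3, -1],
]
df = pd.DataFrame(rows, columns=cols, dtype=float)
df.insert(0, 'R', [str(i) for i in range(1, 8)])
df['k'] = [62, 166, 139, 60, 93, 17, 5]

def f(s):
    S = s[s.lt(0)]
    return pd.Series({'an': ','.join(S.index), 'nv': list(S)})

out = df.join(df.drop(columns='R').apply(f, axis=1))

# '-' when nothing is negative, a bare number when there is exactly one,
# otherwise keep the list
out['an'] = out['an'].replace('', '-')
out['nv'] = out['nv'].map(lambda v: '-' if not v else v[0] if len(v) == 1 else v)
print(out[['an', 'nv']])
```

Note that this leaves nv as a mixed object column (strings, floats and lists), which is fine for display but awkward for further computation.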
Output:
   R    a    b    c    d    e    f    g    h    i    j    k  an       nv
0  1  1.0  1.0  1.0  1.0  1.0 -1.0  1.0  0.0  0.0  1.0   62  f        [-1.0]
1  2  1.0  1.0  2.0  2.0  2.0  0.0  1.0  0.0 -1.0  1.0  166  i        [-1.0]
2  3  2.0  1.0  2.0  1.0  2.0  0.0  2.0  1.0  0.0  1.0  139           []
3  4  3.0  2.0  2.0  0.0  1.0  0.0  1.5  2.0  0.0  1.0   60           []
4  5  3.0  2.0  2.0  1.0  1.0 -2.0  2.0  2.0  0.0  2.0   93  f        [-2.0]
5  6  2.0  0.0 -2.0  2.0  2.0 -1.0  0.0  1.0 -3.0 -1.0   17  c,f,i,j  [-2.0, -1.0, -3.0, -1.0]
6  7  3.0  3.0 -2.0 -1.0 -2.0  2.0  2.0  3.0  3.0 -1.0    5  c,d,e,j  [-2.0, -1.0, -2.0, -1.0]
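Another route, sketched under the assumption that NaN (rather than an empty string or list) is acceptable for rows without negatives: reshape to long format with melt, keep only the negative cells, and aggregate per original row with groupby. This avoids building one pd.Series per row inside apply; whether it is actually faster depends on the data size and has not been measured here. The names long, neg and agg are illustrative:

```python
import pandas as pd

# Compact rebuild of the question's df (same values as the dict above)
cols = list('abcdefghij')
rows = [
    [1, 1, 1, 1, 1, -1, 1, 0, 0, 1],
    [1, 1, 2, 2, 2, 0, 1, 0, -1, 1],
    [2, 1, 2, 1, 2, 0, 2, 1, 0, 1],
    [3, 2, 2, 0, 1, 0, 1.5, 2, 0, 1],
    [3, 2, 2, 1, 1, -2, 2, 2, 0, 2],
    [2, 0, -2, 2, 2, -1, 0, 1, -3, -1],
    [3, 3, -2, -1, -2, 2, 2, 3, 3, -1],
]
df = pd.DataFrame(rows, columns=cols, dtype=float)
df.insert(0, 'R', [str(i) for i in range(1, 8)])
df['k'] = [62, 166, 139, 60, 93, 17, 5]

num = df.drop(columns='R')

# Long format: one row per (original row, column) pair; melt emits the
# columns in order a..k, so each group below stays in column order
long = num.rename_axis('row').reset_index().melt(
    id_vars='row', var_name='col', value_name='val')
neg = long[long['val'].lt(0)]

agg = neg.groupby('row').agg(an=('col', ','.join), nv=('val', list))

# Rows with no negatives are absent from agg, so join leaves NaN there
out = df.join(agg)
print(out[['an', 'nv']])
```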