Grab values and column names based on row values - columns with ranges
I have this DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'R': {0: '01', 1: '02', 2: '03', 3: '04', 4: '05', 5: '06', 6: '07'},
    'name': {0: 'b', 1: 'm', 2: '', 3: '', 4: 'b', 5: 'mi,b,m,c', 6: 'mi,e,w,c'},
    'value': {0: ['5.01e-13'], 1: ['9.74e-32'], 2: np.nan, 3: np.nan, 4: ['8.58e-09'], 5: ['1.04e-01', '1.18e-01', '7.19e-08', '1.06e-01'], 6: ['2.64e-01', '3.05e-01', '1.77e-01', '2.28e-01']},
})
which produces:
R name value
0 01 b [5.01e-13]
1 02 m [9.74e-32]
2 03 NaN
3 04 NaN
4 05 b [8.58e-09]
5 06 mi,b,m,c [1.04e-01, 1.18e-01, 7.19e-08, 1.06e-01]
6 07 mi,e,w,c [2.64e-01, 3.05e-01, 1.77e-01, 2.28e-01]
I need 2 new columns:
df['name2'] = the names from df['name'] whose corresponding entry in df['value'] is < 0.05
df['value2'] = the values from df['value'] that are < 0.05
Here is the desired output:
R name value name2 value2
0 01 b [5.01e-13] b [5.01e-13]
1 02 m [9.74e-32] m [9.74e-32]
2 03 NaN
3 04 NaN
4 05 b [8.58e-09] b [8.58e-09]
5 06 mi,b,m,c [1.04e-01, 1.18e-01, 7.19e-08, 1.06e-01] m [7.19e-08]
6 07 mi,e,w,c [2.64e-01, 3.05e-01, 1.77e-01, 2.28e-01]
I tried several options, such as
df['name2']=np.where[(df['value']<0.05), df['name'],'']
or code from another source, but unfortunately neither worked.
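Pandas approach:
Split the name strings and explode both columns, then filter the rows whose value is < .05, group the filtered rows by level=0, and aggregate them with join.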
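cols = ['name', 'value']
# Split the names, drop rows without values, and explode both columns together (multi-column explode needs pandas >= 1.3).
df1 = df[cols].assign(name=df['name'].str.split(',')).dropna().explode(cols)
# Keep entries below 0.05, re-join them per original row, and attach them as name2/value2.
df.join(df1[pd.to_numeric(df1['value']) < 0.05].groupby(level=0).agg(','.join).add_suffix('2'))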
R name value name2 value2
0 01 b [5.01e-13] b 5.01e-13
1 02 m [9.74e-32] m 9.74e-32
2 03 NaN NaN NaN
3 04 NaN NaN NaN
4 05 b [8.58e-09] b 8.58e-09
5 06 mi,b,m,c [1.04e-01, 1.18e-01, 7.19e-08, 1.06e-01] m 7.19e-08
6 07 mi,e,w,c [2.64e-01, 3.05e-01, 1.77e-01, 2.28e-01] NaN NaN
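The one-liner above can be easier to follow when unrolled into steps. The following is a minimal sketch of the same split, explode, filter, group, and join sequence. The intermediate names (`long`, `mask`, `picked`, `result`) are illustrative and not part of the original answer, and the sample frame is rebuilt from plain lists for brevity.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt from plain lists.
df = pd.DataFrame({
    'R': ['01', '02', '03', '04', '05', '06', '07'],
    'name': ['b', 'm', '', '', 'b', 'mi,b,m,c', 'mi,e,w,c'],
    'value': [['5.01e-13'], ['9.74e-32'], np.nan, np.nan, ['8.58e-09'],
              ['1.04e-01', '1.18e-01', '7.19e-08', '1.06e-01'],
              ['2.64e-01', '3.05e-01', '1.77e-01', '2.28e-01']],
})

# 1. Split the comma-separated names into lists.
long = df[['name', 'value']].copy()
long['name'] = long['name'].str.split(',')

# 2. Drop rows that have no value list, then explode both columns in
#    lockstep (multi-column explode requires pandas >= 1.3).
long = long.dropna(subset=['value']).explode(['name', 'value'])

# 3. Compare numerically, but keep the original strings for the output.
mask = pd.to_numeric(long['value']) < 0.05

# 4. Re-join the surviving entries per original row index.
picked = long[mask].groupby(level=0).agg(','.join).add_suffix('2')

# 5. Attach the new columns; rows with nothing below 0.05 get NaN.
result = df.join(picked)
print(result[['R', 'name2', 'value2']])
```

Because the comparison uses a separate numeric mask, `value2` keeps the original string formatting of the values.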
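Note: storing complex data types (such as lists or dicts) in a DataFrame is generally not recommended unless you have a very good reason. It can significantly hurt performance.
First, you need to convert the name column from a string to an array of strings by splitting on the , character.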
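As a rough illustration of what the note suggests, the same data could be kept in long form, one row per (R, name, value) entry with a float column, so that the filter becomes a plain vectorized comparison. This is only a sketch: the `long_df` layout and the `below` name are assumptions, not something the question uses, and R values 03 and 04 simply have no rows here.

```python
import pandas as pd

# Hypothetical long-form layout of the question's data:
# one row per entry, with values stored as floats rather than lists of strings.
long_df = pd.DataFrame({
    'R': ['01', '02', '05', '06', '06', '06', '06', '07', '07', '07', '07'],
    'name': ['b', 'm', 'b', 'mi', 'b', 'm', 'c', 'mi', 'e', 'w', 'c'],
    'value': [5.01e-13, 9.74e-32, 8.58e-09,
              1.04e-01, 1.18e-01, 7.19e-08, 1.06e-01,
              2.64e-01, 3.05e-01, 1.77e-01, 2.28e-01],
})

# Filtering is now a single boolean comparison on a numeric column.
below = long_df[long_df['value'] < 0.05]
print(below)
```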
df['name'] = df['name'].apply(lambda x: x.split(','))
Now you can simply apply another function row by row to get the desired output for the name2 column.
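def calc(x):
    # Rows without a list of values (NaN) yield an empty result.
    if not isinstance(x['value'], list):
        return []
    res = []
    for i, v in enumerate(x['value']):
        # Values are stored as strings, so convert before comparing.
        if float(v) < 0.05:
            res.append(x['name'][i])
    return res
df['name2'] = df.apply(calc, axis=1)
print(df)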
Output
R name value name2
0 01 [b] [5.01e-13] [b]
1 02 [m] [9.74e-32] [m]
2 03 [] NaN []
3 04 [] NaN []
4 05 [b] [8.58e-09] [b]
5 06 [mi, b, m, c] [1.04e-01, 1.18e-01, 7.19e-08, 1.06e-01] [m]
6 07 [mi, e, w, c] [2.64e-01, 3.05e-01, 1.77e-01, 2.28e-01] []
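The answer above only fills name2. A possible extension that also produces value2, and leaves the original name column as a string, might look like the sketch below. The helper `pick_below` and its `threshold` parameter are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt from plain lists.
df = pd.DataFrame({
    'R': ['01', '02', '03', '04', '05', '06', '07'],
    'name': ['b', 'm', '', '', 'b', 'mi,b,m,c', 'mi,e,w,c'],
    'value': [['5.01e-13'], ['9.74e-32'], np.nan, np.nan, ['8.58e-09'],
              ['1.04e-01', '1.18e-01', '7.19e-08', '1.06e-01'],
              ['2.64e-01', '3.05e-01', '1.77e-01', '2.28e-01']],
})

def pick_below(names, values, threshold=0.05):
    """Return (names, values) of the entries whose value is below threshold."""
    # Rows without a list of values (NaN) yield two empty lists.
    if not isinstance(values, list):
        return [], []
    pairs = [(n, v) for n, v in zip(names.split(','), values)
             if float(v) < threshold]
    return [n for n, _ in pairs], [v for _, v in pairs]

# Walk both columns in parallel and collect one (names, values) pair per row.
picked = [pick_below(n, v) for n, v in zip(df['name'], df['value'])]
df['name2'] = [p[0] for p in picked]
df['value2'] = [p[1] for p in picked]
print(df[['R', 'name2', 'value2']])
```

Iterating with `zip` over the two columns avoids `DataFrame.apply(axis=1)` entirely, although the cells still hold lists, so the note about complex data types continues to apply.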