pandas: link a column using list items
I have two dataframes, df and tf, as shown below:
import numpy as np
import pandas as pd

df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},{"unique_key": 3, "test_ids":np.nan},
      {"unique_key": 4, "test_ids":np.nan}]
df = pd.DataFrame(df)
test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32
tf = pd.read_clipboard(sep=',')
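Because `pd.read_clipboard` depends on whatever is currently on the clipboard, it may be easier to reproduce `tf` by parsing the same text from a string. This is only a minimal sketch: the variable name `csv_text` is illustrative and not part of the original post.

```python
import io

import pandas as pd

# CSV text copied from the question; csv_text is a hypothetical name
csv_text = """test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32
"""

# Parse the string the same way a comma-separated file would be parsed
tf = pd.read_csv(io.StringIO(csv_text))
print(tf.dtypes)
```

With this data, `test_ids` is read as an integer column, which matters later because the lookup keys must have the same type.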
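I would like to link the unique_key column from df to the tf dataframe.
For example, I show my expected output below (easier to understand than text).
I was trying something like this: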
for key, b in zip(df['unique_key'], df['test_ids']):
    if pd.isna(b):  # rows with no test_ids at all
        continue
    for a in b.split(','):
        if a.strip().lower() != 'nan':  # to exclude NA values from checking
            tf.loc[tf['test_ids'] == int(float(a)), 'unique_key'] = key
But this is neither an efficient nor an elegant way to solve my problem.
Is there a better way to achieve the expected output shown below?
You can create a Series, remove duplicates and missing values, swap it into a dictionary, and use DataFrame.insert with Series.map for the new first column:
s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)
       .dropna()
       .astype(int)
       .drop_duplicates())
d = {v: k for k, v in s.items()}
print (d)
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}
tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
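To see why the chain works, it can help to look at the intermediate results. The sketch below rebuilds `df` from the question and prints the Series after `explode` and after the clean-up. The names `exploded` and `cleaned` are only illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"unique_key": 1, "test_ids": "1.0,15,2.0,nan"},
    {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},
    {"unique_key": 3, "test_ids": np.nan},
    {"unique_key": 4, "test_ids": np.nan},
])

# One row per list item; the index repeats the unique_key of the source row
exploded = (df.set_index('unique_key')['test_ids']
              .str.split(',')
              .explode())
print(exploded)

# The strings 'nan' and 'NaN' become real NaN when cast to float,
# so dropna removes them together with the rows that were missing entirely
cleaned = exploded.astype(float).dropna().astype(int).drop_duplicates()
print(cleaned)

# Swap index and values: test_id -> unique_key
d = {v: k for k, v in cleaned.items()}
print(d)
```

Casting to float before casting to int is what allows values such as `'1.0'` and `'nan'` to be handled in the same step.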
Another idea is to work with the DataFrame and create a Series for mapping:
s = (df.assign(new = df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new':float})
       .dropna(subset=['new'])
       .astype({'new':int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])
print (s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64
tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
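One thing to watch for with either approach: if tf contains a test_ids value that does not appear in df, `map` returns NaN for that row and the new column becomes float. A possible way to keep integer keys is sketched below, assuming pandas' nullable `Int64` dtype is acceptable for the result. The id `99` and the three-row stand-in for `tf` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Mapping taken from the printed dictionary in the answer
d = {1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

# Small stand-in for tf; 99 is a made-up id with no match in d
tf = pd.DataFrame({'test_ids': [1, 51, 99]})

mapped = tf['test_ids'].map(d)
print(mapped.dtype)  # float64, because the unmatched row is NaN

# The nullable integer dtype keeps matched keys as integers and the gap as <NA>
tf.insert(0, 'unique_key', mapped.astype('Int64'))
print(tf)
```

With the data from the question every id has a match, so this step is not needed there.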