pandas link column using list item

I have two dataframes, df and tf, as shown below.

import numpy as np
import pandas as pd

df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},
      {"unique_key": 3, "test_ids": np.nan}, {"unique_key": 4, "test_ids": np.nan}]
df = pd.DataFrame(df)

test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32

tf = pd.read_clipboard(sep=',')
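
Because pd.read_clipboard depends on whatever is currently on the clipboard, the tf dataframe can also be built reproducibly from a string. This is only a sketch: the variable name csv_text is my own, and the data is copied from the table above.

```python
import io

import pandas as pd

# Same rows as the table above, held in a string instead of the clipboard.
csv_text = """test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
tf = pd.read_csv(io.StringIO(csv_text))
print(tf)
```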

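I would like to link the unique_key from the df dataframe to the tf dataframe.

For example, I show my expected output below (it is easier to understand than text).

I was trying something like the below: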

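for key, ids in zip(df['unique_key'], df['test_ids']):
    if pd.isna(ids):  # skip rows that have no test_ids at all
        continue
    for a in ids.split(','):
        if a.strip().lower() == 'nan':  # exclude NA values from checking
            continue
        for i in range(len(tf)):
            if int(float(a)) == tf['test_ids'][i]:
                tf.loc[i, 'unique_key'] = key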

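But this is neither efficient nor elegant for solving my problem.

Is there a better way to achieve the expected output shown below?

You can create a Series, remove duplicates and missing values, swap it into a dictionary (value -> unique_key), and use DataFrame.insert with Series.map to add the result as a new first column: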

s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)
       .dropna()
       .astype(int)
       .drop_duplicates())
d = {v: k for k, v in s.items()}
print (d)
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
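
One thing to keep in mind with Series.map: any test_ids value that is not a key of the dictionary maps to NaN, which turns the new column into floats. A minimal sketch of that behaviour, assuming a made-up id 99 that does not appear anywhere in df (tf_demo is a hypothetical frame used only for illustration):

```python
import pandas as pd

# Dictionary from the step above: test id -> unique_key.
d = {1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

# Hypothetical frame; 99 is an invented id with no match in d.
tf_demo = pd.DataFrame({"test_ids": [1, 2, 99]})

# Unmatched ids become NaN, so the result is float64.
mapped = tf_demo["test_ids"].map(d)

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
mapped_int = mapped.astype("Int64")
print(mapped_int)
```

If every id in tf is guaranteed to appear in df, as in the sample data, this conversion is not needed.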

Another idea is to work with the DataFrame directly and create a Series for mapping:

s = (df.assign(new = df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new':float})
       .dropna(subset=['new'])
       .astype({'new':int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])

print (s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64

tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
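
As a variation on the same idea (not one of the two solutions above), the exploded ids could be kept as a two-column lookup table and joined with DataFrame.merge instead of Series.map. This is only a sketch: the name lookup is my own, and tf is rebuilt inline from the sample data so the snippet runs by itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"unique_key": 1, "test_ids": "1.0,15,2.0,nan"},
    {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},
    {"unique_key": 3, "test_ids": np.nan},
    {"unique_key": 4, "test_ids": np.nan},
])
tf = pd.DataFrame({
    "test_ids": [1, 2, 11, 15, 51, 75],
    "status": ["passed", "passed", "failed", "failed", "passed", "failed"],
    "revenue": [234.54, 543.21, 21.3, 2098.21, 232.0, 123.87],
    "cnt_days": [3, 5, 4, 6, 21, 32],
})

# One row per (unique_key, test id), with missing ids and duplicates removed.
lookup = (df.assign(test_ids=df["test_ids"].str.split(","))
            .explode("test_ids")
            .astype({"test_ids": float})
            .dropna(subset=["test_ids"])
            .astype({"test_ids": int})
            .drop_duplicates(subset=["test_ids"]))

# A left join keeps every row of tf in its original order.
out = tf.merge(lookup, on="test_ids", how="left")
print(out)
```

Unlike tf.insert, the merge appends unique_key as the last column, so the columns would need reordering if it has to come first.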