Python Pandas Dataframe check column of lists and return ID from another Dataframe

Question

I have a pandas dataframe df1 with an index and a column of lists, which looks like:

index   IDList
0   [1,3,5,7]
1   [2,4,5,8]
2   [6,8,9]
3   [1,2]

I have another pandas dataframe df2, which has NewID as its index and also a column of lists, like this:

NewID   IDList
1       [3]
2       [4,5]
3       [1,7]
4       [2]
5       [9,3]
6       [8]
7       [6]

What I need to do is: if any item in df1.IDList exists in df2.IDList, return the list of associated df2.NewID values.

So the returned df1 dataframe would look like:

index   IDList      NewID
0       [1,3,5,7]   [3,1,2,3,5]
1       [2,4,5,8]   [4,2,2,6]
2       [6,8,9]     [7,6,5]
3       [1,2]       [3,4]

Edit: Note that in df2, an ID in IDList can appear in multiple rows (see ID 3 from df1.IDList, which appears in df2 rows 1 and 5).

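I was thinking of some kind of np.where statement with "any" and a list comprehension? But I'm not sure how to apply it so that each IDList in df1 is checked against the whole of df2.IDList. Maybe some kind of .stack()? Or .melt()? This would be easy in a spreadsheet with a vlookup against df2...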

Thanks for any help...

Answer 1

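import numpy as np
import pandas as pd

# expand df2 so that each ID in IDList maps to its NewID
# (NewID is the index of df2; pd.np is gone from current pandas, so use numpy directly)
flat_ids = pd.Series(
    np.repeat(df2.index.to_numpy(), df2.IDList.str.len()),
    index=[x for l in df2.IDList for x in l],
)

# look up each list of IDs with loc; an ID that appears in several df2 rows
# returns every matching NewID, and an ID missing from df2 raises a KeyError
df1['NewID'] = df1['IDList'].map(lambda x: flat_ids.loc[x].tolist())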
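
Since the question mentions .stack() and .melt(), here is a minimal sketch of the same lookup done in that reshaping style, with explode and merge standing in for the flat Series above. It is not part of the original answer. It assumes pandas 0.25 or later (for explode), and the names lookup, exploded, matched, "ID" and "row" are made up for illustration. The sample data is copied from the question.

```python
import pandas as pd

# Sample data from the question; NewID is the index of df2.
df1 = pd.DataFrame({"IDList": [[1, 3, 5, 7], [2, 4, 5, 8], [6, 8, 9], [1, 2]]})
df2 = pd.DataFrame(
    {"IDList": [[3], [4, 5], [1, 7], [2], [9, 3], [8], [6]]},
    index=pd.Index(range(1, 8), name="NewID"),
)

# One row per (NewID, ID) pair from df2.
lookup = df2["IDList"].explode().reset_index().rename(columns={"IDList": "ID"})

# One row per (df1 row label, ID) pair from df1.
exploded = df1["IDList"].explode().rename("ID").rename_axis("row").reset_index()

# Join on the ID, then collect the matching NewID values per df1 row.
matched = exploded.merge(lookup, on="ID", how="inner")
df1["NewID"] = matched.groupby("row")["NewID"].agg(list)

print(df1)
```

Unlike the loc lookup, an ID that is absent from df2 is dropped by the inner merge rather than raising an error, and a df1 row with no matches at all ends up with NaN in NewID. The order of values inside each list may also differ from the table in the question.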
