Pandas

Question

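I'm looking for a way to return an (n x n) dataframe in which each value is the number of elements shared by the corresponding values of two other dataframes (both also of size n x n).

I'm not sure how to perform an operation like this between two dataframes. Any help is much appreciated.

Thanks!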

df1
              0             1
0  [4, 7, 3, 5]  [4, 7, 3, 5]
1     [8, 2, 6]     [8, 2, 6]
2  [9, 1, 8, 2]  [9, 1, 8, 2]
3        [3, 5]        [3, 5]
4     [9, 4, 8]     [9, 4, 8]
5     [0, 1, 4]     [0, 1, 4]

df2
              0             1
0  [2, 3, 6, 9]  [6, 2, 3, 5]
1  [2, 3, 6, 9]  [6, 2, 3, 5]
2  [2, 3, 6, 9]  [6, 2, 3, 5]
3  [2, 3, 6, 9]  [6, 2, 3, 5]
4  [2, 3, 6, 9]  [6, 2, 3, 5]
5  [2, 3, 6, 9]  [6, 2, 3, 5]

df3 - intended dataframe to be returned
              0             1
0             1             2
1             1             2
2             2             1
3             1             2
4             0             0
5             0             0

Edit: fixed a mistake in the example results.

Answer 1

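Well, I couldn't find a way to do this directly in pandas; I only have a dict-based solution. Also, I think your example result is wrong, and I believe my result is the expected one.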
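The disagreement with the example can be checked by hand. The sketch below is plain Python, not part of the original answer, and assumes that "number of intersections" means the number of distinct elements the two lists share. The lists and the df3 values are copied from the question; the names `cells` and `recount` are only illustrative.

```python
# Two cells of df3 that do not match a recount from the lists in df1 and df2.
# Each entry: (row, column) -> (list from df1, list from df2, count shown in df3)
cells = {
    (1, 0): ([8, 2, 6], [2, 3, 6, 9], 1),
    (4, 0): ([9, 4, 8], [2, 3, 6, 9], 0),
}

recount = {}
for position, (left, right, shown) in cells.items():
    # Distinct elements present in both lists
    shared = set(left) & set(right)
    recount[position] = len(shared)
    print(position, sorted(shared), "recount:", len(shared), "df3 shows:", shown)
```

Under that assumption, row 1 of column 0 shares two elements (2 and 6) and row 4 of column 0 shares one (9), while df3 lists 1 and 0 for those cells.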

import pandas as pd

# Prework to get your data
data = {0: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]],
        1: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]]}

data2 = {0: [[2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9]],
         1: [[6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5]]}

df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)

# ---

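# Convert both frames to nested dicts: {column: {row_index: list}}
dc = df.to_dict()
dc2 = df2.to_dict()

# Build a fresh dict instead of dc.copy(): that copy is shallow, so writing
# into it would also overwrite the inner dicts of dc.
new_dc = {}
for key in dc:
    new_dc[key] = {}
    for idx in dc[key]:
        # Number of distinct elements shared by the two lists in this cell
        new_dc[key][idx] = len(set(dc[key][idx]).intersection(dc2[key][idx]))
new_df = pd.DataFrame(new_dc)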

print(new_df)

Output:

   0  1
0  1  2
1  2  2
2  2  1
3  1  2
4  1  0
5  0  0
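The dict round trip is not strictly required. As a possible alternative, here is a minimal sketch that builds the result frame column by column with a comprehension. It assumes both frames have the same shape, column labels and row order, and that every cell holds a list of hashable items. `count_shared` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df1 = pd.DataFrame({
    0: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]],
    1: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]],
})
df2 = pd.DataFrame({
    0: [[2, 3, 6, 9]] * 6,
    1: [[6, 2, 3, 5]] * 6,
})


def count_shared(left, right):
    """Hypothetical helper: per-cell count of distinct elements shared by two frames of lists."""
    return pd.DataFrame(
        {
            # Pair the cells of matching columns row by row (positional pairing)
            col: [len(set(a) & set(b)) for a, b in zip(left[col], right[col])]
            for col in left.columns
        },
        index=left.index,
    )


counts = count_shared(df1, df2)
print(counts)
```

This should yield the same counts as the dict-based loop. It is still a Python-level loop over the cells, so it is not vectorized; it only skips the conversion to and from dicts.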

Pandas - Finding the intersection of values in two dataframes, return a single dataframe of same size with number of intersections

python

intersection

dataframe