Pandas - Find the intersection of values in two dataframes and return a single dataframe of the same size holding the intersection counts
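I am looking for a way to return an (n x n) dataframe in which each value is the number of elements shared by the corresponding values of two dataframes (both of size n x n).
I'm not quite sure how to perform such an operation between two dataframes. Any help is much appreciated.
Thanks!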
df1
              0             1
0  [4, 7, 3, 5]  [4, 7, 3, 5]
1     [8, 2, 6]     [8, 2, 6]
2  [9, 1, 8, 2]  [9, 1, 8, 2]
3        [3, 5]        [3, 5]
4     [9, 4, 8]     [9, 4, 8]
5     [0, 1, 4]     [0, 1, 4]
df2
              0             1
0  [2, 3, 6, 9]  [6, 2, 3, 5]
1  [2, 3, 6, 9]  [6, 2, 3, 5]
2  [2, 3, 6, 9]  [6, 2, 3, 5]
3  [2, 3, 6, 9]  [6, 2, 3, 5]
4  [2, 3, 6, 9]  [6, 2, 3, 5]
5  [2, 3, 6, 9]  [6, 2, 3, 5]
df3 - intended dataframe to be returned
   0  1
0  1  2
1  1  2
2  2  1
3  1  2
4  0  0
5  0  0
Edit: fixed a mistake in the example result
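As a small illustration of the per-cell operation described in the question, the count for a single pair of cells can be read as the size of a set intersection. This is only a sketch using the first row of column 0 from the example; the variable names `a`, `b`, `common` and `count` are chosen here for illustration.

```python
# One cell from df1 and the matching cell from df2 (row 0, column 0)
a = [4, 7, 3, 5]
b = [2, 3, 6, 9]

# Elements present in both lists; duplicates within a list are ignored
common = set(a) & set(b)
count = len(common)

print(common, count)
```

Note that treating the lists as sets means repeated elements inside one list are counted once, which matches the example data but may not be what is wanted for lists with duplicates.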
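Well, I don't have a way to do this directly in pandas; I only have a dict-based solution. Also, I think your example result is wrong, and I believe my output below is the expected result.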
import pandas as pd

# Prework to get your data
data = {0: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]],
        1: [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]]}
data2 = {0: [[2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9], [2, 3, 6, 9]],
         1: [[6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5], [6, 2, 3, 5]]}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)

# ---
# Convert both frames to nested dicts of the form {column: {row_label: list}}
dc = df.to_dict()
dc2 = df2.to_dict()

# Count the shared elements cell by cell into a fresh nested dict
# (dc.copy() is shallow, so writing into it would also overwrite dc's inner dicts)
new_dc = {}
for key in dc:
    new_dc[key] = {}
    for val in dc[key]:
        new_dc[key][val] = len(set(dc[key][val]).intersection(dc2[key][val]))

new_df = pd.DataFrame(new_dc)
print(new_df)
Output:
   0  1
0  1  2
1  2  2
2  2  1
3  1  2
4  1  0
5  0  0
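For comparison, the same per-cell count can probably be computed without the round trip through `to_dict`. The following is only a sketch, assuming both frames share the same index and columns; the helper `count_common` and the names `df1` and `result` are made up for illustration and are not part of pandas.

```python
import pandas as pd

cells = [[4, 7, 3, 5], [8, 2, 6], [9, 1, 8, 2], [3, 5], [9, 4, 8], [0, 1, 4]]
df1 = pd.DataFrame({0: cells, 1: cells})
df2 = pd.DataFrame({0: [[2, 3, 6, 9]] * 6, 1: [[6, 2, 3, 5]] * 6})


def count_common(a, b):
    """Hypothetical helper: number of distinct elements found in both lists."""
    return len(set(a) & set(b))


# Walk the two frames column by column, pairing cells by position.
# Assumption: df1 and df2 have identical columns and row order.
result = pd.DataFrame(
    {col: [count_common(a, b) for a, b in zip(df1[col], df2[col])]
     for col in df1.columns},
    index=df1.index,
)
print(result)
```

This still loops in Python over every cell, so it is not faster in any meaningful way; it mainly avoids building intermediate dicts and leaves the input frames untouched.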