Multiple data frames

I have multiple data frames. Suppose I have three of them:

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
                   columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
                   columns=['abc', 'ca', 'gac'])

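Now I want to join the three data frames on column 'abc'. The first two data frames should be joined with how='outer', and the result of that join must then be joined with the third data frame using how='inner'. I can use reduce with a lambda to join the three data frames with a single join type, but how do I join them when there are two join types?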

Code:

from functools import reduce

# When the join type is 'outer' for every step
data_frames = [df1, df2, df3]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['abc'],
                                                how='outer'), data_frames)

Result:

   abc  bca  cab   acb   bac    ca   gac
0    1  2.0  3.0   NaN   NaN   2.0   3.0
1    4  5.0  6.0  45.0  46.0  55.0  96.0
2    7  8.0  9.0  48.0  49.0  88.0  79.0
3   11  NaN  NaN  12.0  13.0   NaN   NaN

But when two join types are given:

data_frames = [df1, df2,df3]
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['abc'],
                                            how='outer','inner'), data_frames)

Error:

 how='outer','inner'), data_frames)
               ^
SyntaxError: positional argument follows keyword argument

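I know we cannot pass two join types this way, but what is the best way to supply them? (Note: this example uses only three data frames, but I am looking for a solution that works for any number of data frames.)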

Expected output:

   abc  bca  cab   acb   bac  ca  gac
0    1  2.0  3.0   NaN   NaN   2    3
1    4  5.0  6.0  45.0  46.0  55   96
2    7  8.0  9.0  48.0  49.0  88   79
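
For reference, with exactly three data frames the expected output corresponds to two chained merges written out by hand. The following is a minimal sketch of that baseline; the names `step1` and `expected` are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
                   columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
                   columns=['abc', 'ca', 'gac'])

# Step 1: the outer join keeps every 'abc' value found in df1 or df2.
step1 = pd.merge(df1, df2, on='abc', how='outer')

# Step 2: the inner join keeps only the 'abc' values that also occur in df3,
# which drops the row with abc == 11.
expected = pd.merge(step1, df3, on='abc', how='inner')

print(expected)
```

This works, but it does not scale: every additional data frame needs another hand-written merge line, which is why a general approach is wanted.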

One approach is to create an iterator over your join types and use next to fetch the join type for each merge step.

import pandas as pd
import numpy as np
from functools import reduce

df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
                   columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
                   columns=['abc', 'ca', 'gac'])

data_frames = [df1, df2, df3]
merge_hows = iter(['outer', 'inner'])
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['abc'],
                                                how=next(merge_hows)), data_frames)

print(df_merged)

Output:

   abc  bca  cab   acb   bac  ca  gac
0    1  2.0  3.0   NaN   NaN   2    3
1    4  5.0  6.0  45.0  46.0  55   96
2    7  8.0  9.0  48.0  49.0  88   79
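
The iterator approach relies on the lambda calling next as a side effect, and it raises StopIteration if the list of join types is shorter than the number of merge steps. As an alternative sketch under the same assumptions, an explicit loop can pair each data frame with its join type and check the lengths up front. The helper name `merge_chain` and its parameters are hypothetical; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def merge_chain(frames, hows, on):
    """Merge frames left to right; hows[i] is the join type used for frames[i + 1].

    Hypothetical helper for illustration, not a pandas function.
    """
    if len(hows) != len(frames) - 1:
        raise ValueError("need exactly one join type per merge step")
    merged = frames[0]
    # Pair every frame after the first with the join type for its merge step.
    for right, how in zip(frames[1:], hows):
        merged = pd.merge(merged, right, on=on, how=how)
    return merged


df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
                   columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
                   columns=['abc', 'ca', 'gac'])

df_merged = merge_chain([df1, df2, df3], ['outer', 'inner'], on='abc')
print(df_merged)
```

With N data frames you pass N - 1 join types, one for each merge step, so the same call extends to any number of frames.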