Multiple data frames
I have multiple data frames. Suppose I have three:
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
columns=['abc', 'ca', 'gac'])
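Now I want to join the three data frames on the column 'abc'. The first two data frames should be joined with how='outer', and the result of that join must then be joined with the third data frame with how='inner'. I can use reduce with a lambda to join the three data frames with a single join type, but how do I join them when there are two join types?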
Code, when the join condition is 'outer':
data_frames = [df1, df2,df3]
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['abc'],
how='outer'), data_frames)
Result:
abc bca cab acb bac ca gac
0 1 2.0 3.0 NaN NaN 2.0 3.0
1 4 5.0 6.0 45.0 46.0 55.0 96.0
2 7 8.0 9.0 48.0 49.0 88.0 79.0
3 11 NaN NaN 12.0 13.0 NaN NaN
But when two conditions are given:
data_frames = [df1, df2,df3]
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['abc'],
how='outer','inner'), data_frames)
Error:
how='outer','inner'), data_frames)
^
SyntaxError: positional argument follows keyword argument
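I know we cannot pass two conditions this way, but what is the best way to specify two join conditions? (Note: this example uses only three data frames, but I am looking for a solution that works for any number of data frames.)
Expected output: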
abc bca cab acb bac ca gac
0 1 2.0 3.0 NaN NaN 2 3
1 4 5.0 6.0 45.0 46.0 55 96
2 7 8.0 9.0 48.0 49.0 88 79
One way is to create an iterator over your join types and use next to fetch each one in turn.
import pandas as pd
import numpy as np
from functools import reduce
df1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame(np.array([[11, 12, 13], [4, 45, 46], [7, 48, 49]]),
columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame(np.array([[1, 2, 3], [4, 55, 96], [7, 88, 79]]),
columns=['abc', 'ca', 'gac'])
data_frames = [df1, df2, df3]
merge_hows = iter(['outer', 'inner'])
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['abc'],
how=next(merge_hows)), data_frames)
print(df_merged)
Output:
abc bca cab acb bac ca gac
0 1 2.0 3.0 NaN NaN 2 3
1 4 5.0 6.0 45.0 46.0 55 96
2 7 8.0 9.0 48.0 49.0 88 79
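Since the question asks about any number of data frames, here is a minimal sketch of the same idea written as an explicit loop. It pairs each data frame after the first with its own join type using zip. The helper name merge_with_hows and its parameters frames, hows and on are hypothetical, chosen only for illustration. The up-front length check is there because the iterator version stops with StopIteration if the list of join types is too short, and the iterator has to be re-created before reduce can be run again.

```python
import pandas as pd


def merge_with_hows(frames, hows, on):
    """Merge frames left to right; hows[i] is the join type used when
    merging the running result with frames[i + 1].

    Hypothetical helper for illustration, not part of pandas.
    """
    if len(hows) != len(frames) - 1:
        raise ValueError("need exactly one join type per merge step")
    merged = frames[0]
    # Pair every frame after the first with its join type.
    for right, how in zip(frames[1:], hows):
        merged = pd.merge(merged, right, on=on, how=how)
    return merged


df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   columns=['abc', 'bca', 'cab'])
df2 = pd.DataFrame([[11, 12, 13], [4, 45, 46], [7, 48, 49]],
                   columns=['abc', 'acb', 'bac'])
df3 = pd.DataFrame([[1, 2, 3], [4, 55, 96], [7, 88, 79]],
                   columns=['abc', 'ca', 'gac'])

# Outer join for df1 and df2, then inner join of that result with df3.
df_merged = merge_with_hows([df1, df2, df3], ['outer', 'inner'], on=['abc'])
print(df_merged)
```

For n data frames, pass a list of n - 1 join types in the order the merges should happen.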