pandas: append rows to another dataframe under the similar row based on column condition
I have two dataframes as follows:
import pandas as pd
d1 = {'col1': ['I ate dinner', 'I ate dinner', 'the play was inetresting', 'the play was inetresting'],
      'col2': ['min', 'max', 'mid', 'min'],
      'col3': ['min', 'max', 'max', 'max']}
d2 = {'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
      'col2': ['min', 'max', 'max'],
      'col3': ['max', 'min', 'mid']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
I created a column named 'exist' in df2 and set it to 'true' or 'false' depending on whether the sentence in df2.col1 also exists in df1.col1:
common = df1.merge(df2,on=['col1'])
index_list = df2[(~df2.col1.isin(common.col1))].index.to_list()
df2['exist'] = ' '
df2.loc[index_list, 'exist'] = 'false'
df2.loc[df2["exist"] == " ",'exist'] = 'true'
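The same flag can also be computed in one step with Series.isin. This is a minimal sketch, assuming boolean True/False is acceptable in place of the strings 'true'/'false'; the data is copied from the example above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['I ate dinner', 'I ate dinner',
             'the play was inetresting', 'the play was inetresting'],
    'col2': ['min', 'max', 'mid', 'min'],
    'col3': ['min', 'max', 'max', 'max'],
})
df2 = pd.DataFrame({
    'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
    'col2': ['min', 'max', 'max'],
    'col3': ['max', 'min', 'mid'],
})

# True where the sentence in df2.col1 also occurs somewhere in df1.col1
df2['exist'] = df2['col1'].isin(df1['col1'])
print(df2['exist'].tolist())  # [True, False, True]
```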
What I want to do now is: if the value in the 'exist' column is 'true', add that row to df1 under the similar rows. So the desired output should be:
output:
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid
I think I have to use np.where, but I'm not sure how to formulate the append to get the desired output.
The first idea is to filter df2 by the values of df1.col1, append the result to df1 with concat, and then sort with DataFrame.sort_values:
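# kind='stable' guarantees that df1's rows stay ahead of the appended df2 rows within each col1 value
df = (pd.concat([df1, df2[df2.col1.isin(df1.col1)]])
        .sort_values('col1', kind='stable', ignore_index=True))
print(df)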
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid
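Note that sorting by col1 also reorders the groups alphabetically, which happens to match df1's order in this example. If df1's original group order must be kept, one option is to sort by the position at which each value first appears in df1. The sketch below assumes both frames share df1's columns; the helper name append_matching_rows and the sample data are made up for illustration.

```python
import pandas as pd

def append_matching_rows(df1, df2, on='col1'):
    """Hypothetical helper: put df2's matching rows below df1's rows with the same value."""
    # Keep only df2 rows whose value occurs in df1, restricted to df1's columns
    matched = df2.loc[df2[on].isin(df1[on]), df1.columns]
    combined = pd.concat([df1, matched], ignore_index=True)
    # Rank every value by its first appearance in df1
    rank = {value: pos for pos, value in enumerate(pd.unique(df1[on]))}
    # A stable sort on that rank keeps df1's rows ahead of df2's rows in each group
    return combined.sort_values(on, key=lambda s: s.map(rank),
                                kind='stable', ignore_index=True)

# Illustrative data where df1's order is not alphabetical
left = pd.DataFrame({'col1': ['zebra', 'zebra', 'apple'], 'col2': [1, 2, 3]})
right = pd.DataFrame({'col1': ['apple', 'mango', 'zebra'], 'col2': [30, 99, 10]})
result = append_matching_rows(left, right)
print(result)
```

Here the 'zebra' rows stay first because that is where they appear in the left frame, and the unmatched 'mango' row is left out.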
If only the values common to both DataFrames are needed, it is possible to filter by numpy.intersect1d:
import numpy as np

# values of col1 that occur in both DataFrames
common = np.intersect1d(df1['col1'], df2['col1'])
df = (pd.concat([df1[df1.col1.isin(common)],
                 df2[df2.col1.isin(common)]])
        .sort_values('col1', kind='stable', ignore_index=True))
print(df)
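The difference from the first approach shows up when df1 contains a value that df2 lacks: those df1 rows are dropped as well. A small sketch with made-up data (the letters and numbers are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'], 'col2': [1, 2, 3, 4]})
right = pd.DataFrame({'col1': ['a', 'c', 'd'], 'col2': [10, 30, 40]})

# Only 'a' and 'c' occur in both frames
common = np.intersect1d(left['col1'], right['col1'])

both = (pd.concat([left[left.col1.isin(common)],
                   right[right.col1.isin(common)]])
          .sort_values('col1', kind='stable', ignore_index=True))
print(both)
```

The 'b' row from the left frame and the 'd' row from the right frame are both excluded from the result.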
IIUC, you want to add the matching rows without necessarily relying on sorting.
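# index df2 by col1 so the rows for a given sentence can be looked up by label
df2b = df2.set_index('col1')
(df1
 .groupby('col1', as_index=False, group_keys=False)
 # d.name is the group's col1 value; append df2's rows for that value below the group
 .apply(lambda d: pd.concat([d, df2b.loc[[d.name]].reset_index()]))
 .reset_index(drop=True)
)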
Output:
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid
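One caveat: a label lookup such as df2b.loc[[d.name]] raises a KeyError when a value of df1.col1 has no counterpart in df2. A possible variant that tolerates such values is to iterate over the groups in their original order and select df2's rows with a boolean mask. This is only a sketch; the helper name append_below_groups is hypothetical, and it assumes df2 contains all of df1's columns.

```python
import pandas as pd

def append_below_groups(df1, df2, on='col1'):
    """Hypothetical helper: append df2's matching rows below each group of df1."""
    pieces = []
    # sort=False keeps the groups in the order they first appear in df1
    for value, group in df1.groupby(on, sort=False):
        pieces.append(group)
        # A boolean mask simply selects nothing when df2 has no row for this value
        extra = df2.loc[df2[on] == value, df1.columns]
        if not extra.empty:
            pieces.append(extra)
    return pd.concat(pieces, ignore_index=True)

df1 = pd.DataFrame({
    'col1': ['I ate dinner', 'I ate dinner',
             'the play was inetresting', 'the play was inetresting'],
    'col2': ['min', 'max', 'mid', 'min'],
    'col3': ['min', 'max', 'max', 'max'],
})
df2 = pd.DataFrame({
    'col1': ['I ate dinner', ' the glass is shattered', 'the play was inetresting'],
    'col2': ['min', 'max', 'max'],
    'col3': ['max', 'min', 'mid'],
})
result = append_below_groups(df1, df2)
print(result)
```

Restricting the selection to df1.columns also leaves out helper columns such as 'exist' if they were added to df2 earlier.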