Isin across 2 columns for groupby
How can I use isin together with or (?) when I know that the data to match in df1 will be spread across 2 columns (Title, ID)?
The code below works if I remove ' or df1[df1.ID.isin(df2[column])] '.
import pandas as pd

df1 = pd.DataFrame({'Title': ['A1', 'A2', 'A3', 'C1', 'C2', 'C3'],
                    'ID': ['B1', 'B2', 'B3', 'D1', 'D2', 'D3'],
                    'Whole': ['full', 'full', 'full', 'semi', 'semi', 'semi']})
df2 = pd.DataFrame({'Group1': ['A1', 'A2', 'A3'],
                    'Group2': ['B1', 'B2', 'B3']})
df = pd.DataFrame()
for column in df2.columns:
    d_group = (df1[df1.Title.isin(df2[column])] or df1[df1.ID.isin(df2[column])])
    df3 = d_group.groupby('Whole')['Whole'].count()\
        .rename(column, inplace=True)\
        .reindex(['part', 'full', 'semi'], fill_value='-')
    df = df.append(df3, ignore_index=False, sort=False)
print(df)
Desired output:
        | full | part | semi
--------+------+------+------
Group1  |  3   |  -   |  -
Group2  |  3   |  -   |  -
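You need to use | instead of or, and make sure you use [] correctly to sub-select from the df you want. Python's or evaluates the truth value of each operand as a whole, whereas | combines the two boolean masks element by element. In general, the notation is df[selection_filter].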
import pandas as pd

df1 = pd.DataFrame({'Title': ['A1', 'A2', 'A3', 'C1', 'C2', 'C3'],
                    'ID': ['B1', 'B2', 'B3', 'D1', 'D2', 'D3'],
                    'Whole': ['full', 'full', 'full', 'semi', 'semi', 'semi']})
df2 = pd.DataFrame({'Group1': ['A1', 'A2', 'A3'],
                    'Group2': ['B1', 'B2', 'B3']})

rows = []
for column in df2.columns:
    # Combine the two boolean masks element-wise with |, then filter df1 once
    d_group = df1[df1.Title.isin(df2[column]) | df1.ID.isin(df2[column])]
    df3 = (d_group.groupby('Whole')['Whole'].count()
           .rename(column)  # in current pandas, inplace=True returns None and breaks the chain
           .reindex(['part', 'full', 'semi'], fill_value='-'))
    rows.append(df3)

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected Series
df = pd.DataFrame(rows)
print(df)
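To see why or fails where | works, here is a minimal sketch on a cut-down frame. The wanted list and the variable names are made up for illustration and are not part of the original question. The question applied or to two filtered DataFrames rather than to two Series, but pandas should refuse both for the same reason: a multi-element object has no single truth value.

```python
import pandas as pd

df1 = pd.DataFrame({'Title': ['A1', 'A2', 'C1'],
                    'ID': ['B1', 'B2', 'D1']})
wanted = ['A1', 'B2']  # hypothetical lookup values, for illustration only

title_mask = df1.Title.isin(wanted)  # one boolean per row
id_mask = df1.ID.isin(wanted)        # one boolean per row

# `or` asks for the truth value of the whole left operand,
# which pandas refuses to compute for a multi-element Series.
try:
    title_mask or id_mask
    or_error = None
except ValueError as exc:
    or_error = str(exc)

# `|` is applied element-wise: a row is kept if either mask is True.
either_mask = title_mask | id_mask
matched = df1[either_mask]

print(or_error)
print(matched)
```

When chaining comparisons rather than isin calls, each condition needs its own parentheses, because | binds more tightly than operators such as ==.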