Get data that occur in all specified years of a column in Pandas

I need only those company names, along with their year and estimated group size, that are present in all three years, e.g. 2019, 2020, 2021.

year    Company/Account Name    EstimatedGroupSize
0   2019    Unknown             19550
1   2019    Mayo Clinic         7754
2   2019    Deloitte            6432
3   2019    Rizona State        5582
4   2019    Intel Corporation   4595
5   2020    Deloitte            4063
6   2020    Unknown             3490
7   2021    Unknown             3484
8   2020    Intel Corporation   3460
9   2021    Intel Corporation   3433
10  2021    Deloitte            3250

So my output should be:

year    Company/Account Name    EstimatedGroupSize
0   2019    Unknown             19550
2   2019    Deloitte            6432
4   2019    Intel Corporation   4595
5   2020    Deloitte            4063
6   2020    Unknown             3490
7   2021    Unknown             3484
8   2020    Intel Corporation   3460
9   2021    Intel Corporation   3433
10  2021    Deloitte            3250

Here is a solution that finds the Company/Account Name values with at least one row in every year, then filters the original DataFrame with an inner merge:
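import pandas as pd

# If needed, first keep only the years of interest
df = df[df['year'].isin([2019, 2020, 2021])]

# Count rows per (year, company); rows are years, columns are companies
df1 = pd.crosstab(df['year'], df['Company/Account Name'])

# Keep companies with a nonzero count in every year, then inner-merge
# the surviving (year, company) pairs back onto the original rows
df = df.merge(df1.loc[:, df1.gt(0).all()].stack().index.to_frame(index=False))
print(df)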
   year Company/Account Name  EstimatedGroupSize
0  2019              Unknown               19550
1  2019             Deloitte                6432
2  2019    Intel Corporation                4595
3  2020             Deloitte                4063
4  2020              Unknown                3490
5  2021              Unknown                3484
6  2020    Intel Corporation                3460
7  2021    Intel Corporation                3433
8  2021             Deloitte                3250
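
To make the crosstab step easier to follow, here is a minimal, self-contained sketch built on the sample data from the question. It is a variant of the approach above, not the same code: it uses `isin` instead of the inner merge, so the original index is preserved. The names `counts`, `in_all_years`, `keep` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2019, 2020, 2020, 2021, 2020, 2021, 2021],
    "Company/Account Name": [
        "Unknown", "Mayo Clinic", "Deloitte", "Rizona State",
        "Intel Corporation", "Deloitte", "Unknown", "Unknown",
        "Intel Corporation", "Intel Corporation", "Deloitte",
    ],
    "EstimatedGroupSize": [19550, 7754, 6432, 5582, 4595, 4063,
                           3490, 3484, 3460, 3433, 3250],
})

# Rows are years, columns are companies, values are row counts
counts = pd.crosstab(df["year"], df["Company/Account Name"])

# True for a company only if its count is nonzero in every year
in_all_years = counts.gt(0).all()

# Names of the companies that pass the check
keep = in_all_years[in_all_years].index

# Filter the original rows; the original index labels are kept
result = df[df["Company/Account Name"].isin(keep)]
print(result)
```

Note that the crosstab only has rows for years that actually occur in `df`, so this check means "present in every year found in the data". If a requested year could be missing entirely, it would need an explicit check.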

IIUC,

years = [2019, 2020, 2021]
new_df = \
df.loc[pd.get_dummies(df['year'])
         .groupby(df['Company/Account Name'])[years]
         .transform('sum')
         .gt(0)
         .all(axis=1)]
print(new_df)

    year Company/Account Name  EstimatedGroupSize
0   2019              Unknown               19550
2   2019             Deloitte                6432
4   2019    Intel Corporation                4595
5   2020             Deloitte                4063
6   2020              Unknown                3490
7   2021              Unknown                3484
8   2020    Intel Corporation                3460
9   2021    Intel Corporation                3433
10  2021             Deloitte                3250

Or:

import numpy as np

years = [2019, 2020, 2021]
# Keep a company's rows only if every requested year appears in its group
new_df = \
df.groupby('Company/Account Name')\
  .filter(lambda x: np.isin(years, x['year']).all())
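
A Python lambda inside `groupby(...).filter` can be slow when there are many companies. As a rough alternative sketch (not part of the answers above), the same idea can be expressed by counting distinct years per company with `transform('nunique')`. One assumption differs: this version first restricts the rows to the requested years, so rows from other years are dropped even for qualifying companies. The names `subset` and `n_years` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2019, 2020, 2020, 2021, 2020, 2021, 2021],
    "Company/Account Name": [
        "Unknown", "Mayo Clinic", "Deloitte", "Rizona State",
        "Intel Corporation", "Deloitte", "Unknown", "Unknown",
        "Intel Corporation", "Intel Corporation", "Deloitte",
    ],
    "EstimatedGroupSize": [19550, 7754, 6432, 5582, 4595, 4063,
                           3490, 3484, 3460, 3433, 3250],
})

years = [2019, 2020, 2021]

# Keep only rows whose year is one of the requested years
subset = df[df["year"].isin(years)]

# For every row, the number of distinct requested years its company appears in
n_years = subset.groupby("Company/Account Name")["year"].transform("nunique")

# A company qualifies when it covers all requested years
new_df = subset[n_years.eq(len(set(years)))]
print(new_df)
```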