Dropping index based on its length

Question

I have a multi-indexed panel time series with two index levels, country ID and year:

import numpy as np
import pandas as pd

arrays = [['country i', 'country i', 'country i', 'country j', 'country j', 'country j', 'country e'],
          [1999, 2000, 2001, 1999, 2000, 2001, 2000]]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["country ID", "year"])

dfx = pd.Series(np.random.randn(7), index=index)

print(dfx)

country ID  year
country i   1999    0.572030
            2000    1.736893
            2001   -1.213016
country j   1999    0.167581
            2000   -1.178015
            2001   -1.470233
country e   2000    1.298953
dtype: float64

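I want to drop every country ID that has fewer than 2 observations. How can I filter the data so that country IDs with fewer than 2 observations are removed? In the example above, country e should be dropped from the dataset.

Thanks in advance!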

Answer 1

One approach is:

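# Count the observations per country ID (index level 0), broadcast back to every row
mask = dfx.groupby(level=0).transform("count") >= 2
# Keep only the rows whose country ID has at least 2 observations
print(dfx[mask])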

Output (the values differ from the question because the data is random)

country ID  year
country i   1999   -1.259176
            2000    0.123215
            2001    0.899501
country j   1999   -0.111309
            2000    2.260785
            2001   -0.460683
dtype: float64
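
For completeness, here is a self-contained sketch of the same idea that can be run as-is. The helper name `drop_small_groups` and its `min_obs` parameter are made up for illustration and are not part of pandas; the fixed random seed is also an assumption, added only to make the run repeatable. It also shows `groupby(...).filter` as one possible alternative that should give the same result on this data.

```python
import numpy as np
import pandas as pd

arrays = [
    ["country i", "country i", "country i", "country j", "country j", "country j", "country e"],
    [1999, 2000, 2001, 1999, 2000, 2001, 2000],
]
index = pd.MultiIndex.from_arrays(arrays, names=["country ID", "year"])

# Seeded generator (assumption) so repeated runs produce the same values
rng = np.random.default_rng(0)
dfx = pd.Series(rng.standard_normal(7), index=index)


def drop_small_groups(series, min_obs=2):
    """Hypothetical helper: keep rows whose country ID has at least min_obs observations."""
    # transform("count") returns the per-group count aligned to the original rows
    counts = series.groupby(level="country ID").transform("count")
    return series[counts >= min_obs]


filtered = drop_small_groups(dfx)

# Alternative: filter() keeps or drops whole groups based on a boolean per group
alternative = dfx.groupby(level="country ID").filter(lambda s: len(s) >= 2)

print(filtered)
```

Note that `"count"` ignores missing values, so a country whose observations are mostly NaN could fall below the threshold; the `filter` variant with `len` counts every row regardless of NaN.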
