Top N rows vs rows with Top N unique values using pandas

Question

I have a pandas dataframe like the one shown below

import pandas as pd

data ={'Count':[1,1,2,3,4,2,1,1,2,1,3,1,3,6,1,1,9,3,3,6,1,5,2,2,0,2,2,4,0,1,3,2,5,0,3,3,1,2,2,1,6,2,3,4,1,1,3,3,4,3,1,1,4,2,3,0,2,2,3,1,3,6,1,8,4,5,4,2,1,4,1,1,1,2,3,4,1,1,1,3,2,0,6,2,3,2,9,10,2,1,2,3,1,2,2,3,2,1,8,4,0,3,3,5,12,1,5,13,6,13,7,3,5,2,3,3,1,1,5,15,7,9,1,1,1,2,2,2,4,3,3,2,4,1,2,9,3,1,3,0,0,4,0,1,0,1,0]}

df = pd.DataFrame(data)

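I would like to do the following

a) Get the top 5 rows (this returns exactly 5 rows)

b) Get the rows with the top 5 unique values (this can return N > 5 rows if the top 5 values repeat). See my sample screenshot below, where 8 rows are selected for the top 5 unique values

I can get the top 5 rows using the following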

df.nlargest(5,['Count'])

However, when I try the following for b), I don't get the expected output

df.nlargest(5,['Count'],keep='all')

I expect my output to look like the following

Answer 1

Are you looking for the first 5 unique values or the 5 largest values?

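import numpy as np

df = df.assign(
    # 'Y' for the first 5 rows of the frame, 'N' otherwise
    top5rows=np.where(df.index.isin(df.head(5).index), 'Y', 'N'),
    # 'Y' only for the first occurrence of each of the first 5 distinct Count values
    top5unique=np.where(df.index.isin(df.drop_duplicates(subset='Count', keep='first').head(5).index), 'Y', 'N'),
)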

Or, if you need every row whose Count is among the first 5 unique values:

df = df.assign(
    top5rows=np.where(df.index.isin(df.head(5).index), 'Y', 'N'),
    top5unique=np.where(df['Count'].isin(df['Count'].unique()[:5]), 'Y', 'N'),
)

    Count top5rows top5unique
0       1        Y          Y
1       1        Y          Y
2       2        Y          Y
3       3        Y          Y
4       4        Y          Y
5       2        N          Y
6       1        N          Y
7       1        N          Y
8       2        N          Y
9       1        N          Y
10      3        N          Y
11      1        N          Y
12      3        N          Y
13      6        N          Y
14      1        N          Y
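
If the goal is instead the N largest distinct values (the other reading of the question), a minimal sketch could look like the one below. It uses a small made-up frame rather than the data from the question, and the names `n`, `tied`, `top_values` and `top_unique` are only illustrative. It also shows why `keep='all'` may not help here: `nlargest` with `keep='all'` only adds rows that tie with the last selected value, so it does not extend the selection to further distinct values.

```python
import pandas as pd

# Small illustrative frame (not the data from the question) with repeated values.
df = pd.DataFrame({'Count': [1, 9, 3, 9, 7, 7, 5, 2, 10, 3]})

n = 3  # number of distinct top values wanted

# keep='all' only retains extra rows that tie with the smallest selected value.
tied = df.nlargest(n, 'Count', keep='all')

# Take the n largest distinct values first, then keep every row that has one of them.
top_values = df['Count'].drop_duplicates().nlargest(n)
top_unique = df[df['Count'].isin(top_values)].sort_values('Count', ascending=False)

print(tied)        # rows for 10, 9, 9
print(top_unique)  # rows for 10, 9, 9, 7, 7
```

With `n = 5` and the original `df`, the same two lines that build `top_values` and `top_unique` should return every row whose `Count` is one of the five largest distinct values.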
