How to solve the 'Reindexing only valid with uniquely valued Index objects' error

I have a dataframe that looks like this:

           date        holiday  item_cnt_day    shop_id      cnt_sem    cnt_mes     cnt_year
0        2013-01-01       1         0.0           59         0.000000   0.000000    0.000000
1        2013-01-02       1         0.0           59         0.000000   0.000000    0.000000
2        2013-01-03       1         0.0           59         0.000000   0.000000    0.000000
3        2013-01-04       1         0.0           59         0.000000   0.000000    0.000000
4        2013-01-05       0         0.0           59         0.000000   0.000000    0.000000
          ......         ...        ...           ...           ...        ...         ...
1029    2015-10-27        0         4.0           36         1.142857   0.321429    0.024658
1030    2015-10-28        0         1.0           36         1.285714   0.357143    0.027397
1031    2015-10-29        0         1.0           36         1.142857   0.392857    0.030137
1032    2015-10-30        0         4.0           36         1.714286   0.535714    0.041096
1033    2015-10-31        0         1.0           36         1.857143   0.571429    0.043836

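The dates run from 2013-01-01 to 2015-10-31, and this range applies to every shop_id. In other words, each shop_id has the full date range, so the dates are repeated across shops. What I want is to keep, for each shop_id, only the rows that come after its first 365 days. I am trying to do that with this function: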

def no_todos(df, shops):
    # shops is a list of shops and there are 60 shops in this list
    # df is the dataframe to be operated in the loop

    new_df = pd.DataFrame(df)

    # Here I'm trying to only keep those observations which come after the first 365 days for each shop
    for t in shops:
        new_df['shop_id'][t] = df[365::]
    return new_df

However, I get this error: Reindexing only valid with uniquely valued Index objects. Does anyone know how to fix this? Thanks in advance.

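Sort the dataframe first, then do a groupby, then take a negative tail. A negative tail is not implemented in the groupby tail method, so you need to write your own function. This skips the first n rows of each group.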

# sort_values returns a new dataframe, so assign the result back
df = df.sort_values(['shop_id', 'date'], ascending=[True, True])

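def negative_tail(group, n):
    # cumcount numbers the rows within each group from 0; keep rows numbered n or higher
    # note: _selected_obj is a private pandas attribute holding the grouped dataframe
    return group._selected_obj[group.cumcount(ascending=True) >= n]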

final_result = negative_tail(df.groupby('shop_id'), 365).copy()
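
For reference, here is a minimal, self-contained sketch of the same idea that avoids the private _selected_obj attribute. The toy data, the name n_skip, and the small sizes (2 shops, 5 days, skipping 2 rows instead of 365) are assumptions made for illustration only. It also assumes every shop has exactly one row per day with no gaps, so that "the first 365 rows" and "the first 365 days" mean the same thing. The sample in the question shows the index 0..1033 appearing for different shops, which suggests the index is repeated per shop, so the sketch resets it after sorting.

```python
import pandas as pd

# Toy data (hypothetical): 2 shops, 5 days each. The 0..4 index is repeated
# for each shop, mimicking the duplicated index suggested by the question.
dates = pd.date_range("2013-01-01", periods=5)
df = pd.concat(
    [pd.DataFrame({"date": dates, "shop_id": shop, "item_cnt_day": 0.0})
     for shop in (59, 36)]
)

n_skip = 2  # stands in for 365 in the real data

# Sort by shop and date, then replace the duplicated index with a unique one.
df = df.sort_values(["shop_id", "date"]).reset_index(drop=True)

# Number the rows within each shop (0, 1, 2, ...) and keep those at or past n_skip.
row_in_shop = df.groupby("shop_id").cumcount()
result = df[row_in_shop >= n_skip]

print(result)
```

With the real data, setting n_skip = 365 should keep everything after each shop's first 365 rows. If some shops have missing days, a date comparison per shop would probably be a safer filter than a row count.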