How to solve 'Reindexing only valid with uniquely valued Index objects' error
I have a dataframe that looks like this:
            date  holiday  item_cnt_day  shop_id   cnt_sem   cnt_mes  cnt_year
0     2013-01-01        1           0.0       59  0.000000  0.000000  0.000000
1     2013-01-02        1           0.0       59  0.000000  0.000000  0.000000
2     2013-01-03        1           0.0       59  0.000000  0.000000  0.000000
3     2013-01-04        1           0.0       59  0.000000  0.000000  0.000000
4     2013-01-05        0           0.0       59  0.000000  0.000000  0.000000
...          ...      ...           ...      ...       ...       ...       ...
1029  2015-10-27        0           4.0       36  1.142857  0.321429  0.024658
1030  2015-10-28        0           1.0       36  1.285714  0.357143  0.027397
1031  2015-10-29        0           1.0       36  1.142857  0.392857  0.030137
1032  2015-10-30        0           4.0       36  1.714286  0.535714  0.041096
1033  2015-10-31        0           1.0       36  1.857143  0.571429  0.043836
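The dates run from 2013-01-01 to 2015-10-31, and every shop_id has this full date range, so the dates repeat across shops. What I want is to keep, for each shop_id, only the dates that come after its first 365 days. I'm trying to do that with this function: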
import pandas as pd

def no_todos(df, shops):
    # shops is a list of shops and there are 60 shops in this list
    # df is the dataframe to be operated in the loop
    new_df = pd.DataFrame(df)
    # Here I'm trying to only keep those observations which come after the first 365 days for each shop
    for t in shops:
        new_df['shop_id'][t] = df[365::]
    return new_df
However, I get this error: Reindexing only valid with uniquely valued Index objects. Does anyone know how to fix this? Thanks in advance.
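What the message means, as a hedged side note: pandas raises it when it has to map labels to positions (reindexing, alignment) on an index that contains repeated labels. The printed index above runs from 0 to 1033, which is exactly the number of days from 2013-01-01 to 2015-10-31 (365 + 365 + 304 = 1034), so it is plausible, though not confirmed, that each shop's rows are numbered 0..1033 separately and the index therefore repeats across shops. The sketch below uses a small hypothetical index to trigger the same message directly, then shows how to check for duplicate labels and replace them with reset_index(drop=True).

```python
import pandas as pd

# Hypothetical index in which labels 0 and 1 each appear twice, as would happen
# if every shop's rows were numbered from 0 and the frames were then stacked.
dup_index = pd.Index([0, 1, 0, 1])

message = None
try:
    # get_indexer maps labels to positions; with repeated labels that mapping
    # is ambiguous, so pandas refuses.
    dup_index.get_indexer([0, 1])
except Exception as exc:  # pandas raises InvalidIndexError here
    message = str(exc)
print(message)

# Checking for, and removing, duplicate labels on a frame.
df = pd.DataFrame({"shop_id": [59, 59, 36, 36]}, index=dup_index)
print(df.index.is_unique)        # False: labels repeat
df = df.reset_index(drop=True)   # replace with a fresh 0..n-1 RangeIndex
print(df.index.is_unique)        # True
```

Unique labels are only a prerequisite for label-based alignment; they do not by themselves implement the "skip the first 365 days per shop" logic.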
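First sort the dataframe, then do a groupby, then take a "negative tail". A negative tail is not implemented in the groupby tail method, so you need to write your own function. The function below skips the first n rows of each group (here n = 365):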
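# sort_values returns a new frame, so the result must be assigned back
df = df.sort_values(['shop_id', 'date'], ascending=[True, True])
def negative_tail(group, n):
    # keep rows at position >= n within their group, i.e. drop the first n
    # (_selected_obj is a private pandas attribute and may change between versions)
    return group._selected_obj[group.cumcount(ascending=True) >= n]

final_result = negative_tail(df.groupby('shop_id'), 365).copy()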
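For comparison, here is a sketch of the same idea using only public pandas API (GroupBy.cumcount and GroupBy.transform) instead of the private _selected_obj attribute. The function names drop_first_n_rows and drop_first_days, their column defaults, and the tiny sample frame are hypothetical, chosen to match the question's columns. The first function does what negative_tail does; the second expresses "after the first 365 days" by date rather than by row count, which gives the same result only if every shop has exactly one row per consecutive day, as the question describes.

```python
import pandas as pd


def drop_first_n_rows(df, n, key="shop_id", date_col="date"):
    """Drop the first n rows (earliest dates) of each group."""
    # Sort so "first" means "earliest date" within each shop, and give the
    # result fresh unique labels so later alignment cannot hit duplicates.
    ordered = df.sort_values([key, date_col]).reset_index(drop=True)
    # cumcount numbers the rows of each group 0, 1, 2, ... in their current order.
    position = ordered.groupby(key).cumcount()
    return ordered[position >= n]


def drop_first_days(df, days, key="shop_id", date_col="date"):
    """Keep rows dated at least `days` days after their group's first date."""
    dates = pd.to_datetime(df[date_col])
    # Group by a plain array so no label alignment is needed, even if the
    # index has duplicate labels.
    first = dates.groupby(df[key].to_numpy()).transform("min")
    keep = dates >= first + pd.Timedelta(days=days)
    return df[keep.to_numpy()]


# Tiny hypothetical frame: two shops, four days each, with the index restarting
# at 0 for every shop as in the question.
day_strings = pd.date_range("2013-01-01", periods=4).strftime("%Y-%m-%d")
sample = pd.DataFrame(
    {"date": list(day_strings) * 2, "shop_id": [59] * 4 + [36] * 4},
    index=list(range(4)) * 2,
)

by_rows = drop_first_n_rows(sample, 2)  # drops each shop's first 2 rows
by_days = drop_first_days(sample, 2)    # keeps dates >= first date + 2 days
print(by_rows)
print(by_days)
```

On the question's data, drop_first_n_rows(df, 365) or drop_first_days(df, 365) would keep, for each shop, the rows from 2014-01-01 onward, since 2013 has 365 days.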