Shifting values in the DatetimeIndex of a pandas DataFrame
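I have a df with a DatetimeIndex at 30-minute intervals covering a long period (> 1 year), so it has more than 17,520 rows. For reasons related to daylight saving time, two index values are duplicated and two values are missing. The duplicated values are: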
In[1]: df[df.index.duplicated('first')]
Out[2]:
a b c
timestamp
2012-10-07 01:00:00 NaN NaN NaN
2012-10-07 01:30:00 NaN NaN NaN
2013-10-06 01:00:00 NaN NaN NaN
2013-10-06 01:30:00 NaN NaN NaN
I want to change these to the missing values, which are 1 hour later:
In[3]: df[df.index.duplicated('first')].shift(1,freq="H")
Out[4]:
a b c
timestamp
2012-10-07 02:00:00 NaN NaN NaN
2012-10-07 02:30:00 NaN NaN NaN
2013-10-06 02:00:00 NaN NaN NaN
2013-10-06 02:30:00 NaN NaN NaN
But this does not change the index:
df[df.index.duplicated('first')] = df[df.index.duplicated('first')].shift(1,freq="H")
What would?
I think you need to map the duplicated index values to their shifted counterparts and rename by dict:
print (df)
a b c
timestamp
2013-10-06 01:00:00 1 NaN NaN
2013-10-06 01:30:00 2 NaN NaN
2013-10-06 01:00:00 3 NaN NaN
2013-10-06 01:30:00 4 NaN NaN
2012-10-08 01:30:00 5 NaN NaN
2013-10-10 01:00:00 6 NaN NaN
df1 = df[df.index.duplicated('first')]
d = dict(zip(df1.index, df1.shift(1,freq="H").index))
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}
df = df.rename(index=d)
print (df)
a b c
timestamp
2013-10-06 02:00:00 1 NaN NaN
2013-10-06 02:30:00 2 NaN NaN
2013-10-06 02:00:00 3 NaN NaN
2013-10-06 02:30:00 4 NaN NaN
2012-10-08 01:30:00 5 NaN NaN
2013-10-10 01:00:00 6 NaN NaN
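For anyone who wants to reproduce the steps above, here is a minimal self-contained sketch. The sample frame mirrors the one printed above. The names `dup_labels`, `mapping` and `renamed` are illustrative. `pd.Timedelta(hours=1)` stands in for `shift(1, freq="H")` only to avoid depending on frequency aliases, which is my assumption rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the one above: two timestamps appear twice.
index = pd.DatetimeIndex(
    [
        "2013-10-06 01:00:00",
        "2013-10-06 01:30:00",
        "2013-10-06 01:00:00",
        "2013-10-06 01:30:00",
        "2012-10-08 01:30:00",
        "2013-10-10 01:00:00",
    ],
    name="timestamp",
)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": np.nan, "c": np.nan}, index=index)

# Labels of the second (and any later) occurrence of each timestamp.
dup_labels = df.index[df.index.duplicated(keep="first")]

# Map each duplicated label to the label one hour later.
mapping = dict(zip(dup_labels, dup_labels + pd.Timedelta(hours=1)))

# rename() looks every row label up in the mapping.
renamed = df.rename(index=mapping)
print(renamed)
```

Because `rename` works on labels, every row carrying a mapped label is relabelled, including the first occurrence. That is why the output above still contains each shifted timestamp twice.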
A similar solution:
idx = df.index[df.index.duplicated('first')]
d = dict(zip(idx, idx.to_series().shift(freq="H").index))
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}
df = df.rename(index=d)
print (df)
a b c
timestamp
2013-10-06 02:00:00 1 NaN NaN
2013-10-06 02:30:00 2 NaN NaN
2013-10-06 02:00:00 3 NaN NaN
2013-10-06 02:30:00 4 NaN NaN
2012-10-08 01:30:00 5 NaN NaN
2013-10-10 01:00:00 6 NaN NaN
2013-10-06 02:30:00 8 NaN NaN
2012-10-08 01:30:00 9 NaN NaN
2013-10-10 01:00:00 10 NaN NaN
idx = df.index[df.index.duplicated('first')]
s = idx.to_series().shift(freq="H")
#swap index with values in Series
d = pd.Series(s.index.values, index = s.values).to_dict()
print (d)
{Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}
df = df.rename(index=d)
print (df)
a b c
timestamp
2013-10-06 02:00:00 1 NaN NaN
2013-10-06 02:30:00 2 NaN NaN
2013-10-06 02:00:00 3 NaN NaN
2013-10-06 02:30:00 4 NaN NaN
2012-10-08 01:30:00 5 NaN NaN
2013-10-10 01:00:00 6 NaN NaN
Edit 1:
You need to add the timedeltas created by cumcount with to_timedelta to the original index:
delta = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='H')
print (delta)
timestamp
2013-10-06 01:00:00 00:00:00
2013-10-06 01:30:00 00:00:00
2013-10-06 01:00:00 01:00:00
2013-10-06 01:30:00 01:00:00
2012-10-08 01:30:00 00:00:00
2013-10-10 01:00:00 00:00:00
dtype: timedelta64[ns]
df.index = df.index + delta
print (df)
a b c
2013-10-06 01:00:00 1 NaN NaN
2013-10-06 01:30:00 2 NaN NaN
2013-10-06 02:00:00 3 NaN NaN
2013-10-06 02:30:00 4 NaN NaN
2012-10-08 01:30:00 5 NaN NaN
2013-10-10 01:00:00 6 NaN NaN
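The edit above can be condensed into a runnable sketch. It assumes the rows are ordered so that the later occurrence of each timestamp is the one to push forward. The sample data and the names `occurrence` and `offsets` are illustrative. The counter is converted to a NumPy array first so the addition is purely positional and yields a `DatetimeIndex`. The lowercase unit `"h"` is used because, as far as I know, recent pandas releases warn about the uppercase alias.

```python
import numpy as np
import pandas as pd

# Sample frame with two duplicated timestamps, as in the answer.
index = pd.DatetimeIndex(
    [
        "2013-10-06 01:00:00",
        "2013-10-06 01:30:00",
        "2013-10-06 01:00:00",
        "2013-10-06 01:30:00",
        "2012-10-08 01:30:00",
        "2013-10-10 01:00:00",
    ],
    name="timestamp",
)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": np.nan, "c": np.nan}, index=index)

# 0 for the first occurrence of a label, 1 for the second, and so on.
occurrence = df.groupby(level=0).cumcount().to_numpy()

# Turn the counter into hour offsets and add them row by row.
offsets = pd.to_timedelta(occurrence, unit="h")
df.index = (df.index + offsets).rename("timestamp")

print(df)
```

Only the repeated rows move, so the index becomes unique. The rows keep their original order, so a `df.sort_index()` afterwards may be needed if chronological order matters.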