Count consecutive values in time series
This is my first question here, so I hope I am doing it right!
I have a pandas DataFrame:
df2.data
Out[66]:
date
2016-01-02 0.0
2016-01-03 1.0
2016-01-04 1.0
2016-01-05 1.0
2016-01-06 0.0
2016-01-07 0.0
2016-01-08 1.0
2016-01-09 2.0
2016-01-10 1.0
2016-01-11 0.0
Name: data, dtype: float64
I would like the following result:
data trend trend_type
date
2016-01-02 0.0 0 0
2016-01-03 1.0 0 0
2016-01-04 1.0 1 1
2016-01-05 1.0 2 1
2016-01-06 0.0 0 0
2016-01-07 0.0 1 0
2016-01-08 1.0 0 0
2016-01-09 2.0 0 0
2016-01-10 1.0 0 0
2016-01-11 0.0 0 0
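My question is somewhat related to "How to use pandas to find consecutive same data in time series".
So far I have managed to compute the trend, but not efficiently enough (about 8 seconds for a 750-row DataFrame):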
import time

df['grp'] = (df.close.diff(1) == 0).astype('int')
df['trend'] = 0
start_time = time.time()
for i in range(2, len(df['grp'])):
    if df.grp.iloc[i] == 1:
        df['trend'].iloc[i] = df['trend'].iloc[i-1] + 1
Step 1
To get trend, use groupby + cumcount:
df['trend'] = df.data.groupby(df.data.ne(df.data.shift()).cumsum()).cumcount()
df
data trend
2016-01-02 0.0 0
2016-01-03 1.0 0
2016-01-04 1.0 1
2016-01-05 1.0 2
2016-01-06 0.0 0
2016-01-07 0.0 1
2016-01-08 1.0 0
2016-01-09 2.0 0
2016-01-10 1.0 0
2016-01-11 0.0 0
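The one-liner above packs three operations together. As a rough sketch of what happens at each stage, the snippet below rebuilds the sample data from the question and names the intermediate results; `changed` and `run_id` are illustrative names that do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
s = pd.Series(
    [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    index=pd.date_range("2016-01-02", periods=10, freq="D"),
    name="data",
)

# True wherever the value differs from the previous row. The first row is
# always True because it is compared with the NaN introduced by shift().
changed = s.ne(s.shift())

# The cumulative sum of the change flags gives each run of equal values its own id.
run_id = changed.cumsum()

# Position of each row inside its run, starting at 0.
trend = s.groupby(run_id).cumcount()

print(pd.DataFrame({"data": s, "run_id": run_id, "trend": trend}))
```

One thing to keep in mind: NaN never compares equal to NaN, so consecutive missing values would each start a new run with this approach.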
Step 2
(IIUC) To get trend_type, compare consecutive rows and assign accordingly.
df['trend_type'] = 0
m = df.data.eq(df.data.shift())
df.loc[m, 'trend_type'] = df.loc[m, 'data']
df
data trend trend_type
2016-01-02 0.0 0 0.0
2016-01-03 1.0 0 0.0
2016-01-04 1.0 1 1.0
2016-01-05 1.0 2 1.0
2016-01-06 0.0 0 0.0
2016-01-07 0.0 1 0.0
2016-01-08 1.0 0 0.0
2016-01-09 2.0 0 0.0
2016-01-10 1.0 0 0.0
2016-01-11 0.0 0 0.0
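The result above comes out as floats, while the expected table in the question shows integers. A possible variant, assuming the data holds only whole numbers as in the sample, is to use `Series.where` and cast at the end:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"data": [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]},
    index=pd.date_range("2016-01-02", periods=10, freq="D"),
)

# True where a row repeats the previous row's value.
m = df["data"].eq(df["data"].shift())

# Keep the value on repeated rows, use 0 elsewhere, then cast to int.
# The cast is only safe under the whole-number assumption stated above.
df["trend_type"] = df["data"].where(m, 0).astype(int)

print(df)
```

This avoids creating the column first and filling it through `df.loc`, but the result should be the same apart from the dtype.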
Edit: added the column "trend_type".
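# Assumes df has a default integer index (0..n-1) and the values are in a column named "data".
df.loc[0, "trend"] = 0
df.loc[0, "trend_type"] = 0
for nrow in range(df.shape[0] - 1):
    if df.loc[nrow + 1, "data"] == df.loc[nrow, "data"]:
        # Same value as the previous row: extend the current run.
        df.loc[nrow + 1, "trend"] = df.loc[nrow, "trend"] + 1
        df.loc[nrow + 1, "trend_type"] = 1
    else:
        # Value changed: start a new run.
        df.loc[nrow + 1, "trend"] = 0
        df.loc[nrow + 1, "trend_type"] = 0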
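Note that the loop above flags every repeated row with 1, so the repeated 0.0 on 2016-01-07 gets trend_type 1 rather than the 0 shown in the question's expected table. As a sanity check, the sketch below compares a row-by-row pass using that same flagging rule with a vectorized equivalent on the sample data. It assumes a default integer index and a column named `data`; the variable names are illustrative only.

```python
import pandas as pd

# Sample values from the question, on a default 0..n-1 index.
df = pd.DataFrame({"data": [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]})

# Row-by-row version: extend the run when the value repeats, otherwise reset.
values = df["data"].tolist()
loop_trend = [0]
loop_flag = [0]
for i in range(1, len(values)):
    if values[i] == values[i - 1]:
        loop_trend.append(loop_trend[-1] + 1)
        loop_flag.append(1)
    else:
        loop_trend.append(0)
        loop_flag.append(0)

# Vectorized version of the same two columns.
same = df["data"].eq(df["data"].shift())
vec_trend = df["data"].groupby((~same).cumsum()).cumcount()
vec_flag = same.astype(int)

print(vec_trend.tolist() == loop_trend)
print(vec_flag.tolist() == loop_flag)
```

On larger frames the vectorized form is likely to be much faster, since it avoids per-row `.loc` lookups and assignments.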