时间序列数据操作
Data manipulation with time series
我的数据框如下所示:
ID date var1 var2
0 1100289299522 2020-12-01 109.046450 8.0125
1 1100289299522 2020-12-02 104.494946 6.1500
2 1100289299522 2020-12-03 117.011582 5.9375
3 1100289299522 2020-12-04 109.615388 5.4750
4 1100289299522 2020-12-05 142.803438 3.8500
... ... ... ... ...
960045 21380318319578 2021-05-27 7.524261 15.4875
960046 21380318319578 2021-05-28 3.256770 17.3625
960047 21380318319578 2021-05-29 0.561512 18.3250
960048 21380318319578 2021-05-30 1.347629 18.7625
960049 21380318319578 2021-05-31 0.112302 20.0750
在 pandas 中是否有一种简单的方法让每行有一个 ID 并像这样设置列:
ID 2020-12-01_var1 2020-12-02_var1 ... 2021-05-31_var1 2020-12-01_var2 2020-12-02_var2 ... 2021-05-31 _var2
1100289299522 109.046450 104.494946 ... ___ 8.0125 6.1500 ... ___
然后我可以使用降维算法(如 TSNE)并可能对每个时间序列(和 ID)进行分类。
您认为这是正确的处理方式吗?
尝试:
out = df.pivot(index='ID', columns='date', values=['var1', 'var2'])
out.columns = out.columns.to_flat_index().str.join('_')
您的示例:
>>> out
var1_2020-12-01 var1_2020-12-02 var1_2020-12-03 ... var2_2021-05-29 var2_2021-05-30 var2_2021-05-31
ID ...
1100289299522 109.04645 104.494946 117.011582 ... NaN NaN NaN
21380318319578 NaN NaN NaN ... 18.325 18.7625 20.075
[2 rows x 20 columns]
我的数据框如下所示:
ID date var1 var2
0 1100289299522 2020-12-01 109.046450 8.0125
1 1100289299522 2020-12-02 104.494946 6.1500
2 1100289299522 2020-12-03 117.011582 5.9375
3 1100289299522 2020-12-04 109.615388 5.4750
4 1100289299522 2020-12-05 142.803438 3.8500
... ... ... ... ...
960045 21380318319578 2021-05-27 7.524261 15.4875
960046 21380318319578 2021-05-28 3.256770 17.3625
960047 21380318319578 2021-05-29 0.561512 18.3250
960048 21380318319578 2021-05-30 1.347629 18.7625
960049 21380318319578 2021-05-31 0.112302 20.0750
在 pandas 中是否有一种简单的方法让每行有一个 ID 并像这样设置列:
ID 2020-12-01_var1 2020-12-02_var1 ... 2021-05-31_var1 2020-12-01_var2 2020-12-02_var2 ... 2021-05-31 _var2
1100289299522 109.046450 104.494946 ... ___ 8.0125 6.1500 ... ___
然后我可以使用降维算法(如 TSNE)并可能对每个时间序列(和 ID)进行分类。
您认为这是正确的处理方式吗?
尝试:
out = df.pivot(index='ID', columns='date', values=['var1', 'var2'])
out.columns = out.columns.to_flat_index().str.join('_')
您的示例:
>>> out
var1_2020-12-01 var1_2020-12-02 var1_2020-12-03 ... var2_2021-05-29 var2_2021-05-30 var2_2021-05-31
ID ...
1100289299522 109.04645 104.494946 117.011582 ... NaN NaN NaN
21380318319578 NaN NaN NaN ... 18.325 18.7625 20.075
[2 rows x 20 columns]